Information Geometry: Near Randomness and Near Independence

Lecture Notes in Mathematics 1953. Editors: J.-M. Morel, Cachan; F. Takens, Groningen; B. Teissier, Paris

Transcript of Information Geometry: Near Randomness and Near Independence


Khadiga A. Arwini · Christopher T.J. Dodson


Authors

Khadiga A. Arwini, Al-Fateh University, Faculty of Sciences, Mathematics Department, Box 13496, Tripoli, Libya. [email protected]

Christopher T.J. Dodson, University of Manchester, School of Mathematics, Manchester M13 9PL, United Kingdom. [email protected]

ISBN: 978-3-540-69391-8, e-ISBN: 978-3-540-69393-2, DOI: 10.1007/978-3-540-69393-2

Lecture Notes in Mathematics ISSN print edition: 0075-8434, ISSN electronic edition: 1617-9692

Library of Congress Control Number: 2008930087

Mathematics Subject Classification (2000): 53B50, 60D05, 62B10, 62P35, 74E35, 92D20

© 2008 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: SPi Publishing Services

Printed on acid-free paper


springer.com


Preface

The main motivation for this book lies in the breadth of applications in which a statistical model is used to represent small departures from, for example, a Poisson process. Our approach uses information geometry to provide a common context, but we need only rather elementary material from differential geometry, information theory and mathematical statistics. Introductory sections serve together to help those interested from the applications side in making use of our methods and results. We have available Mathematica notebooks to perform many of the computations for those who wish to pursue their own calculations or developments.

Some 44 years ago, the second author first encountered, at about the same time, differential geometry via relativity from Weyl's book [209] during undergraduate studies, and information theory from Tribus [200, 201] via spatial statistical processes while working on research projects at Wiggins Teape Research and Development Ltd—cf. the Foreword in [196] and [170, 47, 58]. Having started work there as a student laboratory assistant in 1959, this research environment engendered a recognition of the importance of international collaboration, and a lifelong research interest in randomness and near-Poisson statistical geometric processes, persisting at various rates through a career mainly involved with global differential geometry. From correspondence in the 1960s with Gabriel Kron [4, 124, 125] on his Diakoptics, and with Kazuo Kondo, who influenced the post-war Japanese schools of differential geometry and supervised Shun-ichi Amari's doctorate [6], it was clear that both had a much wider remit than traditionally pursued elsewhere. Indeed, on moving to Lancaster University in 1969, receipt of the latest RAAG Memoirs, Volume 4 of 1968 [121], provided one of Amari's early articles on information geometry [7], which subsequently led to his greatly influential 1985 Lecture Note volume [8] and our 1987 Geometrization of Statistical Theory Workshop at Lancaster University [10, 59].

Reported in this monograph is a body of results, and computer-algebraic methods, that seem to have quite general applicability to statistical models admitting representation through parametric families of probability density functions. Some illustrations are given from a variety of contexts for geometric characterization of statistical states near to the three important standard basic reference states: (Poisson) randomness, uniformity, independence. The individual applications are somewhat heuristic models from various fields, and we incline more to terminology and notation from the applications than from formal statistics. However, a common thread is a geometrical representation for statistical perturbations of the basic standard states, and hence results gain qualitative stability. Moreover, the geometry is controlled by a metric structure that owes its heritage through maximum likelihood to information theory, so the quantitative features—lengths of curves, geodesics, scalar curvatures etc.—have some respectable authority. We see in the applications simple models for galactic void distributions and galaxy clustering, amino acid clustering along protein chains, cryptographic protection, stochastic fibre networks, coupled geometric features in hydrology, and quantum chaotic behaviour. An ambition since the publication by Richard Dawkins of The Selfish Gene [51] has been to provide a suitable differential geometric framework for dynamics of natural evolutionary processes, but it remains elusive. On the other hand, in application to the statistics of amino acid spacing sequences along protein chains, we describe in Chapter 7 a stable statistical qualitative property that may have evolutionary significance. Namely, to widely varying extents, all twenty amino acids exhibit greater clustering than expected from Poisson processes. Chapter 11 considers eigenvalue spacings of infinite random matrices and near-Poisson quantum chaotic processes.

The second author has benefited from collaboration (cf. [34]) with the group headed by Andrew Doig of the Manchester Interdisciplinary Biocentre, the University of Manchester, and has had long-standing collaborations with groups headed by Bill Sampson of the School of Materials, the University of Manchester (cf. e.g. [73]) and Jacob Scharcanski of the Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brasil (cf. e.g. [76]) on stochastic modelling. We are pleased therefore to have co-authored with these colleagues three chapters, titled respectively Amino Acid Clustering, Stochastic Fibre Networks, and Stochastic Porous Media and Hydrology.

The original draft of the present monograph was prepared as notes for short Workshops given by the second author at the Centro de Investigación en Matemáticas (CIMAT), Guanajuato, Mexico in May 2004, and also in the Departamento de Xeometría e Topoloxía, Facultade de Matemáticas, Universidade de Santiago de Compostela, Spain in February 2005.

The authors have benefited at different times from discussions with many people, but we mention in particular Shun-ichi Amari, Peter Jupp, Patrick Laycock, Hiroshi Matsuzoe, T. Subba Rao and anonymous referees. However, any overstatements in this monograph will indicate that good advice may have been missed or ignored, but actual errors are due to the authors alone.

Khadiga Arwini, Department of Mathematics, Al-Fateh University, Libya
Kit Dodson, School of Mathematics, the University of Manchester, England


Contents

Preface

1 Mathematical Statistics and Information Theory
  1.1 Probability Functions for Discrete Variables
    1.1.1 Bernoulli Distribution
    1.1.2 Binomial Distribution
    1.1.3 Poisson Distribution
  1.2 Probability Density Functions for Continuous Variables
    1.2.1 Uniform Distribution
    1.2.2 Exponential Distribution
    1.2.3 Gaussian, or Normal Distribution
  1.3 Joint Probability Density Functions
    1.3.1 Bivariate Gaussian Distributions
  1.4 Information Theory
    1.4.1 Gamma Distribution

2 Introduction to Riemannian Geometry
  2.0.2 Manifolds
  2.0.3 Tangent Spaces
  2.0.4 Tensors and Forms
  2.0.5 Riemannian Metric
  2.0.6 Connections
  2.1 Autoparallel and Geodesic Curves
  2.2 Universal Connections and Curvature

3 Information Geometry
  3.1 Fisher Information Metric
  3.2 Exponential Family of Probability Density Functions
  3.3 Statistical α-Connections
  3.4 Affine Immersions
    3.4.1 Weibull Distributions: Not of Exponential Type
  3.5 Gamma 2-Manifold G
    3.5.1 Gamma α-Connection
    3.5.2 Gamma α-Curvatures
    3.5.3 Gamma Manifold Geodesics
    3.5.4 Mutually Dual Foliations
    3.5.5 Gamma Affine Immersion
  3.6 Log-Gamma 2-Manifold L
    3.6.1 Log-Gamma Random Walks
  3.7 Gaussian 2-Manifold
    3.7.1 Gaussian Natural Coordinates
    3.7.2 Gaussian Information Metric
    3.7.3 Gaussian Mutually Dual Foliations
    3.7.4 Gaussian Affine Immersions
  3.8 Gaussian α-Geometry
    3.8.1 Gaussian α-Connection
    3.8.2 Gaussian α-Curvatures
  3.9 Gaussian Mutually Dual Foliations
  3.10 Gaussian Submanifolds
    3.10.1 Central Mean Submanifold
    3.10.2 Unit Variance Submanifold
    3.10.3 Unit Coefficient of Variation Submanifold
  3.11 Gaussian Affine Immersions
  3.12 Log-Gaussian Manifold

4 Information Geometry of Bivariate Families
  4.1 McKay Bivariate Gamma 3-Manifold M
  4.2 McKay Manifold Geometry in Natural Coordinates
  4.3 McKay Densities Have Exponential Type
    4.3.1 McKay Information Metric
  4.4 McKay α-Geometry
    4.4.1 McKay α-Connection
    4.4.2 McKay α-Curvatures
  4.5 McKay Mutually Dual Foliations
  4.6 McKay Submanifolds
    4.6.1 Submanifold M1
    4.6.2 Submanifold M2
    4.6.3 Submanifold M3
  4.7 McKay Bivariate Log-Gamma Manifold M̃
  4.8 Generalized McKay 5-Manifold
    4.8.1 Bivariate 3-Parameter Gamma Densities
    4.8.2 Generalized McKay Information Metric
  4.9 Freund Bivariate Exponential 4-Manifold F
    4.9.1 Freund Fisher Metric
  4.10 Freund Natural Coordinates
  4.11 Freund α-Geometry
    4.11.1 Freund α-Connection
    4.11.2 Freund α-Curvatures
  4.12 Freund Foliations
  4.13 Freund Submanifolds
    4.13.1 Independence Submanifold F1
    4.13.2 Submanifold F2
    4.13.3 Submanifold F3
    4.13.4 Submanifold F4
  4.14 Freund Affine Immersion
  4.15 Freund Bivariate Log-Exponential Manifold
  4.16 Bivariate Gaussian 5-Manifold N
  4.17 Bivariate Gaussian Fisher Information Metric
  4.18 Bivariate Gaussian Natural Coordinates
  4.19 Bivariate Gaussian α-Geometry
    4.19.1 α-Connection
    4.19.2 α-Curvatures
  4.20 Bivariate Gaussian Foliations
  4.21 Bivariate Gaussian Submanifolds
    4.21.1 Independence Submanifold N1
    4.21.2 Identical Marginals Submanifold N2
    4.21.3 Central Mean Submanifold N3
    4.21.4 Affine Immersion
  4.22 Bivariate Log-Gaussian Manifold

5 Neighbourhoods of Poisson Randomness, Independence, and Uniformity
  5.1 Gamma Manifold G and Neighbourhoods of Randomness
  5.2 Log-Gamma Manifold L and Neighbourhoods of Uniformity
  5.3 Freund Manifold F and Neighbourhoods of Independence
    5.3.1 Freund Submanifold F2
  5.4 Neighbourhoods of Independence for Gaussians

6 Cosmological Voids and Galactic Clustering
  6.1 Spatial Stochastic Processes
  6.2 Galactic Cluster Spatial Processes
  6.3 Cosmological Voids
  6.4 Modelling Statistics of Cosmological Void Sizes
  6.5 Coupling Galaxy Clustering and Void Sizes
  6.6 Representation of Cosmic Evolution

7 Amino Acid Clustering (with A.J. Doig)
  7.1 Spacings of Amino Acids
  7.2 Poisson Spaced Sequences
  7.3 Non-Poisson Sequences as Gamma Processes
    7.3.1 Local Geodesic Distance Approximations
  7.4 Results
  7.5 Why Would Amino Acids Cluster?

8 Cryptographic Attacks and Signal Clustering
  8.1 Cryptographic Attacks
  8.2 Information Geometry of the Log-gamma Manifold
  8.3 Distinguishing Nearby Unimodular Distributions
  8.4 Difference From a Uniform Distribution
  8.5 Gamma Distribution Neighbourhoods of Randomness

9 Stochastic Fibre Networks (with W.W. Sampson)
  9.1 Random Fibre Networks
  9.2 Random Networks of Rectangular Fibres
  9.3 Log-Gamma Information Geometry for Fibre Clustering
  9.4 Bivariate Gamma Distributions for Anisotropy
  9.5 Independent Polygon Sides
    9.5.1 Multiplanar Networks
  9.6 Correlated Polygon Sides
    9.6.1 McKay Bivariate Gamma Distribution
    9.6.2 McKay Information Geometry
    9.6.3 McKay Information Entropy
    9.6.4 Simulation Results

10 Stochastic Porous Media and Hydrology (with J. Scharcanski and S. Felipussi)
  10.1 Hydrological Modelling
  10.2 Univariate Gamma Distributions and Randomness
  10.3 McKay Bivariate Gamma 3-Manifold
  10.4 Distance Approximations in the McKay Manifold
  10.5 Modelling Stochastic Porous Media
    10.5.1 Adaptive Tomographic Image Segmentation
    10.5.2 Mathematical Morphology Concepts
    10.5.3 Adaptive Image Segmentation and Representation
    10.5.4 Soil Tomographic Data

11 Quantum Chaology
  11.1 Introduction
  11.2 Eigenvalues of Random Matrices
  11.3 Deviations

References

Index


1 Mathematical Statistics and Information Theory

There are many easily found good books on probability theory and mathematical statistics (e.g. [84, 85, 87, 117, 120, 122, 196]), stochastic processes (e.g. [31, 161]) and information theory (e.g. [175, 176]); here we just outline some topics to help make the sequel more self-contained. For those who have access to the computer algebra package Mathematica [215], the approach to mathematical statistics and accompanying software in Rose and Smith [177] will be particularly helpful.

The word stochastic comes from the Greek stochastikos, meaning skillful in aiming, and stochazesthai, to aim at or guess at; stochos means target or aim. In our context, stochastic colloquially means involving chance variations around some event—rather like the variation in positions of strikes aimed at a target. In its turn, the later word statistics comes through eighteenth-century German from the Latin root status, meaning state; originally it meant the study of political facts and figures. The noun random was used in the sixteenth century to mean a haphazard course, from the Germanic randir, to run, and as an adjective to mean without a definite aim, rule or method, the opposite of purposive. From the middle of the last century, the concept of a random variable has been used to describe a variable that is a function of the result of a well-defined statistical experiment in which each possible outcome has a definite probability of occurrence. The organization of probabilities of outcomes is achieved by means of a probability function for discrete random variables and by means of a probability density function for continuous random variables. The result of throwing two fair dice and summing what they show is a discrete random variable.

Mainly, we are concerned with continuous random variables (here measurable functions defined on some $\mathbb{R}^n$) with smoothly differentiable probability density measure functions, but we do need also to mention the Poisson distribution for the discrete case. However, since the Poisson is a limiting approximation to the Binomial distribution, which arises from the Bernoulli distribution (which everyone encountered in school!), we mention also those examples.



1.1 Probability Functions for Discrete Variables

For discrete random variables we take the domain set to be $\mathbb{N}\cup\{0\}$. We may view a probability function as a subadditive measure function of unit weight on $\mathbb{N}\cup\{0\}$:

$$p : \mathbb{N}\cup\{0\} \to [0,1) \quad \text{(nonnegativity)} \tag{1.1}$$

$$\sum_{k=0}^{\infty} p(k) = 1 \quad \text{(unit weight)} \tag{1.2}$$

$$p(A\cup B) \le p(A) + p(B), \quad \forall A, B \subset \mathbb{N}\cup\{0\}, \quad \text{(subadditivity)} \tag{1.3}$$

with equality if and only if $A\cap B = \emptyset$.

Formally, we have a discrete measure space of total measure 1, with σ-algebra the power set and measure function induced by p:

$$\mathrm{sub}(\mathbb{N}\cup\{0\}) \to [0,1) : A \mapsto \sum_{k\in A} p(k)$$

and, as we have anticipated above, we usually abbreviate $\sum_{k\in A} p(k) = p(A)$. We have the following expected values of the random variable and its square:

$$E(k) = \bar{k} = \sum_{k=0}^{\infty} k\, p(k) \tag{1.4}$$

$$E(k^2) = \overline{k^2} = \sum_{k=0}^{\infty} k^2\, p(k). \tag{1.5}$$

Formally, statisticians are careful to distinguish between a property of the whole population—such as these expected values—and the observed values of samples from the population. In practical applications it is quite common to use the bar notation for expectations, and we shall be clear when we are handling sample quantities. With slight but common abuse of notation, we call $\bar{k}$ the mean, $\overline{k^2} - \bar{k}^2$ the variance, $\sigma_k = +\sqrt{\overline{k^2} - \bar{k}^2}$ the standard deviation and $\sigma_k/\bar{k}$ the coefficient of variation, respectively, of the random variable k. The variance is the square of the standard deviation.

The moment generating function $\Psi(t) = E(e^{tX})$, $t \in \mathbb{R}$, of a distribution generates the rth moment as the value of the rth derivative of Ψ evaluated at t = 0. Hence, in particular, the mean and variance are given by:

$$E(X) = \Psi'(0) \tag{1.6}$$

$$\mathrm{Var}(X) = \Psi''(0) - (\Psi'(0))^2, \tag{1.7}$$

which can provide an easier method for their computation in some cases.


1.1.1 Bernoulli Distribution

It is said that a random variable X has a Bernoulli distribution with parameter p (0 ≤ p ≤ 1) if X can take only the values 0 and 1, with probabilities

$$\Pr(X = 1) = p \tag{1.8}$$

$$\Pr(X = 0) = 1 - p. \tag{1.9}$$

Then the probability function of X can be written as follows:

$$f(x\mid p) = \begin{cases} p^x(1-p)^{1-x} & \text{if } x = 0, 1 \\ 0 & \text{otherwise.} \end{cases} \tag{1.10}$$

If X has a Bernoulli distribution with parameter p, then we can find its expectation or mean value E(X) and variance Var(X) as follows:

$$E(X) = 1\cdot p + 0\cdot(1-p) = p \tag{1.11}$$

$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = p - p^2 \tag{1.12}$$

The moment generating function of X is the expectation of $e^{tX}$: writing q = 1 − p,

$$\Psi(t) = E(e^{tX}) = p\,e^t + q \tag{1.13}$$

which is finite for all real t.
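The book's computations are supported by Mathematica notebooks; as an independent check of (1.6)-(1.7), here is a minimal Python sketch of ours (assuming sympy is available) that recovers the Bernoulli mean and variance by differentiating (1.13):

    import sympy as sp

    t, p = sp.symbols('t p')
    Psi = p * sp.exp(t) + (1 - p)        # Bernoulli MGF (1.13), with q = 1 - p

    mean = sp.diff(Psi, t).subs(t, 0)    # Psi'(0), as in (1.6)
    var = sp.diff(Psi, t, 2).subs(t, 0) - mean**2   # (1.7)

    print(sp.simplify(mean))             # p
    print(sp.expand(var))                # p - p**2, as in (1.12)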

1.1.2 Binomial Distribution

If n random variables $X_1, X_2, \ldots, X_n$ are independent and identically distributed, and each has a Bernoulli distribution with parameter p, then it is said that the variables $X_1, X_2, \ldots, X_n$ form n Bernoulli trials with parameter p.

If the random variables $X_1, X_2, \ldots, X_n$ form n Bernoulli trials with parameter p and if $X = X_1 + X_2 + \cdots + X_n$, then X has a binomial distribution with parameters n and p.

The binomial distribution is of fundamental importance in probability and statistics because of the following result, which applies to any experiment that can have only two outcomes, success or failure. Suppose the experiment is performed n times independently and the probability of success on any given performance is p. If X denotes the total number of successes in the n performances, then X has a binomial distribution with parameters n and p. The probability function of X is:

$$P(X = r) = P\left(\sum_{i=1}^{n} X_i = r\right) = \binom{n}{r} p^r (1-p)^{n-r} \tag{1.14}$$

where $r = 0, 1, 2, \ldots, n$.


We write

$$f(r\mid p) = \begin{cases} \binom{n}{r} p^r (1-p)^{n-r} & \text{if } r = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise.} \end{cases} \tag{1.15}$$

In this distribution n must be a positive integer and p must lie in the interval $0 \le p \le 1$. If X is represented by the sum of n Bernoulli trials, then it is easy to get its expectation, variance and moment generating function by using the properties of sums of independent random variables—cf. §1.3.

$$E(X) = \sum_{i=1}^{n} E(X_i) = np \tag{1.16}$$

$$\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = np(1-p) \tag{1.17}$$

$$\Psi(t) = E(e^{tX}) = \prod_{i=1}^{n} E(e^{tX_i}) = (p\,e^t + q)^n. \tag{1.18}$$

1.1.3 Poisson Distribution

The Poisson distribution is widely discussed in the statistical literature; one monograph devoted to it and its applications is Haight [102].

Take $t, \tau \in (0,\infty)$ and

$$p : \mathbb{N}\cup\{0\} \to [0,1) : k \mapsto \left(\frac{t}{\tau}\right)^k \frac{1}{k!}\, e^{-t/\tau} \tag{1.19}$$

$$\bar{k} = t/\tau \tag{1.20}$$

$$\sigma_k = \sqrt{t/\tau}. \tag{1.21}$$

This probability function is used to model the number k of events in a region of measure t when the mean measure per event is τ (so the mean number of events per unit region is 1/τ) and the probability of an event occurring in a region depends only on the measure of the region, not its shape or location. Colloquially, in applications it is very common to encounter the usage of 'random' to mean the specific case of a Poisson process; formally in statistics the term random has a more general meaning: probabilistic, that is, dependent on random variables. Figure 1.1 depicts a simulation of a 'random' array of 2000 line segments in a plane; the centres of the lines follow a Poisson process and the orientations of the lines follow a uniform distribution, cf. §1.2.1. So, in an intuitive sense, this is the result of the least choice, or maximum uncertainty, in the disposition of these line segments: the centre of each line segment is equally likely to fall in every region of given area and its angle of axis orientation is equally likely to fall in every interval of angles of fixed size.


Fig. 1.1. Simulation of a random array of 2000 line segments in a plane; the centres of the lines follow a Poisson process and the orientations of the lines follow a uniform distribution. The grey tones correspond to order of deposition.

This kind of situation is representative of common usage of the term 'random process' to mean subordinate to a Poisson process. A 'non-random' process departs from Poisson by having constraints on the probabilities of placing of events or objects, typically as a result of external influence or of interactions among events or objects.

Importantly, the Poisson distribution can give a good approximation to the binomial distribution when n is large and p is close to 0. This is easy to see by making the correspondences:

$$e^{-pn} \longrightarrow (1 - (n-r)\,p) \tag{1.22}$$

$$n!/(n-r)! \longrightarrow n^r. \tag{1.23}$$

Much of this monograph is concerned with the representation and classification of deviations from processes subordinate to a Poisson random variable, for example for a line process via the distribution of inter-event (nearest neighbour, or inter-incident) spacings. Such processes arise in statistics under the term renewal process [150].

We shall see in Chapter 9 that, for physical realisations of stochastic fibre networks, typical deviations from Poisson behaviour arise when the centres of the fibres tend to cluster, Figure 9.1, or when the orientations of their axes have preferential directions, Figure 9.15. Radiographs of real stochastic fibre networks are shown in Figure 9.3 from Oba [156]; the top network consists of fibres deposited approximately according to a Poisson planar process, whereas in the lower networks the fibres have tended to cluster to differing extents.

1.2 Probability Density Functions for Continuous Variables

We are usually concerned with the case of continuous random variables defined on some $\Omega \subseteq \mathbb{R}^m$. For our present purposes we may view a probability density function (pdf) on $\Omega \subseteq \mathbb{R}^m$ as a subadditive measure function of unit weight, namely, a nonnegative map on Ω:

$$f : \Omega \to [0,\infty) \quad \text{(nonnegativity)} \tag{1.24}$$

$$\int_\Omega f = f(\Omega) = 1 \quad \text{(unit weight)} \tag{1.25}$$

$$f(A\cup B) \le f(A) + f(B), \quad \forall A, B \subset \Omega, \quad \text{(subadditivity)} \tag{1.26}$$

with equality if and only if $A\cap B = \emptyset$.

Formally, we have a measure space of total measure 1, with σ-algebra typically the Borel sets or the power set, and the measure function induced by f:

$$\mathrm{sub}(\Omega) \to [0,1] : A \mapsto \int_A f = \text{integral of } f \text{ over } A$$

and, as we have anticipated above, we usually abbreviate $\int_A f = f(A)$. Given an integrable (i.e. measurable in the σ-algebra) function $u : \Omega \to \mathbb{R}$, the expectation or mean value of u is defined to be

$$E(u) = \bar{u} = \int_\Omega u f.$$

We say that f is the joint pdf for the random variables $x_1, x_2, \ldots, x_m$, being the coordinates of points in Ω, or that these random variables have the joint probability distribution f. If x is one of these random variables, and in particular for the important case of a single random variable x, we have the following:

$$\bar{x} = \int_\Omega x f \tag{1.27}$$

$$\overline{x^2} = \int_\Omega x^2 f. \tag{1.28}$$


Again with slight abuse of notation, we call $\bar{x}$ the mean, and the variance is the mean square deviation

$$\sigma_x^2 = \overline{(x - \bar{x})^2} = \overline{x^2} - \bar{x}^2.$$

Its square root is the standard deviation $\sigma_x = +\sqrt{\overline{x^2} - \bar{x}^2}$, and the ratio $\sigma_x/\bar{x}$ is the coefficient of variation of the random variable x. Some inequalities for the probability of a random variable exceeding a given value are worth mentioning.

Markov's Inequality: If x is a nonnegative random variable with probability density function f, then for all a > 0 the probability that x > a satisfies

$$\int_a^\infty f \le \frac{\bar{x}}{a}. \tag{1.29}$$

Chebyshev's Inequality: If x is a random variable having probability density function f with zero mean and finite variance σ², then for all a > 0 the probability that x > a satisfies

$$\int_a^\infty f \le \frac{\sigma^2}{\sigma^2 + a^2}. \tag{1.30}$$

Bienaymé-Chebyshev's Inequality: If x is a random variable having probability density function f, and u is a nonnegative non-decreasing function on (0,∞), then for all a > 0 the probability that |x| > a satisfies

$$1 - \int_{-a}^{a} f \le \frac{\bar{u}}{u(a)}. \tag{1.31}$$
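These bounds are easy to probe numerically. A small Python check of ours (assuming scipy), using (1.29) with a unit-mean exponential variable and (1.30) with a zero-mean, unit-variance Gaussian:

    from scipy.stats import expon, norm

    a = 3.0

    # Markov (1.29): nonnegative x with mean 1, so P(x > a) <= 1/a
    print(expon.sf(a), "<=", 1.0 / a)            # 0.0498 <= 0.3333

    # Chebyshev (1.30): zero mean, sigma = 1, so P(x > a) <= 1/(1 + a^2)
    print(norm.sf(a), "<=", 1.0 / (1.0 + a**2))  # 0.0013 <= 0.1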

The cumulative distribution function (cdf) of a nonnegative random variable x with probability density function f is the function defined by

$$F : [0,\infty) \to [0,1] : x \mapsto \int_0^x f(t)\, dt. \tag{1.32}$$

It is easily seen that if we wish to change from a random variable x with density function f to a new random variable ξ, where x is given as an invertible function of ξ, then the probability density function for ξ is represented by

$$g(\xi) = f(x(\xi)) \left|\frac{dx}{d\xi}\right|. \tag{1.33}$$

If independent real random variables x and y have probability density functions f, g respectively, then the probability density function h of their sum z = x + y is given by

$$h(z) = \int_{-\infty}^{\infty} f(x)\, g(z - x)\, dx \tag{1.34}$$


and the probability density function p of their product r = xy is given by

$$p(r) = \int_{-\infty}^{\infty} f(x)\, g\!\left(\frac{r}{x}\right) \frac{1}{|x|}\, dx. \tag{1.35}$$
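As an illustration of (1.34)—a sketch of ours, assuming scipy—the convolution of two unit exponential densities can be evaluated numerically and matches the known 2-fold convolution in closed form, which is a gamma density, anticipating §1.4.1:

    from scipy.integrate import quad
    from scipy.stats import expon, gamma

    z = 2.5
    # (1.34) with f = g the unit exponential; the integrand vanishes outside [0, z]
    h, _ = quad(lambda x: expon.pdf(x) * expon.pdf(z - x), 0.0, z)

    print(h)                   # ~0.2052, i.e. z * exp(-z)
    print(gamma.pdf(z, a=2))   # the same: shape-2 gamma density at z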

Usually, a probability density function depends on a set of parameters $\theta_1, \theta_2, \ldots, \theta_n$, and we say that we have an n-dimensional family. Then the corresponding change of variables formula involves the n × n Jacobian determinant for the multiple integrals, so generalizing (1.33).

1.2.1 Uniform Distribution

This is the simplest continuous distribution, with constant probability density function for a bounded random variable:

$$u : [a,b] \to [0,\infty) : x \mapsto \frac{1}{b-a} \tag{1.36}$$

$$\bar{x} = \frac{a+b}{2} \tag{1.37}$$

$$\sigma_x = \frac{b-a}{2\sqrt{3}}. \tag{1.38}$$

The probability of an event occurring in an interval $[\alpha,\beta] \subseteq [a,b]$ is simply proportional to the length of the interval:

$$P(x \in [\alpha,\beta]) = \frac{\beta - \alpha}{b - a}.$$

1.2.2 Exponential Distribution

Take $\lambda \in \mathbb{R}^+$; this is called the parameter of the exponential probability density function

$$f : [0,\infty) \to [0,\infty) : x \mapsto \frac{1}{\lambda}\, e^{-x/\lambda} \tag{1.39}$$

$$\bar{x} = \lambda \tag{1.40}$$

$$\sigma_x = \lambda. \tag{1.41}$$

The parameter space of the exponential distribution is $\mathbb{R}^+$, so exponential distributions form a 1-parameter family. In the sequel we shall see that quite generally we may provide a Riemannian structure to the parameter space of a family of distributions. Sometimes we call a family of pdfs a parametric statistical model.

Observe that, in the Poisson probability function (1.19) for events on the real line, the probability of zero events in an interval t is

$$p(0) = e^{-t/\tau}$$


and it is not difficult to show that the probability density function for the Poisson inter-event (or inter-incident) distance t on [0,∞) is an exponential probability density function (1.39), given by

$$f : [0,\infty) \to [0,\infty) : t \mapsto \frac{1}{\tau}\, e^{-t/\tau}$$

where τ is the mean inter-event interval, so that 1/τ is the mean number of events per unit interval. Thus, the occurrence of an exponential distribution has associated with it a complementary Poisson distribution, so the exponential distribution provides for continuous variables an identifier for Poisson processes. Correspondingly, departures from an exponential distribution correspond to departures from a Poisson process. We shall see below in §1.4.1 that in rather a strict sense the gamma distribution generalises the exponential distribution.
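This identification is easy to see in simulation; a minimal sketch of ours (assuming numpy) builds a Poisson process on a long interval—by drawing a Poisson count and then placing that many points uniformly—and checks that the spacings carry the exponential signature of mean equal to standard deviation:

    import numpy as np

    rng = np.random.default_rng(0)
    T, rate = 10_000.0, 1.0                  # interval length; events per unit length
    n = rng.poisson(rate * T)                # Poisson count of events in [0, T]
    points = np.sort(rng.uniform(0.0, T, n)) # given the count, positions are uniform

    spacings = np.diff(points)
    print(spacings.mean(), spacings.std())   # both ~ 1/rate: exponential signature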

1.2.3 Gaussian, or Normal Distribution

This has real random variable x with mean µ and variance σ², and the familiar bell-shaped probability density function given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \tag{1.42}$$

The Gaussian distribution has the following uniqueness property: for independent random variables $x_1, x_2, \ldots, x_n$ with a common continuous probability density function f, independence of the sample mean $\bar{x}$ and sample standard deviation S is equivalent to f being a Gaussian density [110].

The Central Limit Theorem states that for independent and identically distributed real random variables $x_i$, each having mean µ and variance σ², the random variable

$$w = \frac{(x_1 + x_2 + \cdots + x_n) - n\mu}{\sqrt{n}\,\sigma} \tag{1.43}$$

tends as $n \to \infty$ to a Gaussian random variable with mean zero and unit variance.
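A quick simulation of (1.43)—ours, assuming numpy—with exponential summands, for which µ = σ = 1:

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 1_000, 50_000
    x = rng.exponential(1.0, size=(trials, n))           # iid, mu = 1, sigma = 1

    w = (x.sum(axis=1) - n * 1.0) / (np.sqrt(n) * 1.0)   # (1.43)
    print(w.mean(), w.std())                             # ~0 and ~1, near-Gaussian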

1.3 Joint Probability Density Functions

Let f be a probability density function defined on $\mathbb{R}^2$ (or some subset thereof). This is an important case since here we have two variables, X, Y say, and we can extract certain features of how they interact. In particular, we define their respective mean values and their covariance $\sigma_{xy}$:

$$\bar{x} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, f(x,y)\, dx\, dy \tag{1.44}$$

$$\bar{y} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f(x,y)\, dx\, dy \tag{1.45}$$

$$\sigma_{xy} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x,y)\, dx\, dy - \bar{x}\,\bar{y} = \overline{xy} - \bar{x}\,\bar{y}. \tag{1.46}$$


The marginal probability density function of X is $f_X$, obtained by integrating f over all y,

$$f_X(x) = \int_{v=-\infty}^{\infty} f_{X,Y}(x,v)\, dv \tag{1.47}$$

and similarly the marginal probability density function of Y is

$$f_Y(y) = \int_{u=-\infty}^{\infty} f_{X,Y}(u,y)\, du. \tag{1.48}$$

The jointly distributed random variables X and Y are called independent if their marginal density functions satisfy

$$f_{X,Y}(x,y) = f_X(x)\, f_Y(y) \quad \text{for all } x, y \in \mathbb{R}. \tag{1.49}$$

It is easily shown that if the variables are independent then their covariance (1.46) is zero, but the converse is not true. Feller [84] gives a simple counterexample: let X take values −1, +1, −2, +2, each with probability 1/4, and let Y = X²; then the covariance is zero but there is evidently a (nonlinear) dependence.
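Feller's counterexample is small enough to check exactly; a sketch of ours in Python:

    xs = [-1.0, 1.0, -2.0, 2.0]          # each value has probability 1/4; Y = X^2
    EX = sum(x / 4 for x in xs)          # 0
    EY = sum(x**2 / 4 for x in xs)       # 2.5
    EXY = sum(x * x**2 / 4 for x in xs)  # E(XY) = E(X^3) = 0

    print(EXY - EX * EY)                 # covariance 0.0, yet Y is a function of X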

The extent of dependence between two random variables can be measured in a normalised way by means of the correlation coefficient, the ratio of the covariance to the product of the marginal standard deviations:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\,\sigma_y}. \tag{1.50}$$

Note that, by the Cauchy-Schwarz inequality, $-1 \le \rho_{xy} \le 1$ whenever it exists, the limiting values corresponding to the case of linear dependence between the variables. Intuitively, $\rho_{xy} < 0$ if y tends to increase as x decreases, and $\rho_{xy} > 0$ if x and y tend to increase together.

A change of random variables from (x, y) with density function f to, say, (u, v) with density function g, where x, y are given as invertible functions of u, v, involves the Jacobian determinant:

$$g(u,v) = f(x(u,v), y(u,v)) \left|\frac{\partial(x,y)}{\partial(u,v)}\right|. \tag{1.51}$$

1.3.1 Bivariate Gaussian Distributions

The probability density function of the two-dimensional Gaussian distribution has the form:

$$f(x,y) = \frac{1}{2\pi\sqrt{\sigma_1\sigma_2 - \sigma_{12}^2}}\, e^{W} \tag{1.52}$$

with

$$W = -\frac{1}{2(\sigma_1\sigma_2 - \sigma_{12}^2)}\left(\sigma_2\,(x-\mu_1)^2 - 2\,\sigma_{12}\,(x-\mu_1)(y-\mu_2) + \sigma_1\,(y-\mu_2)^2\right),$$

where

$$-\infty < x, y < \infty, \quad -\infty < \mu_1, \mu_2 < \infty, \quad 0 < \sigma_1, \sigma_2 < \infty.$$

This contains the five parameters $(\mu_1, \mu_2, \sigma_1, \sigma_{12}, \sigma_2) = (\xi_1, \xi_2, \xi_3, \xi_4, \xi_5) \in \Theta$. So we have a five-dimensional parameter space Θ.

1.4 Information Theory

Information theory owes its origin in the 1940s to Shannon [186], whose interest was in modelling the transfer of information stored in the form of binary on-off devices, the basic unit of information being one bit: 0 or 1. The theory provided a representation for the corruption by random electronic noise of transferred information streams, and for quantifying the effectiveness of error-correcting algorithms by the incorporation of redundancy in the transfer process. His concept of information theoretic entropy in communication theory owed its origins to thermodynamics, but its effectiveness in general information systems has been far reaching. Information theory as worked out by the communication theorists, and entropy in particular, was important in providing a conceptual and mathematical framework for the development of chaos theory [93]. There the need was to model the dynamics of adding small extrinsic noise to otherwise deterministic systems. In physical theory, entropy provides the uni-directional 'arrow of time' by measuring the disorder in an irreversible system [164]. Intuitively, we can see how the entropy of a state modelled by a point in a space of probability density functions would be expected to be maximized at a density function that represented as nearly as possible total disorder, colloquially, randomness.

Shannon [186] considered an information source that generates symbols from a finite set $\{x_i \mid i = 1, 2, \cdots, n\}$ and transmits them as a stationary stochastic process. He defined the 'entropy' function for the process in terms of the probabilities $\{p_i \mid i = 1, 2, \cdots, n\}$ for generation of the different symbols:

$$S = -\sum_{i=1}^{n} p_i \log(p_i). \tag{1.53}$$

This entropy (1.53) is essentially the same as that of Gibbs and Boltzmann in statistical mechanics, but here it is viewed as a measure of the 'uncertainty' in the process; for example, S is greater than or equal to the entropy conditioned by the knowledge of a second random variable. If the above symbols are generated mutually independently, then S is a measure of the amount of information available in the source for transmission. If the symbols in a sequence are not mutually independently generated, Shannon introduced the information 'capacity' of the transmission process as $C = \lim_{T\to\infty} \log N(T)/T$, where N(T) is the maximum number of sequences of symbols that can be transmitted in time T. It follows that, for given entropy S and capacity C, the symbols can be encoded in such a way that $(C/S) - \varepsilon$ symbols per second can be transmitted over the channel if ε > 0 but not if ε < 0. So again, we have a maximum principle from entropy.

Given a set of observed values $\langle g_\alpha(x)\rangle$ for functions $g_\alpha$ of the random variable x, we seek a 'least prejudiced' set of probability values for x on the assumption that it can take only a finite number of values $x_i$, with probabilities $p_1, p_2, \cdots, p_n$ such that

$$\langle g_\alpha(x)\rangle = \sum_{i=1}^{n} p_i\, g_\alpha(x_i) \quad \text{for } \alpha = 1, 2, \ldots, N \tag{1.54}$$

$$1 = \sum_{i=1}^{n} p_i. \tag{1.55}$$

Jaynes [107], a strong proponent of Shannon's approach, showed that this occurs if we choose those $p_i$ that maximize Shannon's entropy function (1.53). In the case of a continuous random variable $x \in \mathbb{R}$ with probability density p parametrized by a finite set of parameters, the entropy becomes an integral and the maximizing principle is applied over the space of parameters, as we shall see below.

It turns out [201] that if we have no data on observed functions of x (so the set of equations (1.54) is empty), then the maximum entropy choice gives the exponential distribution. If we have estimates of the first two moments of the distribution of x, then we obtain the (truncated) Gaussian. If we have estimates of the mean and mean logarithm of x, then the maximum entropy choice is the gamma distribution.

Jaynes [107] provided the foundation for information theoretic methods in, among other things, Bayes hypothesis testing—cf. Tribus et al. [200, 201]. For more theory, see also Slepian [190] and Roman [175, 176]. It is fair to point out that, in the view of some statisticians, the applicability of the maximum entropy approach has been overstated; we mention for example the reservations of Ripley [173] in the case of statistical inference for spatial Gaussian processes.

In the sequel we shall consider the particular case of the gamma distribution for several reasons:

• the exponential distributions form a subclass of gamma distributions, and exponential distributions represent Poisson inter-event distances
• the sum of n independent identical exponential random variables follows a gamma distribution
• the sum of n independent identical gamma random variables follows a gamma distribution
• lognormal distributions may be well-approximated by gamma distributions
• products of gamma distributions are well-approximated by gamma distributions
• stochastic porous media have been modelled using gamma distributions [72].

Other parametric statistical models based on different distributions may be treated in a similar way. Our particular interest in the gamma distribution and a bivariate gamma distribution stems from the fact that the exponential distribution is a special case, and that corresponds to the standard model for an underlying Poisson process.

Let Θ be the parameter space of a parametric statistical model, that is, an n-dimensional smooth family of probability density functions defined on some fixed event space Ω of unit measure,

$$\int_\Omega p_\theta = 1 \quad \text{for all } \theta \in \Theta.$$

For each sequence $X = \{X_1, X_2, \ldots, X_n\}$ of independent identically distributed observed values, the likelihood function $\mathrm{lik}_X$ on Θ, which measures the likelihood of the sequence arising from different $p_\theta \in \Theta$, is defined by

$$\mathrm{lik}_X : \Theta \to [0,1] : \theta \mapsto \prod_{i=1}^{n} p_\theta(X_i).$$

Statisticians use the likelihood function, or its logarithm the log-likelihood $l = \log \mathrm{lik}$, in the evaluation of goodness of fit of statistical models. The so-called 'method of maximum likelihood', or 'maximum entropy' in Shannon's terms, is used to obtain optimal fitting of the parameters in a distribution to observed data.

1.4.1 Gamma Distribution

The family of gamma distributions is very widely used in applications, with event space $\Omega = \mathbb{R}^+$. It has probability density functions given by

$$\Theta \equiv \{f(x;\gamma,\kappa) \mid \gamma, \kappa \in \mathbb{R}^+\}$$

so here $\Theta = \mathbb{R}^+ \times \mathbb{R}^+$, and the random variable is $x \in \Omega = \mathbb{R}^+$ with

$$f(x;\gamma,\kappa) = \left(\frac{\kappa}{\gamma}\right)^{\kappa} \frac{x^{\kappa-1}}{\Gamma(\kappa)}\, e^{-x\kappa/\gamma}. \tag{1.56}$$

Then $\bar{x} = \gamma$ and $\mathrm{Var}(x) = \gamma^2/\kappa$, and we see that γ controls the mean of the distribution while κ controls its variance and hence the shape. Indeed, the property that the variance is proportional to the square of the mean, §1.2, actually characterizes gamma distributions, as shown recently by Hwang and Hu [106] (cf. their concluding remark).

Theorem 1.1 (Hwang and Hu [106]). For independent positive random variables with a common probability density function f, having independence of the sample mean and the sample coefficient of variation is equivalent to f being the gamma distribution.

The special case κ = 1 in (1.56) corresponds to the situation of the random or Poisson process along a line with mean inter-event interval γ; then the distribution of inter-event intervals is exponential. In fact, the gamma distribution has an essential generalizing property of the exponential distribution, since it represents inter-event distances for generalizations of the Poisson process to a 'censored' Poisson process. Precisely, for integer κ = 1, 2, …, (1.56) models a process that is Poisson but with intermediate events removed to leave only every κth. Formally, the gamma distribution is the κ-fold convolution of the exponential distribution, called also the Pearson Type III distribution. The Chi-square distribution with n = 2κ degrees of freedom models the distribution of a sum of squares of n independent random variables all having the Gaussian distribution with zero mean and standard deviation σ; this is a gamma distribution with mean γ = nσ² if κ = 1, 2, …. Figure 1.2 shows a family of gamma distributions, all of unit mean, with κ = 1/2, 1, 2, 5.

Fig. 1.2. Probability density functions f(x; 1, κ) for gamma distributions of inter-event intervals x with unit mean γ = 1 and κ = 1/2 (clustered), 1 (Poisson), 2 (smoothed), 5 (more smoothed); abscissa: inter-event interval x. The case κ = 1 corresponds to an exponential distribution from an underlying Poisson process. Some organization—clustering (κ < 1) or smoothing (κ > 1)—is represented by κ ≠ 1.
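The κ-fold convolution property invites a simulation; a sketch of ours (assuming numpy/scipy; note scipy's shape/scale parameters relate to (1.56) by shape = κ, scale = γ/κ):

    import numpy as np
    from scipy.stats import gamma

    rng = np.random.default_rng(0)
    kappa, g = 3, 1.0                        # integer kappa, target mean gamma = 1

    # keep every kappa-th event of a Poisson process: sum kappa exponential spacings
    x = rng.exponential(g / kappa, size=(100_000, kappa)).sum(axis=1)

    print(x.mean(), x.var())                 # ~g and ~g**2/kappa, as below (1.56)
    print(gamma.fit(x, floc=0)[0])           # fitted shape ~kappa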


Shannon's information theoretic entropy or 'uncertainty' is given, up to a factor, by the negative of the expectation of the logarithm of the probability density function (1.56), that is

$$S_f(\gamma,\kappa) = -\int_0^\infty \log(f(x;\gamma,\kappa))\, f(x;\gamma,\kappa)\, dx = \kappa + (1-\kappa)\,\frac{\Gamma'(\kappa)}{\Gamma(\kappa)} + \log\frac{\gamma\,\Gamma(\kappa)}{\kappa}. \tag{1.57}$$
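Equation (1.57) is easy to verify numerically; a sketch of ours (assuming scipy) compares the closed form with direct integration of −f log f at γ = 1, κ = 2:

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import digamma, gammaln
    from scipy.stats import gamma

    g, k = 1.0, 2.0
    pdf = gamma(a=k, scale=g / k).pdf        # (1.56) in scipy's parametrization

    S = k + (1 - k) * digamma(k) + gammaln(k) + np.log(g / k)   # (1.57)
    I, _ = quad(lambda x: -pdf(x) * np.log(pdf(x)), 1e-12, 60)

    print(S, I)                              # both ~0.8841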

Part of the entropy function (1.57) is depicted in Figure 1.3 as a contour plot. At unit mean, the maximum entropy (or maximum uncertainty) occurs at κ = 1, which is the random case, and then $S_f(\gamma, 1) = 1 + \log\gamma$. So, a Poisson process of points on a line is such that the points are as disorderly as possible, and among all homogeneous point processes with a given density, the Poisson process has maximum entropy. Figure 1.4 shows a plot of $S_f(\gamma,\kappa)$ for the case of unit mean γ = 1. Figure 1.5 shows some integral curves of the entropy gradient field in the space of gamma probability density functions.

Fig. 1.3. Contour plot of information theoretic entropy $S_f(\gamma,\kappa)$ for gamma distributions from (1.57), over 0 ≤ γ ≤ 5 and 0 ≤ κ ≤ 2. The cases with κ = 1 correspond to exponential distributions related to underlying Poisson processes.


Fig. 1.4. Information theoretic entropy $S_f(1,\kappa)$ for gamma distributions of inter-event intervals with unit mean γ = 1. The maximum at κ = 1 corresponds to an exponential distribution from an underlying Poisson process. The regime κ < 1 corresponds to clustering of events and κ > 1 corresponds to smoothing out of events, relative to a Poisson process. Note that, at constant mean, the variance of x decays like 1/κ.

Fig. 1.5. A selection of integral curves of the entropy gradient field for gamma probability density functions, with initial points having small values of γ. The cases with κ = 1 correspond to exponential distributions related to underlying Poisson processes.


We can see the role of the log-likelihood function in the case of a set $X = \{X_1, X_2, \ldots, X_n\}$ of measurements, drawn from independent identically distributed random variables, to which we wish to fit the maximum likelihood gamma distribution. The procedure to optimize the choice of γ, κ is as follows. For independent events $X_i$ with identical distribution $f(x;\gamma,\kappa)$, their joint probability density is the product of the marginal densities, so a measure of the 'likelihood' of finding such a set of events is

$$\mathrm{lik}_X(\gamma,\kappa) = \prod_{i=1}^{n} f(X_i;\gamma,\kappa).$$

Fig. 1.6. Probability histogram plot with unit mean for the spacings between the first 100,000 prime numbers and the maximum likelihood gamma fit, κ = 1.09452 (large points).

Fig. 1.7. Probability histogram plot with unit mean for the spacings between the first 100,000 prime numbers and the gamma distribution having the same variance, so κ = 1.50788 (large points).


We seek a choice of γ, κ to maximize this product, and since the log function is monotonic increasing it is simpler to maximize the logarithm

$$l_X(\gamma,\kappa) = \log \mathrm{lik}_X(\gamma,\kappa) = \log\left[\prod_{i=1}^{n} f(X_i;\gamma,\kappa)\right].$$

Substitution gives us

$$l_X(\gamma,\kappa) = \sum_{i=1}^{n}\left[\kappa(\log\kappa - \log\gamma) + (\kappa - 1)\log X_i - \frac{\kappa}{\gamma}\, X_i - \log\Gamma(\kappa)\right] = n\kappa(\log\kappa - \log\gamma) + (\kappa - 1)\sum_{i=1}^{n}\log X_i - \frac{\kappa}{\gamma}\sum_{i=1}^{n} X_i - n\log\Gamma(\kappa).$$

Then, solving for $\partial_\gamma l_X(\gamma,\kappa) = \partial_\kappa l_X(\gamma,\kappa) = 0$ in terms of properties of the $X_i$, we obtain the maximum likelihood estimates $\hat{\gamma}, \hat{\kappa}$ of γ, κ in terms of the mean and mean logarithm of the $X_i$:

$$\hat{\gamma} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

$$\log\hat{\kappa} - \frac{\Gamma'(\hat{\kappa})}{\Gamma(\hat{\kappa})} = \log\bar{X} - \overline{\log X}$$

where $\overline{\log X} = \frac{1}{n}\sum_{i=1}^{n}\log X_i$.
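These two equations determine the estimates numerically: $\hat{\gamma}$ is just the sample mean, and $\hat{\kappa}$ solves a one-dimensional equation in the digamma function $\Gamma'/\Gamma$. A sketch of ours in Python (assuming scipy; the book's own computations use Mathematica):

    import numpy as np
    from scipy.special import digamma
    from scipy.optimize import brentq

    def fit_gamma_ml(X):
        """Maximum likelihood (gamma, kappa) for the density (1.56)."""
        X = np.asarray(X, dtype=float)
        g_hat = X.mean()                            # gamma-hat = X-bar
        c = np.log(X.mean()) - np.log(X).mean()     # log X-bar - mean of log X
        k_hat = brentq(lambda k: np.log(k) - digamma(k) - c, 1e-6, 1e6)
        return g_hat, k_hat

    rng = np.random.default_rng(0)
    sample = rng.gamma(shape=2.0, scale=0.5, size=50_000)  # gamma = 1, kappa = 2
    print(fit_gamma_ml(sample))                            # ~(1.0, 2.0)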

For example, the frequency distribution of spacings between the first 100,000 prime numbers has mean approximately 13.0 and variance 112, and 99% of the probability is achieved by spacings up to 4 times the mean. Figure 1.6 shows the maximum likelihood fit gamma distribution with κ = 1.09452, as points, on the probability histogram of the prime spacings normalized to unit mean; the range of the abscissa is 4 times the mean. Figure 1.7 shows as points the gamma distribution with κ = 1.50788, which has the same variance as the prime spacings normalized to unit mean. Of course, neither fit is very good, nor is the geometric distribution approximation that might be expected, cf. Schroeder [184] §4.12, in light of the Prime Number Theorem, which says that the average spacing between adjacent primes near n is approximately log n.


2 Introduction to Riemannian Geometry

This chapter is intended to help those with little previous exposure to differential geometry by providing a rather informal summary of background for our purposes in the sequel, and pointers for those who wish to pursue more geometrical features of the spaces of probability density functions that are our focus. In fact, readers who are comfortable with doing calculations of curves and their arc length on surfaces in $\mathbb{R}^3$ could omit this chapter at a first reading.

A topological space is the least structure that can support arguments concerning continuity and limits; our first experience of such analytic properties is usually with the spaces $\mathbb{R}$ and $\mathbb{R}^n$. A manifold is the least structure that can support arguments concerning differentiability and tangents—that is, calculus. Our prototype manifold is the set of points we call Euclidean n-space $E^n$, which is based on the real number n-space $\mathbb{R}^n$ and carries the Pythagorean distance structure. Our common experience is that a 2-dimensional Euclidean space can be embedded in $E^3$ (or $\mathbb{R}^3$), as can curves and surfaces. Riemannian geometry generalizes the Euclidean geometry of surfaces to higher dimensions by handling the intrinsic properties like distances, angles and curvature independently of any environing simpler space.

We need rather little geometry of Riemannian manifolds in order to provide background for the concepts of information geometry. Dodson and Poston [70] give an introductory treatment with many examples, Spivak [194, 195] provides a five-volume treatise on Riemannian geometry, while Gray [99] gave very detailed descriptions and computer algebraic procedures using Mathematica [215] for calculating and graphically representing most named curves and surfaces in Euclidean $E^3$, and code for numerical solution of geodesic equations. Our Riemannian spaces actually will appear as subspaces of $\mathbb{R}^n$, so global properties will not be of particular significance, and then the formulae and Gray's procedures easily generalize to more variables.


2.0.2 Manifolds

A smooth n-manifold M is a (Hausdorff) topological space together with a collection of smooth maps (the charts)

$$\{\varphi_\alpha : U_\alpha \longrightarrow \mathbb{R}^n \mid \alpha \in A\}$$

from open subsets $U_\alpha$ of M, which satisfy:

i) $\{U_\alpha \mid \alpha \in A\}$ is an open cover of M;
ii) each $\varphi_\alpha$ is a homeomorphism onto its image;
iii) whenever $U_\alpha \cap U_\beta \neq \emptyset$, then the maps between subsets of $\mathbb{R}^n$

$$\varphi_\alpha \circ \varphi_\beta^{-1} : \varphi_\beta(U_\alpha \cap U_\beta) \longrightarrow \varphi_\alpha(U_\alpha \cap U_\beta),$$

$$\varphi_\beta \circ \varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \longrightarrow \varphi_\beta(U_\alpha \cap U_\beta),$$

have continuous derivatives of all orders (are $C^\infty$ or smooth).

We call {(U_α, φ_α) | α ∈ A} an atlas of charts for M; the properties of M are not significantly changed by adding more charts. The simplest example is the n-manifold R^n with atlas consisting of one chart, the identity map. Intuitively, an n-manifold consists of open subsets of R^n, the φ_α(U_α), pasted together in a smooth fashion according to the directions given by the φ_α ∘ φ_β^{-1}. For example, the unit circle S^1 with its usual structure can be presented as a 1-manifold by pasting together two open intervals, each like (−π, π). Similarly, the unit 2-sphere S^2 has an atlas consisting of two charts

(U_N, φ_N), (U_S, φ_S)

where U_N consists of S^2 with the north pole removed, U_S consists of S^2 with the south pole removed, and the chart maps are stereographic projections. Thus, if S^2 is the unit sphere in R^3 centered at the origin then:

\[ \varphi_N : S^2 \setminus \{\text{n.p.}\} \longrightarrow \mathbb{R}^2 : (x, y, z) \longmapsto \frac{1}{1+z}\,(x, y)\,, \]
\[ \varphi_S : S^2 \setminus \{\text{s.p.}\} \longrightarrow \mathbb{R}^2 : (x, y, z) \longmapsto \frac{1}{1-z}\,(x, y)\,. \]

Similar chart maps work also for the higher dimensional spheres.
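As a small numerical illustration (a sketch, assuming numpy), the chart transition φ_N ∘ φ_S^{-1} of these two charts is the smooth inversion w ↦ w/|w|² away from the origin:

import numpy as np

def phi_S_inv(w):
    # inverse stereographic projection from the south pole
    u, v = w
    r2 = u*u + v*v
    return np.array([2*u, 2*v, r2 - 1.0]) / (r2 + 1.0)   # a point on the unit sphere

def phi_N(p):
    # stereographic projection from the north pole
    x, y, z = p
    return np.array([x, y]) / (1.0 + z)

w = np.array([0.7, -1.2])
print(phi_N(phi_S_inv(w)), w / (w @ w))   # the two outputs agree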

2.0.3 Tangent Spaces

From elementary analysis we know that the derivative of a function is a linear approximation to that function, at the chosen point. Thus, we need vector spaces to define linearity, and these are automatically present in the form of the vector space R^n at each point of Euclidean point space E^n. At each point x of a manifold M we construct a vector space T_xM, called the tangent space to M at x. For this we employ equivalence classes [T_{φ_α(x)}R^n] of tangent spaces to the images of x, φ_α(x), under chart maps defined at x. That is, we borrow the vector space structure from R^n via each chart (U_α, φ_α) with x ∈ U_α, then identify the borrowed copies. The result, for x ∈ S^2 embedded in R^3, is simply a vector space isomorphic to the tangent plane to S^2 at x. This works here because S^2 embeds isometrically into R^3, but not all 2-manifolds embed in R^3; some need more dimensions, and the Klein bottle is an example [70]. Actually, the formal construction is independent of M being embedded in this way; however, the Whitney Embedding Theorem [211] says that an embedding of an n-manifold is always possible in R^{2n+1}.

Once we have the tangent space T_xM for each x ∈ M we can present it in coordinates, via a choice of chart, as a copy of R^n. The derivatives of the change of chart maps, like

\[ \frac{\partial}{\partial x^i_\beta}\,(\varphi_\alpha \circ \varphi_\beta^{-1})(x^1_\beta, x^2_\beta, \cdots, x^n_\beta)\,, \]

provide linear transformations among the representations of T_xM. Next, we say that a map between manifolds

\[ f : M \longrightarrow N \]

is differentiable at x ∈ M if, for some charts (U, φ) on M and (V, ψ) on N with x ∈ U, f(x) ∈ V, the map

\[ \psi \circ f|_U \circ \varphi^{-1} : \varphi(U) \longrightarrow \psi(V) \]

is differentiable as a map between subsets of R^n and R^m, if M is an n-manifold and N is an m-manifold. This property turns out to be independent of the choices of charts, so we get a linear map

\[ T_x f : T_xM \longrightarrow T_{f(x)}N\,. \]

Moreover, if we make a choice of charts then T_x f appears in matrix form as the set of partial derivatives of ψ ∘ f ∘ φ^{-1}. The notation T_x f for the derivative of f at x is precise, but in many texts it may be found abbreviated to Df, f_*, f′ or Tf, with or without reference to the point of application. When f is a curve in M, that is, a map from some interval

\[ f : [0, 1] \longrightarrow M : t \longmapsto f(t)\,, \]

then T_t f is sometimes denoted by ḟ_t. This is the tangent map to f at t, and the result of its application to the standard unit vector to R at t, ḟ_t(1), is the tangent vector to f at t. It is quite common for this tangent vector also to be abbreviated to ḟ_t.


In a natural way we can provide a topology and differential structure for the set of all tangent vectors in all tangent spaces to an n-manifold M:

\[ TM = \bigcup_{x \in M} T_xM\,; \]

details are given in [70]. So, it actually turns out that TM is a 2n-manifold, called the tangent bundle to M. For example, if M = R^n then TM = R^n × R^n. Similarly, if M = S^1 with the usual structure then TM is topologically (and as a manifold) equivalent to the infinite cylinder S^1 × R. The technical term for an n-manifold M that has a trivial product tangent bundle TM ≅ M × R^n is parallelizable, and this property is discussed in the cited texts.

On the other hand, this simple situation is quite rare and it is rather a deep result that for spheres

\[ TS^n \text{ is equivalent to } S^n \times \mathbb{R}^n \text{ only for } n = 1, 3, 7\,. \]

For other spheres, their tangent bundles consist of twisted products of copies of R^n over S^n. In particular, TS^2 is such a twisted product of S^2 with one copy of R^2 at each point. An intuitive picture of a 2-manifold that is a twisted product of R^1 (or an interval from it) over S^1 is a Möbius strip, which we know does not embed into R^2 but does embed into R^3.

A map f : M → N between manifolds is just called differentiable if it is differentiable at every point of M, and a diffeomorphism if it is differentiable with a differentiable inverse; in the latter case M and N are said to be diffeomorphic manifolds. Diffeomorphism implies homeomorphism, but not conversely. For example, the sphere S^2 is diffeomorphic to an ellipsoid, but only homeomorphic to the surface of a cube, because the latter is not a smooth manifold: it has corners and sharp edges, so no well-defined tangent space structure. We note one generalisation, however: sometimes we want a smooth manifold to have a boundary. For example, a circular disc obviously cannot have its edge points homeomorphic to open sets in R^2; so we relax our definition for charts to allow the chart maps to patch together subsets like {(x, y) ∈ R^2 | 0 < x ≤ 1, 0 < y < 1} to deal with edge points. This is easily generalized to higher dimensions.

2|0 < x ≤ 1, 0 < y,< 1 to deal with edge points. This iseasily generalized to higher dimensions.

2.0.4 Tensors and Forms

For finite-dimensional real vector spaces it is easily shown that the set of all real-valued linear maps on the space is itself a real vector space, the dual space, and similarly multilinear real-valued maps form real vector spaces; multilinear real-valued maps are called tensors. Elementary linear algebra introduces the notion of a real vector space X and its dual space X* of real-valued linear functions on X; on manifolds we combine these types of spaces in a smooth way, using tensor and exterior products, to obtain the necessary composite bundle structures that can support the range of multilinear operations needed for geometry. Exterior differentiation is the fundamental operation in the calculus on manifolds; it recovers all of vector calculus in R^3 and extends it to arbitrary dimensional manifolds.

An m-form is a purely antisymmetric, real-valued, multilinear function on an argument of m tangent vectors, defined smoothly over the manifold. The space of m-forms becomes a vector bundle Λ^m M over M with coordinate charts induced from those on M. A 0-form is a real-valued function on the manifold. Thus, the space Λ^0 M of 0-forms on M consists of sections of the trivial bundle M × R. The space Λ^1 M of 1-forms on M consists of sections of the cotangent bundle T*M, and Λ^k M consists of sections of the antisymmetrized tensor product of k copies of T*M. Locally, a 1-form has the local coordinates of an n-vector, and a 2-form has the local coordinates of an antisymmetric n × n matrix. A k-form on an n-manifold has \(\binom{n}{k}\) independent local coordinates. It follows that the only k-forms for k > n are the zero k-forms. We summarize some definitions.

There are three fundamental operations on finite-dimensional vector spaces (in addition to taking duals): direct sum ⊕, tensor product ⊗, and exterior product ∧ on a space with itself. Let F, G be two vector spaces, of dimensions n, m respectively. Take any bases {b_1, ..., b_n} for F and {c_1, ..., c_m} for G; then we can obtain bases

{b_1, ..., b_n, c_1, ..., c_m} for F ⊕ G,
{b_i ⊗ c_j | i = 1, ..., n; j = 1, ..., m} for F ⊗ G,
{b_i ∧ b_j = b_i ⊗ b_j − b_j ⊗ b_i | i, j = 1, ..., n; i < j} for F ∧ F.

So, F ⊕ G is essentially the disjoint union of F and G with their zero vectors identified. In a formal sense (cf. Dodson and Poston [70], p. 104), F ⊗ G can be viewed as the vector space L(F*, G) of linear maps from the dual space F* = L(F, R) to G. Recall also the natural equivalence (F*)* ≅ F. By taking the antisymmetric part of F ⊗ F we obtain F ∧ F. We deduce immediately:

\[ \dim F \oplus G = \dim F + \dim G\,, \]
\[ \dim F \otimes G = \dim F \cdot \dim G\,, \]
\[ \dim F \wedge F = \tfrac{1}{2} \dim F\,(\dim F - 1)\,. \]

Observe that only for dim F = 3 can we have dim F = dim(F ∧ F). Actually, this is the reason for the existence of the vector cross product × on R^3 only, giving the uniquely important isomorphism

\[ \mathbb{R}^3 \wedge \mathbb{R}^3 \longrightarrow \mathbb{R}^3 : x \wedge y \longmapsto x \times y \]

and its consequences for geometry and vector calculus on R^3.

Each of the operations ⊕, ⊗ and ∧ induces corresponding operations on linear maps between spaces. Indeed, the operations are thoroughly universal and categorical, so they should and do behave well in linear algebraic contexts. Briefly, suppose that we have linear maps f, h ∈ L(F, J) and g ∈ L(G, K); then the induced linear maps in L(F ⊕ G, J ⊕ K), L(F ⊗ G, J ⊗ K) and L(F ∧ F, J ∧ J) are

\[ f \oplus g : x \oplus y \longmapsto f(x) \oplus g(y)\,, \]
\[ f \otimes g : x \otimes y \longmapsto f(x) \otimes g(y)\,, \]
\[ f \wedge h : x \wedge y \longmapsto f(x) \wedge h(y)\,. \]

Local coordinates about a point in M induce bases for the tangent vector spaces and their dual spaces. The construction of the tangent spaces, directly from the choice of the differentiable structure for the manifold, induces a definite role for tangent vectors. An element v ∈ T_xM turns out to be a derivation on smooth real functions defined near x ∈ M. In a chart about x, v is expressible as a linear combination of the partial derivations with respect to the chart coordinates x^1, x^2, ..., x^n as

\[ v = v^1\partial_1 + v^2\partial_2 + \cdots + v^n\partial_n \]

with ∂_i = ∂/∂x^i, for some v^i ∈ R.

This is often abbreviated to v = v^i ∂_i, where summation is to be understood over repeated upper and lower indices: the summation convention of Einstein. The dual base to {∂_i} is written {dx^i} and defined by

\[ dx^j(\partial_i) = \delta^j_i = \begin{cases} 1 & \text{if } i = j\,, \\ 0 & \text{if } i \neq j\,. \end{cases} \]

So a 1-form α ∈ T*_xM is locally expressible as

\[ \alpha = \alpha_1\,dx^1 + \alpha_2\,dx^2 + \cdots + \alpha_n\,dx^n = \alpha_i\,dx^i \]

for some α_i ∈ R, but a 2-form γ as

\[ \gamma = \sum_{i<j} \gamma_{ij}\, dx^i \wedge dx^j \]

for some γ_ij ∈ R. The common summation convention here is γ = γ_{[ij]} dx^i ∧ dx^j; a symmetric 2-tensor would use (ij).

Since the ∂_i and dx^i are well-defined in some chart (U, φ) about x, they serve also as basis vectors [70] at other points in U. Hence, they act as basis fields for the restrictions of sections of TM → M and T*M → M to U, generating thereby local basis fields for sections of all tensor product bundles T^k_m M → M and exterior product bundles of forms Λ^k M → M, restricted to U. The spaces of bases or frames form a structure called the frame bundle over a manifold; details of its geometry may be found in Cordero, Dodson and de León [43].


Given two vector fields u, v on M, their commutator or Lie bracket is the new vector field [u, v] defined as a derivation on real functions f by

\[ [u, v](f) = u(v(f)) - v(u(f))\,. \]

Locally in coordinates using basis fields, for u = u^i ∂_i and v = v^j ∂_j,

\[ [u, v] = (u^i \partial_i v^j - v^i \partial_i u^j)\,\partial_j\,. \]

The exterior derivative d is a linear map on k-forms satisfying

(i) d : Λ^k M → Λ^{k+1} M (d has degree +1);
(ii) df = grad f if f ∈ Λ^0 M (locally, df = ∂_i f dx^i);
(iii) if α ∈ Λ^a M and β ∈ Λ*M, then

\[ d(\alpha \wedge \beta) = d\alpha \wedge \beta + (-1)^a\, \alpha \wedge d\beta\,; \]

(iv) d² = 0.

This d is unique in satisfying these properties.

2.0.5 Riemannian Metric

We recall the importance of inner products on vector spaces: these allow the definition of lengths or norms of vectors and angles between vectors. The corresponding entity for the tangent vectors to an n-manifold M is a smooth choice of inner products over its family of vector spaces {T_xM | x ∈ M}. Such a smooth choice is called a Riemannian metric on M. Formally, a Riemannian metric g on an n-manifold M is a smooth family of maps

\[ g|_x : T_xM \times T_xM \to \mathbb{R}\,, \quad x \in M\,, \]

that is bilinear, symmetric and positive definite on each tangent space. Then we call the pair (M, g) a Riemannian n-manifold. Locally, at each x ∈ M, each g|_x appears in coordinates as a symmetric n × n matrix [g_ij] that is positive definite, so it has positive determinant. For each v ∈ T_xM, the norm of v is defined to be \(||v|| = \sqrt{g(v, v)}\).

We can measure the angle θ between any two vectors u, v in the same tangent space by means of

\[ \cos\theta = \frac{g(u, v)}{\sqrt{g(u, u)\; g(v, v)}}\,. \]

For a smooth curve in (M, g)

\[ c : [0, 1] \longrightarrow M : t \longmapsto c(t) \]

with tangent vector field

\[ \dot{c} : [0, 1] \longrightarrow TM : t \longmapsto \dot{c}(t)\,, \]

the arc length is the integral of the norm of its tangent vector along the curve:

\[ L_c(t) = \int_0^1 \sqrt{g_{c(t)}(\dot{c}(t), \dot{c}(t))}\; dt\,. \]

The arc length element ds along a curve can be expressed in terms of coordinates (x^i) by

\[ ds^2 = \sum_{i,j} g_{ij}\, dx^i dx^j \tag{2.1} \]

which is commonly abbreviated to

\[ ds^2 = g_{ij}\, dx^i dx^j \tag{2.2} \]

using the convention to sum over repeated indices.

Arc length is often difficult to evaluate analytically because it contains the square root of the sum of squares of derivatives. Accordingly, we sometimes use the 'energy' of the curve instead of length for comparison between nearby curves. Energy is given by integrating the square of the norm of ċ:

\[ E_c(a, b) = \int_a^b ||\dot{c}(t)||^2\, dt\,. \tag{2.3} \]
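Both quantities are easy to evaluate by quadrature. A minimal sketch, assuming scipy, for a constant-speed latitude segment on the unit sphere with its metric induced from E^3 (the curve is our own illustrative example):

import numpy as np
from scipy.integrate import quad

z0 = 0.5   # fixed latitude parameter of the example curve

def c(t):        # the curve on the unit sphere
    return np.array([np.cos(t)*np.cos(z0), np.sin(t)*np.cos(z0), np.sin(z0)])

def cdot(t):     # its tangent vector field
    return np.array([-np.sin(t)*np.cos(z0), np.cos(t)*np.cos(z0), 0.0])

speed = lambda t: np.linalg.norm(cdot(t))        # sqrt(g(c', c'))
L, _ = quad(speed, 0.0, 1.0)                     # arc length
E, _ = quad(lambda t: speed(t)**2, 0.0, 1.0)     # energy (2.3) on [0, 1]
print(L, E)   # constant speed on a unit interval, so L**2 == E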

A diffeomorphism f between Riemannian manifolds (M, g), (N, h) is called an isometry if its derivative Tf preserves the norms of all tangent vectors: g(v, v) = h(Tf(v), Tf(v)). A situation of common interest is when a manifold can be isometrically embedded as a submanifold of some Euclidean E^m or of R^m with some specified metric. Note that if we have a Riemannian manifold (M, g), then an open subset X of M inherits a manifold structure using the restriction of chart maps, and the metric g induces a subspace metric g|_X, so (X, g|_X) becomes a Riemannian submanifold of (M, g). For example, the unit sphere S^2 in E^3 inherits the subspace metric from the Euclidean metric, but of course S^2 has spherical, not Euclidean, geometry. Evidently, the dimension of a submanifold will not exceed the dimension of its host manifold.

2.0.6 Connections

In order to compare tangent vectors at different points along a curve in a manifold M we need to have a procedure that transports tangent space vectors along the curve, so providing a way to 'connect up' unambiguously the tangent spaces passed through. A smooth assignation of tangent vectors along a curve is called a vector field along the curve; one such field is the actual field of tangents to the curve. A suitable connecting entity in the limiting case at a point defines a derivative of a vector field with respect to the tangent to the curve, and gives the result as another tangent vector at the same point. Now, every tangent vector u ∈ T_xM can be realised as the tangent vector to a curve through x, and therefore we finish up with a smooth family of bilinear maps ∇ = {∇|_x, x ∈ M} with the property

\[ \nabla|_x : T_xM \times T_xM \to T_xM : (u, v) \mapsto \nabla_u v\,, \quad \text{defined over } x \in M. \tag{2.4} \]

In coordinates, we have a basis of T_xM given by the derivations (∂_i), and so, for some real components (u^i), (v^j), using the summation convention for repeated indices and (∂_i) as basis vector fields, u = u^i ∂_i, v = v^j ∂_j, and then

\[ \nabla_u v = (u^i \partial_i v^j + u^k v^m \Gamma^j_{km})\,\partial_j \tag{2.5} \]

for a smooth n × n × n array of functions Γ^j_{km} called the Christoffel symbols. It turns out that ∇ provides a derivative for vector valued maps on the manifold, that is, of vector field maps v : M → TM, and returns the answer as another vector field; this derivation operator is called the covariant derivative. The smooth family of bilinear maps (2.4) is called a linear connection and there are many ways to formalise its definition [70]. The important theorem here is that for a given Riemannian manifold there is a unique linear connection that preserves the metric and has symmetric Christoffel symbols; this is the Levi-Civita or symmetric metric connection.

Now, we have seen above, §2.0.3, that the derivative of a smooth map between manifolds f : M → N gives a corresponding map Tf : TM → TN. Also, a vector field v on M is a section v : M → TM of the tangent bundle projection π : TM → M; this means that π ∘ v is the identity map on M. Therefore the derivative of the vector field will not be another vector field but a map Tv : TM → TTM. This is why we need the connection: it provides a projection of a derivative Tv back onto the tangent bundle; the covariant derivative of a vector field is precisely the projection of a derivative.

Formally, a linear connection ∇ gives a smooth bundle splitting at each u ∈ TM of the space T_uTM into a direct sum

\[ T_uTM \cong H_uTM \oplus V_uTM \]

where V_uTM = ker(Tπ : T_uTM → T_{π(u)}M). We call H_uTM the horizontal subspace (of TTM) at u ∈ TM and V_uTM the vertical subspace at u ∈ TM. They comprise the horizontal and vertical subbundles, respectively, of TTM:

\[ TTM = HTM \oplus VTM\,. \]

For our purposes, the important role of a connection is that it induces isomorphisms called horizontal lifts from tangent spaces on the base M to horizontal subspaces of the tangent spaces to TM:

\[ \uparrow\; : T_{\pi(u)}M \longrightarrow H_uTM \subset T_uTM : v \longmapsto v^\uparrow\,. \]

Technically, a connection splits the exact sequence of vector bundles

\[ 0 \longrightarrow VTM \longrightarrow TTM \longrightarrow TM \longrightarrow 0 \]


by providing a bundle morphism TM → TTM with image the bundle of horizontal subspaces.

Along any curve c : [0, 1) → M in M we can construct through each u_0 ∈ π^{-1}(c(0)) ⊂ TM a unique curve c^↑ : [0, 1) → TM with horizontal tangent vector and π ∘ c^↑ = c, c^↑(0) = u_0. The map

\[ \tau_t : \pi^{-1}(c(0)) \longrightarrow \pi^{-1}(c(t)) : u_0 \longmapsto c^\uparrow(t) \]

defined by the curve is called parallel transport along c. Parallel transport is always a linear isomorphism; an associated vector field v along c is parallel precisely when τ_t(v(c(0))) = v(c(t)). The covariant derivative of v along c is defined to be the limit, if it exists,

\[ \lim_{h \to 0} \frac{1}{h}\left( \tau_h^{-1}\, v(c(t+h)) - v(c(t)) \right) \]

and is usually denoted by ∇_{ċ(t)} v. Using integral curves c, this extends easily to ∇_w v for any vector field w. Evidently, the operator ∇ is linear and a derivation:

\[ \nabla_w(u + v) = \nabla_w u + \nabla_w v \quad \text{and} \quad \nabla_w(fv) = w(f)\,v + f\,\nabla_w v\,; \]

it measures the departure from parallelism. The local appearance of ∇ on basis fields (∂_i) about x ∈ M is

\[ \nabla_{\partial_i} \partial_j = \Gamma^k_{ij}\, \partial_k \]

where the Γ^k_{ij} are the Christoffel symbols defined earlier.

For a linear connection we define two important tensor fields in terms of their action on tangent vector fields: the torsion tensor field T is defined by

\[ T(u, v) = \nabla_u v - \nabla_v u - [u, v] \]

and the curvature tensor field is the section of T^1_3 M defined by

\[ R(u, v)w = \nabla_u \nabla_v w - \nabla_v \nabla_u w - \nabla_{[u,v]} w\,. \]

The connection is called torsion-free or symmetric when T = 0, and flat when R = 0.

In local coordinates with respect to base fields (∂_i),

\[ T(\partial_j, \partial_k) = (\Gamma^i_{jk} - \Gamma^i_{kj})\,\partial_i\,, \]
\[ R(\partial_k, \partial_l)\,\partial_j = (\partial_k \Gamma^i_{lj} - \partial_l \Gamma^i_{kj} + \Gamma^h_{lj}\Gamma^i_{kh} - \Gamma^h_{kj}\Gamma^i_{lh})\,\partial_i\,. \]

The connection form ω is an R^{n²}-valued linear function on vector fields and is expressible as a matrix valued 1-form with components

\[ \omega^i_j = \Gamma^i_{jk}\, dx^k\,. \tag{2.6} \]


Hence

\[ d\omega^i_j = d(\Gamma^i_{jk}) \wedge dx^k = \partial_r \Gamma^i_{jk}\, dx^r \wedge dx^k\,, \]
\[ \omega^i_h \wedge \omega^h_j = \Gamma^i_{hr}\, \Gamma^h_{jk}\, dx^r \wedge dx^k\,. \]

The curvature form Ω is an R^{n²}-valued antisymmetric bilinear function on pairs of vector fields and it has the local expression

\[ \Omega^i_j = \frac{1}{2}\, R^i_{jrk}\, dx^r \wedge dx^k = \sum_{r<k} R^i_{jrk}\, dx^r \wedge dx^k\,. \]

2.1 Autoparallel and Geodesic Curves

A curve c : [0, 1) → M that has a parallel tangent vector field ċ = ċ^j ∂_j satisfies

\[ \nabla_{\dot{c}(t)}\, \dot{c}(t) = 0\,, \tag{2.7} \]

which in coordinate components from (2.5) becomes

\[ \ddot{c}^i + \Gamma^i_{jk}\, \dot{c}^j \dot{c}^k = 0 \quad \text{for each } i\,. \]

It is then called an autoparallel curve. In the case that the connection ∇ is the Levi-Civita connection of a Riemannian manifold (M, g), all the parallel transport maps are actually isometries and then the autoparallel curves c satisfying (2.7) are called geodesic curves (cf. [70] for more discussion of geodesic curves). Geodesic curves have extremal properties: between close enough points they provide uniquely shortest length curves. For example, in Euclidean E^3 the geodesics are straight lines and so provide shortest distances between points; on the standard unit sphere S^2 ⊂ E^3 the geodesics are arcs of great circles, and so between pairs of points the two arcs provide maximal and minimal geodesic distances.

2.2 Universal Connections and Curvature

A connection, §2.0.6, encodes geometrical choices and, through its curvature, underlying topological information. In some situations, both in geometry and in theoretical physics, it is necessary to consider a family of connections, for example with regard to stability of certain properties [36]. Also, it is common for statisticians to consider a number of linear connections on a given statistical manifold, and so it can be important to be able to handle these connections as a geometrical family of some kind.


In general, the space of linear connections on a manifold is infinite dimensional, but Mangiarotti and Modugno [140, 152] introduced the idea of a system (or structure) of connections, which gives a representation of the space of linear connections as a finite dimensional bundle. On this system there is a 'universal' connection and corresponding 'universal' curvature; then all linear connections and their curvatures are pullbacks of these universal objects.

A full account of the underlying geometry of jet bundles and their morphisms is beyond our present scope, so we refer the interested reader to Mangiarotti and Modugno [140, 152]. Dodson and Modugno [69] provided a universal calculus for this context. An application of universal linear connections to a stability problem in spacetime geometry was given by Canarutto and Dodson [36], and further properties of the system of linear connections were given by Del Riego and Dodson [53]. An explicit set of geometrical examples with interesting topological properties was provided by Cordero, Dodson and Parker [44]. The first application to information geometry was given by Dodson [59] for the system of α-connections.

The technical details would take us too far from our present theme, but our recent results on statistical manifolds are given in Arwini, Del Riego and Dodson [16]. There we describe the system of all linear connections on the manifold of exponential families, using the tangent bundle, §2.0.3, to give the system space. We provide formulae for the universal connections and curvatures and give an explicit example for the manifold of gamma distributions, §3.5. It seems likely that there could be significant developments from the results on universal connections for exponential families, §3.2, for example in the context of group actions on random variables.

3 Information Geometry

We use the term information geometry to cover those topics concerning the use of the Fisher information matrix to define a Riemannian metric, §2.0.5, on smooth spaces of parametric statistical models, that is, on smooth spaces of probability density functions. Amari [8, 9], Amari and Nagaoka [11], Barndorff-Nielsen and Cox [20], Kass and Vos [113] and Murray and Rice [153] provide modern accounts of the differential geometry that arises from the Fisher information metric and its relation to asymptotic inference. The Introduction by R.E. Kass in [9] provided a good summary of the background and role of information geometry in mathematical statistics. In the present monograph, we use Riemannian geometric properties of various families of probability density functions in order to obtain representations of practical situations that involve statistical models.

Many experts have argued that the information geometric approach may not add significantly to the understanding of the theory of parametric statistical models, and this we acknowledge. Nevertheless, we are of the opinion that there is benefit for those involved with practical modelling if essential qualitative features that are common across a wide range of applications can be presented in a way that allows geometrical tools to measure distances between, and lengths along, trajectories through perturbations of models of relevance. Historically, the richness of operations and structure in geometry has had a powerful influence on physics, and those applications suggested new geometrical developments or methodologies; indeed, from molecular biology some years ago, the behaviour of certain enzymes in DNA manipulation led to the identification of useful geometrical operators. What we offer here is some elementary geometry to display the features common, and of most significance, to a wide range of typical statistical models for real processes. Many more geometrical tools are available to make further sophisticated studies, and we hope that these may attract the interest of those who model. For example, it would be interesting to explore the details of the role of curvature in a variety of applications, and to identify when the distinguished curves called geodesics, so important in fundamental physics, have particular significance in various real processes with essentially statistical features.


Are there useful ways to compactify some parameter spaces of certain applications, to benefit thereby from algebraic operations on the information geometry? Do universal connections on our information geometric spaces have a useful role in applications?

3.1 Fisher Information Metric

First we set up a smooth n-manifold, §2.0.2, with a chart the n-dimensional parameter space of a smooth family of probability density functions, §1.2. Let Θ be the parameter space of a parametric statistical model, that is, an n-dimensional smooth family {p_θ | θ ∈ Θ} of probability density functions defined on some fixed event space Ω of unit measure,

\[ \int_\Omega p_\theta = 1 \quad \text{for all } \theta \in \Theta\,. \]


Then the derivatives of the log-likelihood function, l = log p_θ, yield a matrix, and the expectation of its entries is

\[ g_{ij} = \int_\Omega \left( \frac{\partial l}{\partial \theta^i}\, \frac{\partial l}{\partial \theta^j} \right) p_\theta = -\int_\Omega \left( \frac{\partial^2 l}{\partial \theta^i\, \partial \theta^j} \right) p_\theta\,, \tag{3.1} \]

for coordinates (θ^i) about θ ∈ Θ ⊆ R^n.

This gives rise to a positive definite matrix, so inducing a Riemannian metric g, from §2.0.5, on Θ, using the parameters (θ^i) as coordinates. From the construction of (3.1), a smooth invertible transformation of random variables, that is, of the labelling of the points in the event space Ω while keeping the same parameters (θ^i), will leave the Riemannian metric unaltered. Formally, it induces a smooth diffeomorphism of manifolds that preserves the metric, namely a Riemannian isometry; in this situation the diffeomorphism is simply the identity map on parameters. We shall see this explicitly below for the case of the log-gamma distribution, §3.6, and its associated Riemannian manifold.

The elements in the matrix (3.1) give the arc length function, §2.1,

\[ ds^2 = \sum_{i,j} g_{ij}\, d\theta^i d\theta^j \tag{3.2} \]

which is commonly abbreviated to

\[ ds^2 = g_{ij}\, d\theta^i d\theta^j \tag{3.3} \]

using the convention to sum over repeated indices.
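As a minimal sketch of (3.1) in one dimension, assuming sympy is available, here both expressions for g are computed for the exponential densities p(x; t) = t e^{−tx} on (0, ∞); each integral returns 1/t².

import sympy as sp

x, t = sp.symbols('x t', positive=True)
p = t * sp.exp(-t * x)     # exponential density on (0, oo)
l = sp.log(p)              # log-likelihood

g_score   = sp.integrate(sp.diff(l, t)**2 * p, (x, 0, sp.oo))    # E[(dl/dt)^2]
g_hessian = sp.integrate(-sp.diff(l, t, 2) * p, (x, 0, sp.oo))   # -E[d^2 l/dt^2]
print(sp.simplify(g_score), sp.simplify(g_hessian))              # both equal 1/t**2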


The metric (3.1) is called the expected information metric or Fisher metric for the manifold obtained from the family of probability density functions; the original ideas are due to Fisher [86] and C.R. Rao [171]. Of course, the second equality in equation (3.1) depends on certain regularity conditions [188], but when it holds it can be particularly convenient to use. Amari [8, 9], Amari and Nagaoka [11], Barndorff-Nielsen and Cox [20], Kass and Vos [113] and Murray and Rice [153] provide modern accounts of the differential geometry that arises from the information metric.

3.2 Exponential Family of Probability Density Functions

An n-dimensional parametric statistical model Θ ≡ {p_θ | θ ∈ Θ} is said to be an exponential family, or of exponential type, when the density functions can be expressed in terms of functions C, F_1, ..., F_n on Ω and a function ϕ on Θ as:

\[ p(x; \theta) = e^{\,C(x) + \sum_i \theta^i F_i(x) - \varphi(\theta)}\,; \tag{3.4} \]

then we say that (θ^i) are its natural or canonical parameters, and ϕ is the potential function. From the normalization condition ∫_Ω p_θ(x) dx = 1 we obtain:

\[ \varphi(\theta) = \log \int_\Omega e^{\,C(x) + \sum_i \theta^i F_i(x)}\, dx\,. \tag{3.6} \]

This potential function is a distinguished function of the coordinates alone, and in the sequel, §3.4, we make use of it for the presentation of the manifold as an immersion in R^{n+1}.

With ∂_i = ∂/∂θ^i, we use the log-likelihood function, §1.4, l(θ, x) = log(p_θ(x)) to obtain

\[ \partial_i l(\theta, x) = F_i(x) - \partial_i \varphi(\theta) \]

and

\[ \partial_i \partial_j l(\theta, x) = -\partial_i \partial_j \varphi(\theta)\,. \]

The information metric g on the n-dimensional space of parameters Θ ⊂ R^n, equivalently on the set S = {p_θ | θ ∈ Θ}, has coordinates:

\[ [g_{ij}] = -\int_\Omega [\partial_i \partial_j l(\theta, x)]\; p_\theta(x)\, dx = \partial_i \partial_j \varphi(\theta) = \varphi_{ij}(\theta)\,. \tag{3.7} \]


Then (S, g) is a Riemannian n-manifold, §2.0.5, with Levi-Civita connection, §2.0.6, given by:

\[ \Gamma^k_{ij}(\theta) = \sum_{h=1}^{n} \frac{1}{2}\, g^{kh} \left( \partial_i g_{jh} + \partial_j g_{ih} - \partial_h g_{ij} \right) = \sum_{h=1}^{n} \frac{1}{2}\, g^{kh}\, \partial_i \partial_j \partial_h \varphi(\theta) = \sum_{h=1}^{n} \frac{1}{2}\, \varphi^{kh}(\theta)\, \varphi_{ijh}(\theta)\,, \]

where [ϕ^{hk}(θ)] represents the inverse of [ϕ_{hk}(θ)].

3.3 Statistical α-Connections

There is a family of symmetric linear connections, §2.0.6, which includes the Levi-Civita case, §2.0.6; it has certain uniqueness properties and significance in mathematical statistics. See Amari [9] and Lauritzen [134] for more details and properties than we have space for here. In §2.2 we discuss how families of linear connections have certain universal properties.

Consider for α ∈ R the function Γ^{(α)}_{ij,k} which maps each point θ ∈ Θ to the following value:

\[ \Gamma^{(\alpha)}_{ij,k}(\theta) = \int_\Omega \left( \partial_i \partial_j l + \frac{1-\alpha}{2}\, \partial_i l\, \partial_j l \right) \partial_k l\;\, p_\theta = \frac{1-\alpha}{2}\, \partial_i \partial_j \partial_k \varphi(\theta) = \frac{1-\alpha}{2}\, \varphi_{ijk}(\theta)\,. \tag{3.8} \]

So we have an affine connection ∇^{(α)} on the statistical manifold (S, g) defined by

\[ g(\nabla^{(\alpha)}_{\partial_i} \partial_j, \partial_k) = \Gamma^{(\alpha)}_{ij,k}\,, \]

where g is the Fisher information metric, §3.1. We call this ∇^{(α)} the α-connection; it is clearly a symmetric connection, §2.0.6, and defines an α-curvature. We have also

\[ \nabla^{(\alpha)} = (1 - \alpha)\, \nabla^{(0)} + \alpha\, \nabla^{(1)} = \frac{1+\alpha}{2}\, \nabla^{(1)} + \frac{1-\alpha}{2}\, \nabla^{(-1)}\,. \]

For a submanifold M ⊂ S, §2.0.5, the α-connection on M is simply the restriction with respect to g of the α-connection on S. Note that the 0-connection is the Riemannian or Levi-Civita connection, §2.0.6, with respect to the metric, and its uniqueness implies that an α-connection is a metric connection if and only if α = 0.


Proposition 3.1. The 0-connection is the Riemannian connection, metric connection or Levi-Civita connection with respect to the Fisher metric.

In general, when α ≠ 0, ∇^{(α)} is not metric.

The notion of exponential family, §3.2, has a close relation to ∇^{(1)}. From the definition of an exponential family given in Equation (3.4), with ∂_i = ∂/∂θ^i and ℓ(x; θ) = log p(x; θ), we obtain

\[ \partial_i \ell(x; \theta) = F_i(x) - \partial_i \varphi(\theta) \tag{3.9} \]

and

\[ \partial_i \partial_j \ell(x; \theta) = -\partial_i \partial_j \varphi(\theta)\,. \tag{3.10} \]

Hence we have Γ^{(1)}_{ij,k} = −∂_i∂_j ϕ(θ)\, E_θ[∂_k ℓ], which is 0. In other words, we see that (θ^i) is a 1-affine coordinate system, and Θ is 1-flat.

and the (−1)-connection is said to be a mixture connection. We say that anα-connection and the (−α)-connection are mutually dual with respect to themetric g since the following formula holds:

Xg(Y,Z) = g(∇(α)X Y,Z) + g(Y,∇(−α)

X Z),

where X, Y and Z are arbitrary vector fields on M.

Now, Θ is an exponential family, so a mixture coordinate system is given by the potential function, §3.2, that is,

\[ \eta_i = \frac{\partial \varphi}{\partial \theta^i}\,. \tag{3.11} \]

Since (θ^i) is a 1-affine coordinate system, (η_i) is a (−1)-affine coordinate system, and they are mutually dual with respect to the metric. Therefore the statistical manifold has dually orthogonal foliations (Section 3.7 in [11]).

The coordinates (η_i) admit a potential function given by:

\[ \lambda = \theta^i \eta_i - \varphi(\theta)\,. \tag{3.12} \]

3.4 Affine Immersions

Let M be an m-dimensional manifold, f an immersion from M to R^{m+1}, and ξ a vector field along f. For all x ∈ R^{m+1} we can identify T_x R^{m+1} ≡ R^{m+1}. The pair {f, ξ} is said to be an affine immersion from M to R^{m+1} if, for each point p ∈ M, the following decomposition holds:

\[ T_{f(p)} \mathbb{R}^{m+1} = f_*(T_pM) \oplus \mathrm{Span}\,\xi_p\,. \]


We call ξ a transversal vector field; it is a technical requirement to ensure that the differential structure is carried over into the immersion.

We denote by D the standard flat affine connection of R^{m+1}. Identifying the covariant derivative along f with D, we have the following decompositions:

\[ D_X f_* Y = f_*(\nabla_X Y) + h(X, Y)\,\xi\,, \]
\[ D_X \xi = -f_*(S_h(X)) + \mu(X)\,\xi\,. \]

The induced objects ∇, h, S_h and µ are the induced connection, the affine fundamental form, the affine shape operator and the transversal connection form, respectively. If the affine fundamental form h is positive definite everywhere on M, the immersion f is said to be strictly convex, and if µ = 0, the affine immersion {f, ξ} is said to be equiaffine. It is known that a strictly convex equiaffine immersion induces a statistical manifold; conversely, the conditions under which a statistical manifold can be realized in an affine space have been studied. We say that an affine immersion {f, ξ} : Θ → R^{m+1} is a graph immersion if the hypersurface is the graph of ϕ in R^{m+1}:

\[ f : M \to \mathbb{R}^{m+1} : \begin{pmatrix} \theta^1 \\ \vdots \\ \theta^m \end{pmatrix} \longmapsto \begin{pmatrix} \theta^1 \\ \vdots \\ \theta^m \\ \varphi(\theta) \end{pmatrix}, \qquad \xi = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \]

Set ∂_i = ∂/∂θ^i and ϕ_{ij} = ∂²ϕ/∂θ^i∂θ^j. Then we have

\[ D_{\partial_i} f_* \partial_j = \varphi_{ij}\, \xi\,. \]

This implies that the induced connection ∇ is flat and (θ^i) is a ∇-affine coordinate system.

Proposition 3.2. Let (M, h, ∇, ∇*) be a simply connected dually flat space with a global coordinate system, and let (θ) be an affine coordinate system of ∇. Suppose that ϕ is a θ-potential function. Then (M, h, ∇) can be realized in R^{m+1} by a graph immersion whose potential is ϕ.

3.4.1 Weibull Distributions: Not of Exponential Type

We shall see that gamma density functions form an exponential family; however, some distributions do not, and one such example is given by the Weibull family for a nonnegative random variable x:

\[ w(x; \kappa, \tau) = \kappa\,\tau\,(\kappa x)^{\tau-1}\, e^{-(\kappa x)^\tau}\,, \qquad \kappa, \tau > 0\,. \tag{3.13} \]

Like the gamma family, the Weibull family (3.13) contains the exponential distribution as a special case and has wide application in models for reliability and lifetime statistics, but the lack of a natural affine immersion to present perturbations of the exponential distribution makes it unsuitable for our present purposes. We provide elsewhere [17] the α-connection and α-curvature for the Weibull family, and illustrate the geometry of its information metric with examples of geodesics.

3.5 Gamma 2-Manifold G

The family of gamma density functions has event space Ω = R⁺ and probability density functions given by

{f(x; γ, κ) | γ, κ ∈ R⁺}

so here M ≡ R⁺ × R⁺ and the random variable is x ∈ Ω = R⁺ with

\[ f(x; \gamma, \kappa) = \left(\frac{\kappa}{\gamma}\right)^{\kappa} \frac{x^{\kappa-1}}{\Gamma(\kappa)}\; e^{-x\kappa/\gamma}\,. \tag{3.14} \]

Proposition 3.3. Denote by G the gamma manifold based on the family of gamma density functions. Set ν = κ/γ. Then the probability density functions have the form

\[ p(x; \nu, \kappa) = \nu^{\kappa}\, \frac{x^{\kappa-1}\, e^{-x\nu}}{\Gamma(\kappa)}\,. \tag{3.15} \]

In this case (ν, κ) is a natural coordinate system for the 1-connection, and

\[ \varphi(\theta) = \log\Gamma(\kappa) - \kappa \log\nu \tag{3.16} \]

is the corresponding potential function, §3.2.

Proof. Using ν = κ/γ, the logarithm of a gamma density function can be written as

\[ \log p(x; \nu, \kappa) = \log\left( \frac{\nu^{\kappa}\, x^{\kappa-1}}{\Gamma(\kappa)}\, e^{-\nu x} \right) = -\log x + (\kappa \log x - \nu x) - (\log\Gamma(\kappa) - \kappa \log\nu)\,. \tag{3.17} \]

Hence the set of all gamma density functions is an exponential family, §3.2. The coordinates (θ^1, θ^2) = (ν, κ) form a natural coordinate system, §3.3, and

\[ \varphi(\nu, \kappa) = \log\Gamma(\kappa) - \kappa \log\nu \]

is its potential function.

Corollary 3.4. Since ϕ(θ) is a potential function, the Fisher metric is given by the Hessian of ϕ, that is, with respect to natural coordinates:

\[ [g_{ij}](\nu, \kappa) = \left[ \frac{\partial^2 \varphi(\theta)}{\partial\theta^i\, \partial\theta^j} \right] = \begin{bmatrix} \dfrac{\kappa}{\nu^2} & -\dfrac{1}{\nu} \\[8pt] -\dfrac{1}{\nu} & \psi'(\kappa) \end{bmatrix}, \qquad \text{where } \psi'(\kappa) = \frac{d^2}{d\kappa^2}\log\Gamma(\kappa)\,. \tag{3.18} \]


In terms of the original coordinates (γ, κ) in equation (3.14), the gamma density functions are

\[ f(x; \gamma, \kappa) = \left(\frac{\kappa}{\gamma}\right)^{\kappa} \frac{x^{\kappa-1}}{\Gamma(\kappa)}\; e^{-x\kappa/\gamma} \]

and then the metric components matrix takes a convenient diagonal form

\[ [g_{ij}](\gamma, \kappa) = \begin{bmatrix} \dfrac{\kappa}{\gamma^2} & 0 \\[8pt] 0 & \dfrac{d^2}{d\kappa^2}\log\Gamma(\kappa) - \dfrac{1}{\kappa} \end{bmatrix}. \tag{3.19} \]

So the pair (γ, κ) yields an orthogonal basis of tangent vectors, which is useful in calculations because then the arc length function, §2.1, is simply

\[ ds^2 = \frac{\kappa}{\gamma^2}\, d\gamma^2 + \left( \left(\frac{\Gamma'(\kappa)}{\Gamma(\kappa)}\right)' - \frac{1}{\kappa} \right) d\kappa^2\,. \]

This orthogonality property of (γ, κ) coordinates is equivalent to asymptotic independence of the maximum likelihood estimates, cf. Barndorff-Nielsen and Cox [20], Kass and Vos [113] and Murray and Rice [153].
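A sketch verifying (3.18) and (3.19) symbolically, assuming sympy: the metric is the Hessian of ϕ in natural coordinates, and pulling it back along ν = κ/γ diagonalizes it.

import sympy as sp

nu, kappa, gam = sp.symbols('nu kappa gamma', positive=True)
phi = sp.log(sp.gamma(kappa)) - kappa * sp.log(nu)   # potential (3.16)
G = sp.hessian(phi, (nu, kappa))
print(sp.simplify(G))   # [[kappa/nu**2, -1/nu], [-1/nu, polygamma(1, kappa)]]

# Jacobian of the change of coordinates (gamma, kappa) -> (nu, kappa) = (kappa/gamma, kappa)
J = sp.Matrix([[sp.diff(kappa/gam, gam), sp.diff(kappa/gam, kappa)],
               [0, 1]])
print(sp.simplify(J.T * G.subs(nu, kappa/gam) * J))
# diagonal: kappa/gamma**2 and polygamma(1, kappa) - 1/kappa, as in (3.19)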

3.5.1 Gamma α-Connection

For each α ∈ R, the α- (or ∇^{(α)}-) connection is the torsion-free affine connection with components

\[ \Gamma^{(\alpha)}_{ij,k} = \frac{1-\alpha}{2}\, \partial_i\, \partial_j\, \partial_k\, \varphi(\theta)\,, \]

where ϕ(θ) is the potential function and ∂_i = ∂/∂θ^i.

Since the set of gamma density functions is an exponential family, §3.2, the connection ∇^{(1)} is flat and (ν, κ) is a 1-affine coordinate system. So the 1- and (−1)-connections on the gamma manifold are flat.

Proposition 3.5. The functions Γ^{(α)}_{ij,k} are given by

\[ \Gamma^{(\alpha)}_{11,1} = -\frac{(1-\alpha)\,\kappa}{\nu^3}\,, \qquad \Gamma^{(\alpha)}_{12,1} = \Gamma^{(\alpha)}_{21,1} = \frac{1-\alpha}{2\,\nu^2}\,, \qquad \Gamma^{(\alpha)}_{22,2} = \frac{(1-\alpha)\,\psi''(\kappa)}{2}\,, \tag{3.20} \]

while the other independent components are zero.


We have an affine connection ∇^{(α)} defined by

\[ \langle \nabla^{(\alpha)}_{\partial_i} \partial_j,\, \partial_k \rangle = \Gamma^{(\alpha)}_{ij,k}\,, \]

so by solving the equations

\[ \Gamma^{(\alpha)}_{ij,k} = \sum_{h=1}^{2} g_{kh}\, \Gamma^{h\,(\alpha)}_{ij}\,, \quad (k = 1, 2)\,, \]

we obtain the components of ∇^{(α)}:

we obtain the components of ∇(α):

Proposition 3.6. The components Γ i(α)jk of the ∇(α)-connection are given by

Γ(α)111 =

(α− 1) (−1 + 2κψ′(1, κ))2 ν (−1 + κψ′(κ))

,

Γ(α)112 = − (α− 1) ψ′(1, κ)

−2 + 2κψ′(κ),

Γ(α)122 = − (α− 1) ν ψ′′(κ)

−2 + 2κψ′(κ),

Γ(α)211 =

(α− 1) κ2 ν2 (−1 + κψ′(κ))

,

Γ(α)212 =

1 − α

−2 ν + 2 ν κψ′(κ),

Γ(α)222 = − (α− 1) κψ′′(κ)

−2 + 2κψ′(κ). (3.21)

while the other independent components are zero.

3.5.2 Gamma α-Curvatures

Proposition 3.7. Direct calculation gives the α-curvature tensor of G:

\[ R^{(\alpha)}_{1212} = \frac{(\alpha^2 - 1)\,(\psi'(\kappa) + \kappa\,\psi''(\kappa))}{4\,\nu^2\,(\kappa\,\psi'(\kappa) - 1)}\,, \tag{3.22} \]

while the other independent components are zero.

By contraction we obtain the α-Ricci tensor:

\[ [R^{(\alpha)}_{ij}] = \frac{(\alpha^2-1)\,(\psi'(\kappa)+\kappa\,\psi''(\kappa))}{4\,(\kappa\,\psi'(\kappa)-1)^2} \begin{bmatrix} -\kappa/\nu^2 & 1/\nu \\[4pt] 1/\nu & -\psi'(\kappa) \end{bmatrix}. \tag{3.23} \]

Additionally, the eigenvalues and the eigenvectors of the α-Ricci tensor are given in (3.24) below.


Fig. 3.1. Scalar curvature R^{(0)}(κ), for α = 0 from equation (3.25), for the gamma manifold. The regime κ < 1 corresponds to clustering of events and κ > 1 corresponds to smoothing out of events, relative to an underlying Poisson process, which corresponds to κ = 1.

With ψ′ = ψ′(κ), ψ″ = ψ″(κ) and \(\Delta = \sqrt{4\nu^2 + \kappa^2 - 2\nu^2\kappa\,\psi' + \nu^4\,\psi'^2}\), the eigenvalues and corresponding eigenvectors are

\[ \frac{(1-\alpha^2)\,(\psi' + \kappa\,\psi'')\,\left(\kappa + \nu^2\psi' \pm \Delta\right)}{8\,\nu^2\,(\kappa\,\psi' - 1)^2}\,, \qquad \begin{pmatrix} -\dfrac{\kappa - \nu^2\psi' + \Delta}{2\nu} \\[6pt] 1 \end{pmatrix}, \quad \begin{pmatrix} \dfrac{-\kappa + \nu^2\psi' + \Delta}{2\nu} \\[6pt] 1 \end{pmatrix}, \tag{3.24} \]

where the upper sign pairs with the first eigenvector and the lower sign with the second.

The α-scalar curvature is

\[ R^{(\alpha)} = \frac{(1 - \alpha^2)\,(\psi'(\kappa) + \kappa\,\psi''(\kappa))}{2\,(\kappa\,\psi'(\kappa) - 1)^2}\,. \tag{3.25} \]

This is shown in Figure 3.1 for the Levi-Civita case α = 0. We note that R^{(α)} → −(1 − α²)/2 as κ → 0.

3.5.3 Gamma Manifold Geodesics

The Fisher information metric for the gamma manifold is given in (γ, κ) coordinates, §2.0.5, §3.2, by the arc length function


\[ ds^2 = \frac{\kappa}{\gamma^2}\, d\gamma^2 + \left( \left(\frac{\Gamma'(\kappa)}{\Gamma(\kappa)}\right)' - \frac{1}{\kappa} \right) d\kappa^2\,. \]

The Levi-Civita connection ∇ is that given by setting α = 0 in the α-connections of the previous section. Geodesics for this case are curves satisfying

\[ \nabla_{\dot{c}}\, \dot{c} = 0\,. \]

Background details can be found, for example, in [70]. In coordinate components ċ = ċ^j ∂_j, and we obtain from (2.5)

\[ \ddot{c}^i + \Gamma^i_{jk}\, \dot{c}^j \dot{c}^k = 0 \quad \text{for each } i\,, \]

and in our case with coordinates (γ, κ) = (x, y) we have the nonlinear simultaneous equations

\[ \ddot{x} = \frac{\dot{x}^2}{x} - \frac{\dot{x}\,\dot{y}}{y} \tag{3.26} \]
\[ \ddot{y} = \frac{y\,\dot{x}^2}{2\,x^2\,(y\,\psi'(y) - 1)} - \frac{(\psi''(y)\,y^2 + 1)\,\dot{y}^2}{2\,y\,(y\,\psi'(y) - 1)} \tag{3.27} \]

with ψ′(y) = (Γ′(y)/Γ(y))′.

This system is difficult to solve analytically, but we can find numerical solutions using the Mathematica programs of Gray [99]. Figure 3.2 shows a spray of some maximally extended geodesics emanating from the point (γ, κ) = (1, 1).
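A minimal alternative sketch in Python, assuming scipy, integrating (3.26)-(3.27) from (γ, κ) = (1, 1) with an illustrative initial direction; the Mathematica route via Gray's procedures is the one used in the text.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import polygamma

def geodesic_rhs(t, s):
    x, y, xd, yd = s                           # (gamma, kappa, gamma-dot, kappa-dot)
    p1, p2 = polygamma(1, y), polygamma(2, y)  # psi'(y), psi''(y)
    xdd = xd**2 / x - xd * yd / y                               # equation (3.26)
    ydd = y * xd**2 / (2 * x**2 * (y * p1 - 1)) \
        - (p2 * y**2 + 1) * yd**2 / (2 * y * (y * p1 - 1))      # equation (3.27)
    return [xd, yd, xdd, ydd]

sol = solve_ivp(geodesic_rhs, (0.0, 1.0), [1.0, 1.0, 0.5, 0.5], rtol=1e-9)
print(sol.y[0, -1], sol.y[1, -1])              # endpoint in (gamma, kappa)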

Fig. 3.2. Examples of maximally extended geodesics passing through (γ, κ) = (1, 1) in the gamma manifold.


3.5.4 Mutually Dual Foliations

Now, G represents an exponential family of density functions, so a mixture coordinate system is given by the potential function. Since (ν, κ) is a 1-affine coordinate system, (η_1, η_2) given by

\[ \eta_1 = \frac{\partial\varphi}{\partial\nu} = -\frac{\kappa}{\nu}\,, \qquad \eta_2 = \frac{\partial\varphi}{\partial\kappa} = \psi(\kappa) - \log\nu \tag{3.28} \]

is a (−1)-affine coordinate system, and the two systems are mutually dual with respect to the metric. Therefore the gamma manifold has dually orthogonal foliations, with potential function, §3.2,

\[ \lambda = \kappa\,\psi(\kappa) - \kappa - \log\Gamma(\kappa)\,. \tag{3.29} \]

3.5.5 Gamma Affine Immersion

The gamma manifold has an affine immersion in R^3, §3.4:

Proposition 3.8. Let G be the gamma manifold with the Fisher metric g and the exponential connection ∇^{(1)}. Denote by (ν, κ) a natural coordinate system. Then G can be realized in R^3 by the graph of the potential function:

\[ f : G \to \mathbb{R}^3 : \begin{pmatrix} \nu \\ \kappa \end{pmatrix} \longmapsto \begin{pmatrix} \nu \\ \kappa \\ \log\Gamma(\kappa) - \kappa\log\nu \end{pmatrix}, \qquad \xi = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \]

The submanifold, §2.0.5, of exponential density functions is represented by the curve

\[ (0, \infty) \to \mathbb{R}^3 : \nu \longmapsto (\nu,\; 1,\; -\log\nu)\,, \]

and a tubular neighbourhood of this curve will contain all immersions for small enough perturbations of exponential density functions.

3.6 Log-Gamma 2-Manifold L

The log-gamma family of probability density functions for random variable N ∈ (0, 1] is given by

\[ q(N; \nu, \tau) = \frac{\nu^{\tau}\, N^{\nu-1}\, \left(\log\frac{1}{N}\right)^{\tau-1}}{\Gamma(\tau)} \quad \text{for } \nu > 0 \text{ and } \tau > 0\,. \tag{3.30} \]

Some of these density functions with central mean N̄ = 1/2 are shown in Figure 3.3.


Fig. 3.3. The log-gamma family of probability densities (3.30) with central mean N̄ = 1/2, as a surface. The surface tends to the delta function as τ → ∞ and coincides with the constant 1 at τ = 1.

Proposition 3.9. The log-gamma family (3.30) with information metric determines a Riemannian 2-manifold L with the following properties:
• it contains the uniform distribution;
• it contains approximations to truncated Gaussian density functions;
• it is an isometric isomorph of the manifold G of gamma density functions.

Proof. By integration, it is easily checked that the family given by equation (3.30) consists of probability density functions for the random variable N ∈ (0, 1]. The limiting densities are given by

\[ \lim_{\tau \to 1} q(N; \nu, \tau) = q(N; \nu, 1) = \nu\, N^{\nu - 1}\,, \tag{3.31} \]
\[ \lim_{\nu \to 1} q(N; \nu, 1) = q(N; 1, 1) = 1\,. \tag{3.32} \]

The mean N̄, standard deviation σ_N, and coefficient of variation, §1.2, cv_N, of N are given by

\[ \bar{N} = \left(\frac{\nu}{1+\nu}\right)^{\tau}\,, \tag{3.33} \]
\[ \sigma_N = \sqrt{\left(\frac{\nu}{\nu+2}\right)^{\tau} - \left(\frac{\nu}{1+\nu}\right)^{2\tau}}\,, \tag{3.34} \]
\[ cv_N = \frac{\sigma_N}{\bar{N}} = \sqrt{\frac{(1+\nu)^{2\tau}}{\nu^{\tau}\,(2+\nu)^{\tau}} - 1}\,. \tag{3.35} \]


Fig. 3.4. Log-gamma probability density functions q(N; ν, τ) from (3.30), for N ∈ (0, 1], with central mean N̄ = 1/2 and τ = 1/2, 1, 2, 5. The case τ = 1 corresponds to a uniform distribution, τ < 1 corresponds to clustering and, conversely, τ > 1 corresponds to dispersion.

We can obtain the family of densities having central mean N̄ = 1/2 by solving for the locus (2^{1/τ} − 1)ν = 1; some of these are shown in Figure 3.3 and Figure 3.4. Evidently, the density functions with central mean and large τ provide approximations to Gaussian density functions truncated on (0, 1].

For the log-gamma densities, the Fisher information metric on the parameter space Θ = {(ν, τ) ∈ (0, ∞) × (0, ∞)} is given by

\[ [g_{ij}](\nu, \tau) = \begin{bmatrix} \dfrac{\tau}{\nu^2} & -\dfrac{1}{\nu} \\[8pt] -\dfrac{1}{\nu} & \dfrac{d^2}{d\tau^2}\log\Gamma(\tau) \end{bmatrix}. \tag{3.36} \]

In fact, (3.30) arises for the non-negative random variable x = log(1/N) from the gamma family equation (3.14) above, with change of parameters to ν = κ/γ and τ = κ, namely

\[ f(x; \nu, \tau) = \frac{x^{\tau-1}\, \nu^{\tau}}{\Gamma(\tau)}\; e^{-x\nu}\,. \tag{3.37} \]

It is known that the gamma family (3.37) also has the information metric (3.36), so the identity map on the space of coordinates (ν, τ) is an isometry of the Riemannian manifolds L and G.

In terms of the coordinates (γ = τ/ν, τ) the log-gamma densities (3.30) become

\[ q\!\left(N; \frac{\tau}{\gamma}, \tau\right) = \frac{1}{\Gamma(\tau)} \left(\frac{\tau}{\gamma}\right)^{\tau} N^{\frac{\tau}{\gamma} - 1} \left(\log\frac{1}{N}\right)^{\tau-1} \quad \text{for } \gamma > 0 \text{ and } \tau > 0\,. \tag{3.38} \]


Then the metric components form a diagonal matrix

\[ [g_{ij}](\gamma, \tau) = \begin{bmatrix} \dfrac{\tau}{\gamma^2} & 0 \\[8pt] 0 & \dfrac{d^2}{d\tau^2}\log\Gamma(\tau) - \dfrac{1}{\tau} \end{bmatrix}. \tag{3.39} \]

Hence the coordinates (γ, τ) yield orthogonal tangent vectors, which can be very convenient in applications.

3.6.1 Log-Gamma Random Walks

We can illustrate the sensitivity of the parameter τ in the vicinity of the uniform distribution at τ = 1 by constructing random walks using unit steps in the plane with log-gamma distributed directions. This is done by wrapping the log-gamma random variable N ∈ (0, 1] from Equation (3.30) around the unit circle S^1 to give a random angle θ ∈ (0, 2π]. Then, starting at the origin r(0) = (0, 0) in the x, y plane, we take a random sequence {N_i ∈ (0, 1]}_{i=1,...,n} drawn from the log-gamma distribution (3.30) to generate a sequence of angles {θ_i = 2πN_i ∈ (0, 2π]}_{i=1,...,n} and hence define the random walk

\[ r : \{1, 2, \ldots, n\} \to \mathbb{R}^2 : k \longmapsto \sum_{i=1}^{k} (\cos\theta_i,\, \sin\theta_i)\,. \tag{3.40} \]

Figure 3.5 shows typical such random walks with 10,000 steps for the cases from Equation (3.30) with ν = 1 and τ = 0.9, 1, 1.1. So the τ = 1 case is a standard or isotropic random walk, with uniformly distributed directions for each successive unit step.
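A sketch of the construction, assuming numpy; a log-gamma variate is sampled as N = e^{−x} with x gamma distributed with shape τ and rate ν, and τ = 1 reduces to uniformly distributed directions.

import numpy as np

def log_gamma_walk(n, nu=1.0, tau=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.gamma(shape=tau, scale=1.0/nu, size=n)   # rate nu = 1/scale
    theta = 2.0 * np.pi * np.exp(-x)                 # wrapped angles in (0, 2*pi]
    steps = np.column_stack([np.cos(theta), np.sin(theta)])
    return np.cumsum(steps, axis=0)                  # the walk (3.40)

walk = log_gamma_walk(10_000, nu=1.0, tau=0.9)       # cf. Figure 3.5
print(walk[-1])                                      # endpoint after 10,000 steps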

3.7 Gaussian 2-Manifold

The family of univariate normal or Gaussian density functions has event space Ω = R and probability density functions given by

\[ N \equiv N(\mu, \sigma^2) = \{n(x; \mu, \sigma) \mid \mu \in \mathbb{R},\ \sigma \in \mathbb{R}^+\} \tag{3.41} \]

with mean µ and variance σ². So here N = R × R⁺ is the upper half-plane, and the random variable is x ∈ Ω = R with

\[ n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,. \tag{3.42} \]

The mean µ and standard deviation σ are frequently used as a local coordinate system ξ = (ξ_1, ξ_2) = (µ, σ). Lauritzen [134] gave a detailed discussion of the information geometry of the manifold N of Gaussian density functions, including its geodesic curves.


Fig. 3.5. Log-gamma random walks from Equation (3.40). Each has 10,000 unit length steps starting from the origin, for the cases in Equation (3.30) with ν = 1 and τ = 0.9, 1, 1.1. The central graph shows the standard random walk with uniformly distributed directions in the plane.


Shannon's information theoretic entropy is given by:

\[ S_N(\mu, \sigma) = -\int_{-\infty}^{\infty} \log(n(t; \mu, \sigma))\; n(t; \mu, \sigma)\, dt = \frac{1}{2}\,(1 + \log(2\pi)) + \log(\sigma)\,. \tag{3.43} \]

At unit variance the entropy is S_N = (1 + log(2π))/2.

3.7.1 Gaussian Natural Coordinates

Proposition 3.10. In the manifold of Gaussian or normal densities N, set θ_1 = µ/σ² and θ_2 = −1/(2σ²). Then (µ/σ², −1/(2σ²)) is a natural coordinate system and

\[ \varphi = -\frac{\theta_1^2}{4\theta_2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta_2}\right) = \frac{\mu^2}{2\sigma^2} + \log(\sqrt{2\pi}\,\sigma) \tag{3.44} \]

is the corresponding potential function, §3.2.

Proof. Set θ_1 = µ/σ² and θ_2 = −1/(2σ²). Then the logarithm of the univariate Gaussian density can be written as

\[ \log n(x; \theta_1, \theta_2) = \left(\frac{\mu}{\sigma^2}\right)x + \left(\frac{-1}{2\sigma^2}\right)x^2 - \left(\frac{\mu^2}{2\sigma^2} + \log(\sqrt{2\pi}\,\sigma)\right) = \theta_1 x + \theta_2 x^2 - \left(-\frac{\theta_1^2}{4\theta_2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta_2}\right)\right). \tag{3.45} \]

Hence the set of all univariate Gaussian density functions is an exponential family, §3.2. The coordinates (θ_1, θ_2) form a natural coordinate system, §3.3, and

\[ \varphi = -\frac{\theta_1^2}{4\theta_2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta_2}\right) = \frac{\mu^2}{2\sigma^2} + \log(\sqrt{2\pi}\,\sigma) \]

is its potential function.

3.7.2 Gaussian Information Metric

Proposition 3.11. The Fisher metric, §3.1, with respect to natural coordinates (θ_1, θ_2) is given by:

\[ [g_{ij}] = \begin{bmatrix} -\dfrac{1}{2\theta_2} & \dfrac{\theta_1}{2\theta_2^2} \\[8pt] \dfrac{\theta_1}{2\theta_2^2} & \dfrac{\theta_2 - \theta_1^2}{2\theta_2^3} \end{bmatrix} = \begin{bmatrix} \sigma^2 & 2\mu\sigma^2 \\[2pt] 2\mu\sigma^2 & 2\sigma^2(2\mu^2 + \sigma^2) \end{bmatrix}. \tag{3.46} \]

Proof. Since ϕ is a potential function, the Fisher metric is given by the Hessian of ϕ, that is,

\[ g_{ij} = \frac{\partial^2\varphi}{\partial\theta_i\,\partial\theta_j}\,. \tag{3.47} \]

Then we have the metric by a straightforward calculation.
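A sketch of that calculation in sympy, recovering both forms of (3.46):

import sympy as sp

t1 = sp.Symbol('theta1', real=True)
t2 = sp.Symbol('theta2', negative=True)
phi = -t1**2 / (4 * t2) + sp.Rational(1, 2) * sp.log(-sp.pi / t2)   # potential (3.44)

G = sp.simplify(sp.hessian(phi, (t1, t2)))
print(G)   # equivalent to [[-1/(2*theta2), theta1/(2*theta2**2)],
           #                [theta1/(2*theta2**2), (theta2 - theta1**2)/(2*theta2**3)]]

# substituting theta1 = mu/sigma**2, theta2 = -1/(2*sigma**2) gives the second form
mu, sg = sp.symbols('mu sigma', positive=True)
print(sp.simplify(G.subs({t1: mu / sg**2, t2: -1 / (2 * sg**2)})))
# [[sigma**2, 2*mu*sigma**2], [2*mu*sigma**2, 2*sigma**2*(2*mu**2 + sigma**2)]]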


3.7.3 Gaussian Mutually Dual Foliations

Since N represents an exponential family, a mixture coordinate system is given by the potential function. We have

\[ \eta_1 = \frac{\partial\varphi}{\partial\theta_1} = \frac{-\theta_1}{2\theta_2} = \mu\,, \qquad \eta_2 = \frac{\partial\varphi}{\partial\theta_2} = \frac{\theta_1^2 - 2\theta_2}{4\theta_2^2} = \mu^2 + \sigma^2\,. \tag{3.48} \]

Since (θ_1, θ_2) is a 1-affine coordinate system, (η_1, η_2) is a (−1)-affine coordinate system, and they are mutually dual with respect to the metric. Therefore the Gaussian manifold has dually orthogonal foliations. The coordinates in (3.48) admit the potential function

\[ \lambda = -\frac{1}{2}\left(1 + \log\left(-\frac{\pi}{\theta_2}\right)\right) = -\frac{1}{2}\left(1 + \log(2\pi) + 2\log(\sigma)\right). \tag{3.49} \]

3.7.4 Gaussian Affine Immersions

We show that the Gaussian manifold can be realized in Euclidean R^3 by an affine immersion, §3.4.

Proposition 3.12. Let N be the Gaussian manifold with the Fisher metric g and the exponential connection ∇^{(1)}. Denote by (θ_1, θ_2) a natural coordinate system. Then N can be realized in R^3 by the graph of the potential function, namely by the affine immersion {f, ξ}:

\[ f : N \to \mathbb{R}^3 : \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \longmapsto \begin{pmatrix} \theta_1 \\ \theta_2 \\ \varphi \end{pmatrix}, \qquad \xi = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \]

where ϕ is the potential function ϕ = −θ_1²/(4θ_2) + (1/2) log(−π/θ_2).

The submanifold, §2.0.5, of univariate Gaussian density functions with zero mean (i.e. θ_1 = 0) is represented by the curve

\[ (-\infty, 0) \to \mathbb{R}^3 : \theta_2 \longmapsto \left(0,\; \theta_2,\; \frac{1}{2}\log\left(-\frac{\pi}{\theta_2}\right)\right). \]

In addition, the submanifold of univariate Gaussian density functions with unit variance (i.e. θ_2 = −1/2) is represented by the curve

\[ \mathbb{R} \to \mathbb{R}^3 : \theta_1 \longmapsto \left(\theta_1,\; -\frac{1}{2},\; \frac{\theta_1^2}{2} + \frac{1}{2}\log(2\pi)\right). \]


3.8 Gaussian α-Geometry

By direct calculation we provide the α-connections, §3.3, and various α-curvature objects of the Gaussian 2-manifold N: the α-curvature tensor, the α-Ricci curvature with its eigenvalues and eigenvectors, the α-sectional curvature, and the α-Gaussian curvature. Henceforth, we use the coordinate system (µ, σ).

3.8.1 Gaussian α-Connection

For each α ∈ R, the α- (or ∇^{(α)}-) connection is the torsion-free affine connection with components, §2.0.6:

\[ \Gamma^{(\alpha)}_{ij,k} = \int_{-\infty}^{\infty} \left( \frac{\partial^2 \log p}{\partial\xi_i\,\partial\xi_j}\, \frac{\partial \log p}{\partial\xi_k} + \frac{1-\alpha}{2}\, \frac{\partial \log p}{\partial\xi_i}\, \frac{\partial \log p}{\partial\xi_j}\, \frac{\partial \log p}{\partial\xi_k} \right) p\; dx\,. \]

Proposition 3.13. The functions Γ^{(α)}_{ij,k} are given by:

\[ \Gamma^{(\alpha)}_{11,2} = \frac{1-\alpha}{\sigma^3}\,, \qquad \Gamma^{(\alpha)}_{12,1} = -\frac{1+\alpha}{\sigma^3}\,, \qquad \Gamma^{(\alpha)}_{22,2} = -\frac{2 + 4\alpha}{\sigma^3}\,, \tag{3.50} \]

while the other independent components are zero.

We have an affine connection ∇^{(α)} defined by:

\[ \langle \nabla^{(\alpha)}_{\partial_i} \partial_j,\, \partial_k \rangle = \Gamma^{(\alpha)}_{ij,k}\,, \]

so by solving the equations

\[ \Gamma^{(\alpha)}_{ij,k} = \sum_{h=1}^{2} g_{kh}\, \Gamma^{h\,(\alpha)}_{ij}\,, \quad (k = 1, 2)\,, \]

we obtain the components of ∇^{(α)}:

we obtain the components of ∇(α):

Proposition 3.14. The components Γi(α)jk of the ∇(α)-connections are

given by:

Γ(α)112 = −α + 1

σ,

Γ(α)211 =

1 − α

2σ,

Γ(α)222 = −2α + 1

σ. (3.51)

while the other independent components are zero.


3.8.2 Gaussian α-Curvatures

Proposition 3.15. By direct calculation, the α-curvature tensor of N, §2.0.6, is given by

\[ R^{(\alpha)}_{1212} = \frac{1 - \alpha^2}{\sigma^4}\,, \tag{3.52} \]

while the other independent components are zero.

Proposition 3.16. The components of the α-Ricci tensor are given by the symmetric matrix R^{(α)} = [R^{(α)}_{ij}]:

\[ R^{(\alpha)}_{11} = \frac{\alpha^2 - 1}{2\sigma^2}\,, \qquad R^{(\alpha)}_{22} = \frac{\alpha^2 - 1}{\sigma^2}\,, \tag{3.53} \]

while the off-diagonal components are zero. We note that R^{(α)}_{11} → (α² − 1)/2 as σ → 1.

In addition, the eigenvalues and the eigenvectors of the α-Ricci tensor are given by:

\[ \frac{\alpha^2 - 1}{\sigma^2} \begin{pmatrix} \tfrac{1}{2} \\[2pt] 1 \end{pmatrix} \tag{3.54} \]

\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{3.55} \]

Proposition 3.17. The α-Gaussian curvature K^{(α)} of N is given by:

\[ K^{(\alpha)} = \frac{\alpha^2 - 1}{2}\,. \tag{3.56} \]

So geometrically, the Gaussian manifold constitutes part of a pseudosphere when α² < 1.

3.9 Gaussian Mutually Dual Foliations

Since the family of Gaussian density functions N is an exponential family, a mixture coordinate system is given by the potential function ϕ = −θ_1²/(4θ_2) + (1/2) log(−π/θ_2), that is,

\[ \eta_1 = \frac{\partial\varphi(\theta)}{\partial\theta_1} = -\frac{\theta_1}{2\theta_2} = \mu\,, \qquad \eta_2 = \frac{\partial\varphi(\theta)}{\partial\theta_2} = \frac{\theta_1^2 - 2\theta_2}{4\theta_2^2} = \mu^2 + \sigma^2\,. \tag{3.57} \]


Since (θ_1, θ_2) is a 1-affine coordinate system, (η_1, η_2) is a (−1)-affine coordinate system, and they are mutually dual with respect to the Fisher metric. The coordinates (η_i) have a potential function given by:

\[ \lambda = -\frac{1}{2}\left(1 + \log\left(-\frac{\pi}{\theta_2}\right)\right) = -\frac{1}{2}\left(1 + \log(2\pi) + 2\log(\sigma)\right). \tag{3.58} \]

The coordinates (θ_i) and (η_i) are mutually dual. Therefore the Gaussian manifold N has dually orthogonal foliations; for example, take (η_1, θ_2) = (µ, −1/(2σ²)) as a coordinate system for N. Then the Gaussian density functions take the form:

\[ p(x; \eta_1, \theta_2) = \sqrt{\frac{-\theta_2}{\pi}}\; e^{(x - \eta_1)^2\, \theta_2} \quad \text{for } \eta_1 \in \mathbb{R},\ \theta_2 \in \mathbb{R}^-\,, \tag{3.59} \]

and the Fisher metric is

\[ ds_g^2 = -2\theta_2\, d\eta_1^2 + \frac{1}{2\theta_2^2}\, d\theta_2^2 \quad \text{for } \eta_1 \in \mathbb{R},\ \theta_2 \in \mathbb{R}^-\,. \tag{3.60} \]

We remark that (θi) is a geodesic coordinate system of ∇(1), and (ηi) is ageodesic coordinate system of ∇(−1).

3.10 Gaussian Submanifolds

We consider three submanifolds N_1, N_2 and N_3 of the Gaussian manifold N (3.41). These submanifolds have dimension 1, and so all the curvatures are zero.

3.10.1 Central Mean Submanifold

This is defined by N_1 ⊂ N : µ = 0. The Gaussian density functions with zero mean are of the form:

\[ p(x; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{x^2}{2\sigma^2}} \quad \text{for } x \in \mathbb{R},\ \sigma \in \mathbb{R}^+\,. \tag{3.61} \]

Proposition 3.18. The information metric [g_{ij}] is as follows:

\[ ds_g^2 = \frac{2}{\sigma^2}\, d\sigma^2 \quad \text{for } \sigma \in \mathbb{R}^+\,. \tag{3.62} \]

Proposition 3.19. By direct calculation, the α-connections of N_1 are

\[ \Gamma^{(\alpha)}_{11,1} = -\frac{2 + 4\alpha}{\sigma^3}\,, \qquad \Gamma^{1(\alpha)}_{11} = -\frac{1 + 2\alpha}{\sigma}\,. \tag{3.63} \]


3.10.2 Unit Variance Submanifold

This is defined as N_2 ⊂ N : σ = 1. The Gaussian density functions with unit variance are of the form:

\[ p(x; \mu) = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^2}{2}} \quad \text{for } x \in \mathbb{R},\ \mu \in \mathbb{R}\,. \tag{3.64} \]

Proposition 3.20. The information metric [g_{ij}] is as follows:

\[ ds_g^2 = d\mu^2 \quad \text{for } \mu \in \mathbb{R}\,. \tag{3.65} \]

Proposition 3.21. The components of the α-connections of the submanifold N_2 are zero.

3.10.3 Unit Coefficient of Variation Submanifold

This is defined as N_3 ⊂ N : µ = σ. The Gaussian density functions with identical mean and standard deviation are of the form:

\[ p(x; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{(x-\sigma)^2}{2\sigma^2}} \quad \text{for } x \in \mathbb{R},\ \sigma \in \mathbb{R}^+\,. \tag{3.66} \]

Proposition 3.22. The information metric [g_{ij}] is as follows:

\[ ds_g^2 = \frac{3}{\sigma^2}\, d\sigma^2 \quad \text{for } \sigma \in \mathbb{R}^+\,. \tag{3.67} \]

Proposition 3.23. By direct calculation, the α-connections of N_3 are

\[ \Gamma^{(\alpha)}_{11,1} = -\frac{3 + 7\alpha}{\sigma^3}\,, \qquad \Gamma^{1(\alpha)}_{11} = -\frac{3 + 7\alpha}{3\sigma}\,. \tag{3.68} \]

3.11 Gaussian Affine Immersions

Proposition 3.24. Let N be the Gaussian manifold with the Fisher metric g and the exponential connection ∇^{(1)}. Denote by (θ_1, θ_2) = (µ/σ², −1/(2σ²)) the natural coordinate system. Then N can be realized in R^3 by the graph of a potential function, namely by the affine immersion, §3.4, {f, ξ}:

\[ f : N \to \mathbb{R}^3 : \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \longmapsto \begin{pmatrix} \theta_1 \\ \theta_2 \\ \varphi \end{pmatrix}, \qquad \xi = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \tag{3.69} \]

where ϕ is the potential function

\[ \varphi = -\frac{\theta_1^2}{4\theta_2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta_2}\right) = \frac{\mu^2}{2\sigma^2} + \log(\sqrt{2\pi}\,\sigma)\,. \]

We consider particular submanifolds; central mean submanifold, unit vari-ance submanifold and the submanifold with identical mean and standard de-viation. We represent them as curves in R

3 as follows:

1. The central mean submanifold N1:The Gaussian density functions with zero mean (i.e. θ1 = 0 and θ2 =− 1

2 σ2 ) are represented by the curve

(−∞, 0) → R3 : θ2 → 0, θ2,

12

log(− π

θ2) ,

2. The unit variance submanifold N2:The Gaussian density functions with unit variance (i.e. θ1 = µ and θ2 =− 1

2) are represented by the curve

R → R3 : θ1 → θ1,−

12,θ1

2

2+

12

log(2π).

3. The submanifold N3 with identical mean and standard deviation:The Gaussian density functions with identical mean and standard devia-tion µ = σ (i.e. θ1 = 1

σ and θ2 = − 12 σ2 ) are represented by the curve

(0,∞) → R3 : θ1 → θ1,−2 θ12,

12

+ log(√

2πθ1

) .

3.12 Log-Gaussian Manifold

The log-Gaussian density functions arise from the Gaussian density func-tions (3.42) for the non negative random variable x = log 1

m , or equivalently,m = e−x:

g(m) =1

m√

2π σe−

(log(m)+µ)2

2 σ2 for µ ∈ R, σ ∈ R+, (3.70)

The case where µ = 0 and σ = 1 is called the standard log-Gaussian distribu-tion. Figure 3.6 shows a plot of log-Gaussian density functions g(m;µ, σ) forthe range n ∈ [0, 3]; with µ = 0, and σ = 1, 1.5, 2. The case σ = 1 correspondsto the standard Log-Gaussian distribution g(m; 0, 1).

Corollary 3.25. The mean m, standard deviation σm, and coefficient of vari-ation, §1.2, cvm, for log-Gaussian density functions are given by:

Page 64: Information Geometry: Near Randomness and Near Independence

54 3 Information Geometry

0.5 1 1.5 2 2.5 3

0.2

0.4

0.6

0.8

1

1.2

1.4

σ = 1.5σ = 2

σ = 1

m

g(m; µ, σ)

Fig. 3.6. Log-Gaussian probability density functions g(m; µ, σ); with µ = 0, andσ = 1, 1.5, 2. The case σ = 1 corresponds to the standard log-Gaussian densityg(m; 0, 1).

m = eσ22 −µ ,

σm =√

(

eσ2−2 µ) (

eσ2 − 1)

,

cvm = eµ−σ22

(

eσ2−2 µ) (

eσ2 − 1)

. (3.71)

Directly from the definition of the Fisher metric we deduce:

Proposition 3.26. The family of log-Gaussian density functions for randomvariable n determines a Riemannian 2-manifold which is an isometric iso-morph of the Gaussian 2-manifold.

Proof. We show that the log-Gaussian and Gaussian families have the sameFisher metric. From

g(m) = p(x)dx

dm

Hencelog g(m) = log p(x) + log(

dx

dm)

then differentiation of this relation with respect to θi and θj , with (θ1, θ2) =( µ

σ2 ,− 12 σ2 ), yields

∂2 log g(m)∂θi∂θj

=∂2 log p(x)∂θi∂θj

from (3.1) we can see that p(x) and g(m) have the same Fisher metric. So theidentity map on the parameter space R × R

+ determines an isometry, §2.0.5,of Riemannian manifolds.

Page 65: Information Geometry: Near Randomness and Near Independence

4

Information Geometry of Bivariate Families

From the study by Arwini [13], we provide information geometry, includingthe α-geometry, of several important families of bivariate probability densityfunctions. They have marginal density functions that are gamma density func-tions, exponential density functions and Gaussian density functions. These areused for applications in the sequel, when we have two random variables thathave non-zero covariance—such as will arise for a coupled pair of randomprocesses.

The multivariate Gaussian is well-known and its information geometry hasbeen reported before [183, 189]; our recent work has contributed the bivariateGaussian α-geometry. Surprisingly, it is very difficult to construct a bivariateexponential distribution, or for that matter a bivariate Poisson distributionthat has tractable information geometry. However we have calculated the caseof the Freund bivariate mixture exponential distribution [89]. The only bivari-ate gamma distribution for which we have found the information geometrytractable is the McKay case [146] which is restricted to positive covariance,and we begin with this.

4.1 McKay Bivariate Gamma 3-Manifold M

The results in this section were computed in [13] and first reported in [14].The McKay bivariate gamma distribution is one of the bivariate densityfunctions constructed by statisticians using the so-called conditional method.McKay [146] derived it as follows:

Let (X1,X2, ...,XN ) be a random sample from a normal population. Sup-pose s2N is the sample variance, and let s2n be the variance in a sub-sample ofsize n. Then s2N and s2n jointly have the McKay bivariate gamma distribution.

The information geometry of the 3-manifold of McKay bivariate gammadensity functions can provide a metrization of departures from Poisson ran-domness and departures from independence for bivariate processes. The curva-ture objects are derived, including those on three submanifolds. As in the case

K. Arwini, C.T.J. Dodson, Information Geometry. 55Lecture Notes in Mathematics 1953,c© Springer-Verlag Berlin Heidelberg 2008

Page 66: Information Geometry: Near Randomness and Near Independence

56 4 Information Geometry of Bivariate Families

0

1

2

012

0

0.5

1

1.5

y x

Fig. 4.1. Part of the family of McKay bivariate gamma probability density func-tions; observe that these are zero outside the octant 0 < x < y < ∞. Here thecorrelation coefficient has been set to ρxy = 0.6 and α1 = 5.

of bivariate Gaussian manifolds, we have negative scalar curvature but here itis not constant and we show how it depends on correlation. The multiplicativegroup action of the real numbers as a scale-changing process influences theRiemannian geometric properties. These results have potential applications,for example, in the characterization of stochastic materials.

Fisher Information Metric

The classical family of McKay bivariate gamma density functions M, is definedon 0 < x < y < ∞ with parameters α1, σ12, α2 > 0 and probability densityfunctions

f∗(x, y;α1, σ12, α2) =( α1

σ12)

(α1+α2)2 xα1−1(y − x)α2−1e

−√

α1σ12

y

Γ (α1)Γ (α2). (4.1)

Here σ12 is the covariance, §1.3, of X and Y . One way to view this is thatf∗(x, y) is the probability density for the two random variables X and Y =X + Z where X and Z both have gamma density functions.

The correlation coefficient, §1.3, and marginal functions, §1.3 of X and Yare given by

ρ(X,Y ) =√

α1

α1 + α2> 0 (4.2)

Page 67: Information Geometry: Near Randomness and Near Independence

4.1 McKay Bivariate Gamma 3-Manifold M 57

ρ

α2

α1

Fig. 4.2. Correlation coefficient ρ from equation (4.2) for McKay probablity densityfunctions, in terms of α1, α2.

f∗X(x) =

( α1σ12

)α12 xα1−1e

−√

α1σ12

x

Γ (α1), x > 0 (4.3)

f∗Y (y) =

( α1σ12

)(α1+α2)

2 y(α1+α2)−1e−√

α1σ12

y

Γ (α1 + α2), y > 0 (4.4)

Figure 4.2 shows a plot of the correlation coefficient from equation (4.2). Themarginal density functions of X and Y are gamma with shape parameters α1

and α1 + α2, respectively; note that it is not possible to choose parameterssuch that both marginal functions are exponential, §1.2.2.

Examples of the outputs of two McKay distribution simulators are shownin Figure 9.16 and Figure 10.23. In Figure 9.16, each plot shows 5000 pointswith coordinates (x, y) and x < y, for three values of the correlation coefficient:ρ = 0.5, 0.7, 0.9.

Proposition 4.1. Let M be the family of McKay bivariate gamma densityfunctions (4.1), then (α1, σ12, α2) is a local coordinate system, and M becomesa 3-manifold, §2.0.2, with Fisher information metric, §2.0.5, §3.1,

Page 68: Information Geometry: Near Randomness and Near Independence

58 4 Information Geometry of Bivariate Families

[gij ] =

−3 α1+α24 α12 + (Γ ′(α1)

Γ (α1))′ α1−α2

4 α1 σ12− 1

2 α1α1−α24 α1 σ12

α1+α24 σ122

12 σ12

− 12 α1

12 σ12

(Γ ′(α2)Γ (α2)

)′

⎦(4.5)

The inverse [gij ] of [gij ] is given by:

g11 = −(

−1 + (α1 + α2) ψ′(α2)ψ′(α2) + ψ′(α1) (1 − (α1 + α2) ψ′(α2))

)

,

g12 = g21 =σ12 (1 + (α1 − α2) ψ′(α2))

α1 (ψ′(α2) + ψ′(α1) (1 − (α1 + α2) ψ′(α2))),

g13 = g31 =1

−ψ′(α2) + ψ′(α1) (−1 + (α1 + α2) ψ′(α2)),

g22 =σ12

2(

−1 +(

−3α1 + α2 + 4α12 ψ′(α1)

)

ψ′(α2))

α12 (−ψ′(α2) + ψ′(α1) (−1 + (α1 + α2) ψ′(α2)))

,

g23 = g32 =σ12 (−1 + 2α1 ψ

′(α1))α1 (ψ′(α2) + ψ′(α1) (1 − (α1 + α2) ψ′(α2)))

,

g33 = −(

−1 + (α1 + α2) ψ′(α1)ψ′(α2) + ψ′(α1) (1 − (α1 + α2) ψ′(α2))

)

, (4.6)

where we have abbreviated ψ′(α1) = (Γ ′(α1)Γ (α1)

)′.

4.2 McKay Manifold Geometry in Natural Coordinates

The original presentation of the McKay distribution was in the form:

f(x, y) =c(α1+α2)xα1−1(y − x)α2−1e−cy

Γ (α1)Γ (α2)(4.7)

defined on 0 < x < y < ∞ with parameters α1, c, α2 > 0. The marginalfunctions, §1.3, of X and Y are given by:

fX(x) =cα1xα1−1e−c x

Γ (α1), x > 0 (4.8)

fY (y) =c(α1+α2)y(α1+α2)−1e−c y

Γ (α1 + α2), y > 0 (4.9)

The covariance and correlation coefficient, §1.3, of X and Y are given by:

σ12 =α1

c2

ρ(X,Y ) =√

α1

α1 + α2

Page 69: Information Geometry: Near Randomness and Near Independence

4.3 McKay Densities Have Exponential Type 59

4.3 McKay Densities Have Exponential Type

The McKay family (4.7) forms an exponential family, §3.2.

Proposition 4.2. Let M be the set of McKay bivariate gamma density func-tions, that is

M = f |f(x, y;α1, c, α2) =c(α1+α2)xα1−1(y − x)α2−1e−cy

Γ (α1)Γ (α2), (4.10)

for y > x > 0, α1, c, α2 > 0.

Then (α1, c, α2) is a natural coordinate system, §3.3, and

ϕ(θ) = logΓ (α1) + logΓ (α2) − (α1 + α2) log c (4.11)

is the corresponding potential function, §3.2.

Proof.

log f(x, y;α1, c, α2) = log(

c(α1+α2)xα1−1(y − x)α2−1e−c y

Γ (α1)Γ (α2)

)

= − log x− log(y − x) (4.12)+α1(log x) + c(−y) + α2(log(y − x)) (4.13)−(logΓ (α1) + logΓ (α2) − (α1 + α2) log c). (4.14)

Hence the set of all McKay bivariate gamma density functions is an exponen-tial family. The terms in the line (4.13) imply that (θ1, θ2, θ3) = (α1, c, α2)is a natural coordinate system and (x1, x2, x3) = (F1(x), F2(x), F3(x)) =(log x,−y, log(y − x)) is a random variable space (the random variablesx1, x2, x3 are not independent, but related by x3 = log(−x2−ex1), and (4.14)implies that ϕ(θ) = logΓ (α1) + logΓ (α2) − (α1 + α2) log c = logΓ (θ1) +logΓ (θ3) − (θ1 + θ3) log θ3 is its potential function.We remark that (4.12) implies that C(X,Y ) = − log x− log(y−x) = −x1−x3

is the normalization function since f is a probability density function.

4.3.1 McKay Information Metric

Proposition 4.3. Let M be the set of McKay bivariate gamma density func-tions (4.7), then using coordinates (α1, c, α2), M is a 3-manifold with Fisherinformation metric [gij ], §3.1, given by:

[gij ] =

ψ′(α1) − 1c 0

− 1c

α1+α2c2 − 1

c0 − 1

c ψ′(α2)

⎦ (4.15)

Page 70: Information Geometry: Near Randomness and Near Independence

60 4 Information Geometry of Bivariate Families

Proof. Since (α1, c, α2) is the natural coordinate system, and ϕ(θ) its potentialfunction, §3.2 (4.11), the Fisher metric is given by the Hessian of ϕ(θ), that is,

gij =∂2ϕ(θ)∂θi∂θj

.

Then, we have the Fisher metric by a straightforward calculation.

4.4 McKay a-Geometry

By direct calculation, using §3.3, we provide the α-connections , and variousα-curvature objects of the McKay 3-manifold M : the α-curvature tensor, theα-Ricci curvature, the α-scalar curvature, the α-sectional curvature, and theα-mean curvature.

4.4.1 McKay a-Connection

For each α ∈ R, the α− (or ∇(α))-connection is the torsion-free affine connec-tion with components:

Γ(α)ij,k =

1 − α

2∂i ∂j ∂kϕ(θ) ,

where ϕ(θ) is the potential function, and ∂i = ∂∂θi

.Since the set of McKay bivariate gamma density functions is an exponen-

tial family, the connection ∇(1) is flat. In this case, (α1, c, α2) is a 1-affinecoordinate system. So the 1 and (-1)-connections on the McKay manifold areflat.

Proposition 4.4. The functions Γ(α)ij,k are given by:

Γ(α)11,1 =

(1 − α) ψ′′(α1)2

,

Γ(α)22,1 = Γ

(α)12,2 = Γ

(α)23,2 = Γ

(α)22,3 =

(1 − α)2 c2

,

Γ(α)22,2 = − (1 − α) (α1 + α2)

c3,

Γ(α)33,3 =

(1 − α) ψ′′(α2)2

(4.16)

while the other independent components are zero.

We have an affine connection ∇(α) defined by:

〈∇(α)∂i

∂j , ∂k〉 = Γ(α)ij,k ,

Page 71: Information Geometry: Near Randomness and Near Independence

4.4 McKay a-Geometry 61

So by solving the equations

Γ(α)ij,k =

3∑

h=1

gkh Γh(α)ij , (k = 1, 2, 3).

we obtain the components of ∇(α):

Proposition 4.5. The components Γi(α)jk of the ∇(α)-connections are given

from §2.0.6 and §3.3 by:

Γ(α)111 =

(1 − α) ψ′′(α1) (−1 + ψ′(α2) (α1 + α2))2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)211 =

c (1 − α) ψ′(α2)ψ′′(α1)2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)311 =

(1 − α) ψ′′(α1)2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)112 = Γ

(α)123 =

(1 − α) ψ′(α2)2 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)212 = Γ

(α)223 =

(1 − α) ψ′(α1)ψ′(α2)2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)312 = Γ

(α)323 =

(1 − α) ψ′(α1)2 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)122 =

− (1 − α) ψ′(α2) (α1 + α2)2 c2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)222 =

− (1 − α) (−ψ′(α1) − ψ′(α2) + 2ψ′(α1)ψ′(α2) (α1 + α2))2 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)322 =

− (1 − α) ψ′(α1) (α1 + α2)2 c2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)133 =

(1 − α) ψ′′(α2)2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)233 =

c (1 − α) ψ′(α1)ψ′′(α2)2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

Γ(α)333 =

(1 − α) ψ′′(α2) (−1 + ψ′(α1) (α1 + α2))2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

. (4.17)

while the other independent components are zero.

4.4.2 McKay a-Curvatures

Proposition 4.6. The components R(α)ijkl of the α-curvature tensor, §3.3, are

given by:

Page 72: Information Geometry: Near Randomness and Near Independence

62 4 Information Geometry of Bivariate Families

R(α)1212 =

−(

α2 − 1)

ψ′(α2) (ψ′(α1) + ψ′′(α1) (α1 + α2))4 c2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

R(α)1213 =

(

α2 − 1)

ψ′(α2)ψ′′(α1)4 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

R(α)1223 =

(

α2 − 1)

ψ′(α1)ψ′(α2)4 c2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

R(α)1313 =

(

α2 − 1)

ψ′′(α1)ψ′′(α2)4 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

R(α)1323 =

(

α2 − 1)

ψ′(α1)ψ′′(α2)4 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

,

R(α)2323 =

−(

α2 − 1)

ψ′(α1) (ψ′(α2) + ψ′′(α2) (α1 + α2))4 c2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

(4.18)

while the other independent components are zero.

Proposition 4.7. The components of the α-Ricci tensor are given by the sym-metric matrix R(α) = [R(α)

ij ]:

R(α)11 =

(

α2 − 1)

(−ψ′(α1)ψ′′(α1)

(

ψ′(α2)2 − ψ′′(α2)

)

(α1 + α2)

4 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2 +

− (ψ′(α1)ψ′(α2) (ψ′(α1)ψ′(α2) − 2ψ′′(α1))) − ψ′′(α1)ψ′′(α2)4 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

2 ) ,

R(α)12 =

(

α2 − 1)

(

(

ψ′(α2)2ψ′′(α1) + ψ′(α1)

2ψ′′(α2)

)

(α1 + α2)

4 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2 +

ψ′(α2) (ψ′(α1) (ψ′(α1) + ψ′(α2)) − ψ′′(α1)) − ψ′(α1)ψ′′(α2)4 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))

2 ) ,

R(α)13 = −

(

α2 − 1)

(

ψ′(α1)2 + ψ′′(α1)

) (

ψ′(α2)2 + ψ′′(α2)

)

4 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2 ,

R(α)22 =−

(

α2−1)

(α1+α2)(ψ′(α2) (ψ′(α1) (ψ′(α1) + ψ′(α2)) − ψ′′(α1))

4 c2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2

+−ψ′(α1)ψ′′(α2) +

(

ψ′(α2)2ψ′′(α1) + ψ′(α1)

2ψ′′(α2)

)

(α1 + α2)

4 c2 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2 ) ,

Page 73: Information Geometry: Near Randomness and Near Independence

4.4 McKay a-Geometry 63

R(α)23 =

(

α2 − 1)

(ψ′(α2) (ψ′(α1) (ψ′(α1) + ψ′(α2)) − ψ′′(α1))

4 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2

+−ψ′(α1)ψ′′(α2) +

(

ψ′(α2)2ψ′′(α1) + ψ′(α1)

2ψ′′(α2)

)

(α1 + α2)

4 c (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2 ) ,

R(α)33 =

(

α2 − 1)

(−(

ψ′(α1)2ψ′(α2)

2)

+ (2ψ′(α1)ψ′(α2) − ψ′′(α1)) ψ′′(α2)

4 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2

+−ψ′(α2)

(

ψ′(α1)2 − ψ′′(α1)

)

ψ′′(α2) (α1 + α2)

4 (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2))2 ) . (4.19)

Proposition 4.8. The α-scalar curvature R(α) of M is given by:

R(α) =−(

α2 − 1)

(ψ′(α2) (ψ′(α1) (ψ′(α1) + ψ′(α2)) − 2 ψ′′(α1)) − 2 ψ′(α1) ψ′′(α2)

2 (ψ′(α1) + ψ′(α2) − ψ′(α1) ψ′(α2) (α1 + α2))2

+

(

ψ′(α2)2ψ′′(α1) +

(

ψ′(α1)2 − ψ′′(α1)

)

ψ′′(α2))

(α1 + α2)

2 (ψ′(α1) + ψ′(α2) − ψ′(α1) ψ′(α2) (α1 + α2))2 ) . (4.20)

The α-scalar curvature R(α) has limiting value (α2−1)2 as α1, α2 → 0. So

M has a negative scalar curvature R(0), and this has limiting value − 12 as

α1, α2 → 0. Figure 4.3 shows a plot of R(0) for the range α1 , α2 ∈ [0, 4].

12

34

1

2

3

4

-1.4-1.2-1

α2

α1

Scalar curvature R(0)

Fig. 4.3. The scalar curvature R(0) for the McKay bivariate gamma 3-manifold M ;the limiting value at the origin is − 1

2.

Page 74: Information Geometry: Near Randomness and Near Independence

64 4 Information Geometry of Bivariate Families

Proposition 4.9. The α-sectional curvatures of M are given by:

(α)(1, 2)=

(

α2 − 1)

ψ′(α2) (ψ′(α1) + ψ′′(α1) (α1 + α2))

4 (−1 + ψ′(α1)(α1 + α2)) (ψ′(α1) + ψ′(α2) − ψ′(α1) ψ′(α2)(α1 + α2)),

(α)(1, 3)=−(

α2 − 1)

ψ′′(α1) ψ′′(α2)

4 ψ′(α1) ψ′(α2) (ψ′(α1) + ψ′(α2) − ψ′(α1) ψ′(α2) (α1 + α2)),

(α)(2, 3)=

(

α2 − 1)

ψ′(α1) (ψ′(α2) + ψ′′(α2) (α1 + α2))

4 (ψ′(α2) (α1 + α2) − 1) (ψ′(α1) + ψ′(α2) − ψ′(α1)ψ′(α2) (α1 + α2)).

(4.21)

Proposition 4.10. The α-mean curvatures (α)(λ) (λ = 1, 2, 3) are given by:

(α)(1) =

(

α2 − 1)

(

−ψ′′(α1) ψ′′(α2)ψ′(α1)

+ψ′(α2)2 (ψ′(α1)+ψ′′(α1) (α1+α2))

−1+ψ′(α1) (α1+α2)

)

8 ψ′(α2) (ψ′(α1) + ψ′(α2) − ψ′(α1) ψ′(α2) (α1 + α2)),

(α)(2) =

(

α2 − 1)

(

ψ′(α2) (ψ′(α1)+ψ′′(α1) (α1+α2))−1+ψ′(α1) (α1+α2)

+ψ′(α1) (ψ′(α2)+ψ′′(α2) (α1+α2))

−1+ψ′(α2) (α1+α2)

)

8 (ψ′(α1) + ψ′(α2) − ψ′(α1) ψ′(α2) (α1 + α2)),

(α)(3) =

(

α2 − 1)

(

−ψ′′(α1) ψ′′(α2)ψ′(α2)

+ψ′(α1)2 (ψ′(α2)+ψ′′(α2) (α1+α2))

−1+ψ′(α2) (α1+α2)

)

8 ψ′(α1) (ψ′(α1) + ψ′(α2) − ψ′(α1) ψ′(α2) (α1 + α2)). (4.22)

4.5 McKay Mutually Dual Foliations

We give mutually dual foliations, cf. §3.5.4, of the McKay bivariate gammamanifold. Since M is an exponential family, a mixture coordinate system isgiven by the potential function ϕ(θ) (4.14), that is

η1 =∂ϕ(θ)∂α1

= ψ(α1) − log c,

η2 =∂ϕ(θ)∂c

= −α1 + α2

c,

η3 =∂ϕ(θ)∂α2

= ψ(α2) − log c. (4.23)

Since (α1, c, α2) is a 1-affine coordinate system, (η1, η2, η3) is a (−1)-affinecoordinate system, and they are mutually dual with respect to the Fishermetric. The coordinates (ηi) have a potential function given by:

λ(η) = α1ψ(α1) + α2ψ(α2) − (α1 + α2) − logΓ (α1) − logΓ (α2) . (4.24)

In fact we have also dually orthogonal foliations. For example, with(α1, η2, α2) as a coordinate system for M the McKay densities take the form:

Page 75: Information Geometry: Near Randomness and Near Independence

4.6 McKay Submanifolds 65

f(x, y;α1, η2, α2) =(

−α1 + α2

η2

)α1+α2 xα1−1 (y − x)α2−1

Γ (α1)Γ (α2)e

α1+α2η2

y. (4.25)

and the Fisher metric is⎡

ψ′(α1) − 1α1+α2

0 − 1α1+α2

0 α1+α2(η2)2

0− 1

α1+α20 ψ′(α2) − 1

α1+α2

⎦ . (4.26)

We remark that (α1, c, α2) is a geodesic coordinate system of ∇(1), and(η1, η2, η3) is a geodesic coordinate system of ∇(−1).

4.6 McKay Submanifolds

We consider three submanifolds, §2.0.5, M1,M2 and M3 of the 3-manifold Mof McKay bivariate gamma density functions (4.11) f(x, y;α1, c, α2), wherewe use the coordinate system (α1, c, α2). These submanifolds have dimension2 and so it follows that the scalar curvature is twice the Gaussian curvature,R = 2K. Recall from §1.3 that the correlation is given by

ρ =√

α1

α1 + α2.

In the cases of M1 and M2 the scalar curvature can be shown as a functionof ρ only.

4.6.1 Submanifold M1

This is defined as M1 ⊂ M : α1 = 1. The density functions are of form:

f(x, y; 1, c, α2) =c1+α2(y − x)α2−1e−c y

Γ (α2), (4.27)

defined on 0 < x < y < ∞ with parameters c, α2 > 0. The correlationcoefficient and marginal functions, §1.3, of X and Y are given by:

ρ(X,Y ) =1√

1 + α2(4.28)

fX(x) = c e−c x, x > 0 (4.29)

fY (y) =c(1+α2)yα2e−c y

α2 Γ (α2), y > 0 (4.30)

So here we have α2 = 1−ρ2

ρ2 , which in practice would give a measure of thevariability not due to the correlation.

Page 76: Information Geometry: Near Randomness and Near Independence

66 4 Information Geometry of Bivariate Families

Proposition 4.11. The metric tensor [gij ] has component matrix

G = [gij ] =[

1+α2c2 − 1

c− 1

c ψ′(α2)

]

. (4.31)

Proposition 4.12. The α-connections of M1 are

Γ(α)11,1 =

(α− 1) (1 + α2)c3

,

Γ(α)22,1 =

− (α− 1)2 c2

,

Γ(α)22,2 =

− (α− 1) ψ′′(α2)2

,

Γ 111 =

(α− 1)2 c

(

2 +1

−1 + ψ′(α2) (1 + α2)

)

,

Γ 112 =

− (α− 1) ψ′(α2)2 (−1 + ψ′(α2) (1 + α2))

,

Γ 122 =

−c (α− 1) ψ′′(α2)2 (−1 + ψ′(α2) (1 + α2))

,

Γ 211 =

(α− 1) (1 + α2)2 c2 (−1 + ψ′(α2) (1 + α2))

,

Γ 212 =

− (α− 1)2 c (−1 + ψ′(α2) (1 + α2))

,

Γ 222 =

− (α− 1) ψ′′(α2) (1 + α2)2 (−1 + ψ′(α2) (1 + α2))

. (4.32)

The Levi-Civita connection, §2.0.6, ∇ is that given by setting α = 0 in theα-connections above and geodesics, §2.1, for this case are curves h : (a, b) →M1 satisfying

∇hh = 0

which expands using (2.5). This equation is difficult to solve analytically butwe can find numerical solutions using the Mathematica programs of Gray [99].Figure 4.4 shows some geodesics passing through (c, α2) = (1, 1) in the gammasubmanifold M1.

Proposition 4.13. The curvature tensor, §2.0.6, of M1 is given by

R(α)1212 =

(

α2 − 1)

(ψ′(α2) + ψ′′(α2) (1 + α2))4 c2 (−1 + ψ′(α2) (1 + α2))

, (4.33)

while the other independent components are zero.

Page 77: Information Geometry: Near Randomness and Near Independence

4.6 McKay Submanifolds 67

1 2 3 4 50

1

2

3

4

5

α2

c

Fig. 4.4. Geodesics passing through (c, α2) = (1, 1) in the McKay submanifold M1.

By contraction we obtain:Ricci tensor [R(α)

ij ] =

−(α2−1) (1+α2) (ψ′(α2)+ψ′′(α2) (1+α2))4 c2 (−1+ψ′(α2) (1+α2))

2(α2−1) (ψ′(α2)+ψ′′(α2) (1+α2))

4 c (−1+ψ′(α2) (1+α2))2

(α2−1) (ψ′(α2)+ψ′′(α2) (1+α2))4 c (−1+ψ′(α2) (1+α2))

2

−(α2−1)ψ′(α2) (ψ′(α2)+ψ′′(α2) (1+α2))4 (−1+ψ′(α2) (1+α2))

2

(4.34)Scalar curvature:

R(α) =−(

α2 − 1)

(ψ′(α2) + ψ′′(α2) (1 + α2))

2 (−1 + ψ′(α2) (1 + α2))2 . (4.35)

The α-scalar curvature R(α) for M1 can be written as a function of ρ only asfollows:

Page 78: Information Geometry: Near Randomness and Near Independence

68 4 Information Geometry of Bivariate Families

R(α)(ρ) =−(

α2 − 1)

(

ψ′( 1−ρ2

ρ2 ) + ψ′′( 1−ρ2

ρ2 )(

1 + 1−ρ2

ρ2

))

2(

−1 + ψ′(1−ρ2

ρ2 )(

1 + 1−ρ2

ρ2

))2 . (4.36)

and this has limiting value (α2−1)3 as ρ → 0, and 0 as ρ → 1.

4.6.2 Submanifold M2

This is defined as M2 ⊂ M : α2 = 1. The density functions are of form:

f(x, y;α1, c, 1) =c(α1+1)xα1−1e−c y

Γ (α1), (4.37)

defined on 0 < x < y < ∞ with parameters α1, c > 0. The correlationcoefficient and marginal functions, of X and Y are given by:

ρ(X,Y ) =√

α1

1 + α1(4.38)

fX(x) =cα1xα1−1e−c x

Γ (α1), x > 0 (4.39)

fY (y) =c(α1+1)yα1e−c y

α1 Γ (α1), y > 0 (4.40)

Here we have α1 = ρ2

1−ρ2 .

Proposition 4.14. The metric tensor [gij ] is as follows:

G = [gij ] =[

ψ′(α1) − 1c

− 1c

1+α1c2

]

. (4.41)

It follows that the geodesic curves in M2 are essentially the same as in M1,but with the order of coordinates interchanged. Hence they look like those inFigure 4.4 but with coordinates (α1, c) instead of (c, α2).

Proposition 4.15. The α-connections of M2 are

Γ(α)11,1 =

− (α− 1) ψ′′(α1)2

,

Γ(α)12,2 =

− (α− 1)2 c2

,

Γ(α)22,2 =

(α− 1) (1 + α1)c3

,

Γ 111 =

− (α− 1) ψ′′(α1) (1 + α1)2 (−1 + ψ′(α1) (1 + α1))

,

Page 79: Information Geometry: Near Randomness and Near Independence

4.6 McKay Submanifolds 69

Γ 112 =

− (α− 1)2 c (−1 + ψ′(α1) (1 + α1))

,

Γ 122 =

(α− 1) (1 + α1)2 c2 (−1 + ψ′(α1) (1 + α1))

,

Γ 211 =

−c (α− 1) ψ′′(α1)2 (−1 + ψ′(α1) (1 + α1))

,

Γ 212 =

− (α− 1) ψ′(α1)2 (−1 + ψ′(α1) (1 + α1))

,

Γ 222 =

(α− 1)2 c

(

2 +1

−1 + ψ′(α1) (1 + α1)

)

. (4.42)

Proposition 4.16. The α-curvature tensor is given by

R(α)1212 =

(

α2 − 1)

(ψ′(α1) + ψ′′(α1) (1 + α1))4 c2 (−1 + ψ′(α1) (1 + α1))

. (4.43)

while the other independent components are zero.By contraction we obtain:

α-Ricci tensor [R(α)ij ] =

−(α2−1)ψ′(α1) (ψ′(α1)+ψ′′(α1) (1+α1))4 (−1+ψ′(α1) (1+α1))

2(α2−1) (ψ′(α1)+ψ′′(α1) (1+α1))

4 c (−1+ψ′(α1) (1+α1))2

(α2−1) (ψ′(α1)+ψ′′(α1) (1+α1))4 c (−1+ψ′(α1) (1+α1))

2

−(α2−1) (1+α1) (ψ′(α1)+ψ′′(α1) (1+α1))4 c2 (−1+ψ′(α1) (1+α1))

2

⎦ .

(4.44)α-Scalar curvature:

R(α)(α1, α2) =−(

α2 − 1)

(ψ′(α1) + ψ′′(α1) (1 + α1))

2 (−1 + ψ′(α1) (1 + α1))2 . (4.45)

Note that, the α-scalar curvature R(α) for M2 can be written as a function ρonly:

R(α)(ρ) =−(

α2 − 1)

(

ψ′( ρ2

1−ρ2 ) + ψ′′( ρ2

1−ρ2 )(

1 + ρ2

1−ρ2

))

2(

−1 + ψ′( ρ2

1−ρ2 )(

1 + ρ2

1−ρ2

))2 . (4.46)

and this has limiting value 0 as ρ → 0, and (α2−1)3 as ρ → 1. Figure 4.5 shows

a plot of R(0) as a function of correlation ρ for M1 and M2, for the rangeρ ∈ [0, 1].

4.6.3 Submanifold M3

This is defined as M3 ⊂ M : α1 + α2 = 1. The density functions are of form:

Page 80: Information Geometry: Near Randomness and Near Independence

70 4 Information Geometry of Bivariate Families

0 0.2 0.4 0.6 0.8 1

-0.3

-0.25

-0.2

-0.15

-0.1

-0.05

0

M1

M2

Scalar curvature R(0)

Correlation ρ

Fig. 4.5. The scalar curvature R(0) as a function of correlation ρ for McKay sub-manifolds: M1 (M with α1 = 1) where R(0) increases from − 1

3to 0, and M2 (M

with α2 = 1) where R(0) decreases from 0 to − 13.

f(x, y;α1, c) =c xα1−1 (y − x)−α1

Γ (1 − α1)Γ (α1)e−c y , (4.47)

defined on 0 < x < y < ∞ with parameters α1 < 1, c > 0. The correlationcoefficient and marginal functions, of X and Y are given by:

ρ(X,Y ) =√α1 (4.48)

fX(x) =cα1xα1−1e−c x

Γ (α1), x > 0 (4.49)

fY (y) = c e−c y, y > 0 (4.50)

So here we have α1 = ρ2.

Proposition 4.17. The metric tensor [gij ] is as follows:

G = [gij ] =[

π2 csc(π α1)2 0

0 1c2

]

. (4.51)

Proposition 4.18. The α-connections of M3 are

Γ(α)11,1 = π3 (α− 1) cot(π α1) csc(π α1)

2,

Γ(α)22,2 =

α− 1c3

,

Γ 111 = π (α− 1) cot(π α1) ,

Γ 222 =

α− 1c

. (4.52)

Proposition 4.19. The α-curvature tensor, the α-Ricci tensor and theα-scalar curvature of M3 are zero.

Page 81: Information Geometry: Near Randomness and Near Independence

4.7 McKay Bivariate Log-Gamma Manifold ˜M 71

4.7 McKay Bivariate Log-Gamma Manifold ˜M

We introduce a McKay bivariate log-gamma distribution, which has log-gamma marginal functions, §1.3, §3.6. This family of densities determinesa Riemannian 3-manifold which is isometric to the McKay manifold, §3.1.

McKay bivariate log-gamma density functions arise from the McKay bi-variate gamma density functions (4.7) for the non-negative random variablesx = log 1

n and y = log 1m , or equivalently, n = e−x and m = e−y. The McKay

bivariate log-gamma density functions

g(n,m) =cα1+α2 mc−1 (− log n)α1−1 (log(n) − log(m))α2−1

nΓ (α1)Γ (α2)(4.53)

are defined on 0 < m < n < 1 with parameters αi, c > 0 (i = 1, 2).

Corollary 4.20. The covariance, §1.3, and marginal density functions, §1.3,of n and m are given by:

Cov(n,m) = cα1+α2

(

(−1)2α2+1cα1

(1 + c)2 α1+α2+

(−1)α2

(−1 − c)α2 (2 + c)α1

)

, (4.54)

gn(n) =cα1 nc−1 (− log n)α1−1

Γ (α1), (4.55)

gm(m) =cα1+α2 mc−1 (− logm)α1+α2−1

Γ (α1 + α2). (4.56)

Note that the marginal functions are log-gamma density functions.

Proposition 4.21. The family of McKay bivariate log-gamma density func-tions for random variables n and m where 0 < m < n < 1, determines aRiemannian 3-manifold ˜M with the following properties• it contains the uniform-log gamma distribution when c = 1, as α1, α2 → 1.• the structure is given by the information-theoretic metric• ˜M is an isometric isomorph §3.1 of a the McKay 3-manifold M.

Proof. Firstly, when c = 1 and α1 → 1, we have the limiting joint densityfunction is given by

limα1→1

g(n,m;α1, 1, α2) = g(n,m; 1, 1, α2) =(log(n) − log(m))α2−1

nΓ (α2)

limα2→1

g(n,m; 1, 1, α2) = g(n,m; 1, 1, 1) =1n.

So g(n,m; 1, 1, 1) determines the uniform-log gamma distribution, whichhas marginal functions: uniform distribution gn(n) = 1 and log-gamma dis-tribution gm(m) = − log m

Γ (2) .

Page 82: Information Geometry: Near Randomness and Near Independence

72 4 Information Geometry of Bivariate Families

We show that the McKay bivariate log-gamma and McKay bivariategamma families have the same Fisher information metric.

g(n,m) = f(x, y)dx dy

dn dm

Hencelog g(n,m) = log f(x, y) + log(

dx dy

dn dm)

then double differentiation of this relation with respect to θi and θj ; when(θ1, θ2, θ3) = (α1, c, α2) yields

∂2 log g(n,m)∂θi∂θj

=∂2 log f(x, y)

∂θi∂θj

we can see that f(x, y) and g(n,m) have the same Fisher metric.Finally, the identity map on the parameter space

(α1, c, α2) ∈ R+ × R

+ × R+

determines an isometry of Riemannian manifolds.

4.8 Generalized McKay 5-Manifold

Here we introduce a bivariate gamma distribution which is a slight general-ization of that due to McKay, by substituting (x− γ1) for x and (y − γ2) fory in equation (4.7). We call this a bivariate 3-parameter gamma distribution,because the marginal functions are univariate 3-parameter gamma densityfunctions, the extra parameters γi simply shifting the density function in thespace of random variables. Then we consider the bivariate 3-parameter gammafamily as a Riemannian 5-manifold. The geometrical objects have been calcu-lated [13] but are not listed here because their expressions are lengthy.

4.8.1 Bivariate 3-Parameter Gamma Densities

Proposition 4.22. Let X and Y be continuous random variables, then

f(x, y) =c(α1+α2)(x− γ1)α1−1(y − γ2 − x + γ1)α2−1e−c(y−γ2)

Γ (α1)Γ (α2)(4.57)

defined on (y − γ2) > (x − γ1) > 0 , α1, c, α2 > 0, γ1, γ2 ≥ 0, is a densityfunction. The covariance and marginal density functions, of X and Y aregiven by:

Page 83: Information Geometry: Near Randomness and Near Independence

4.8 Generalized McKay 5-Manifold 73

σ12 =α1

c2(4.58)

fX(x) =cα1(x− γ1)α1−1e−c(x−γ1)

Γ (α1), x > γ1 ≥ 0 (4.59)

fY (y) =c(α1+α2)(y − γ2)(α1+α2)−1e−c(y−γ2)

Γ (α1 + α2), y > γ2 ≥ 0 . (4.60)

Note that the marginal functions fX and fY are univariate 3-parametergamma density functions with parameters (c, α1, γ1) and (c, α1+α2, γ2), whereγ1 and γ2 are location parameters. When γ1, γ2 = 0 we recover the McKaydistribution (4.7), and when α1 = 1 the marginal function fX is exponentialdistribution with two parameters (c, γ1), while we obtain exponential marginalfunction fY with two parameters (c, γ2) when α1 + α2 = 1.

4.8.2 Generalized McKay Information Metric

Proposition 4.23. We define the set of bivariate 3-parameter gamma densityfunctions as

M∗ = f |f(x, y) =c(α1+α2)(x− γ1)α1−1(y − γ2 − x + γ1)α2−1e−c(y−γ2)

Γ (α1)Γ (α2),

(y − γ2) > (x− γ1) > 0, α1, α2 > 2, c > 0, γ1, γ2 ≥ 0. (4.61)

Then we have:

1. Identifying (α1, c, α2, γ1, γ2) as a local coordinate system, M∗ can be re-garded as a 5-manifold.

2. M∗ is a Riemannian manifold with Fisher information matrix G = [gij ]where

gij =∫ ∞

γ1

∫ ∞

x−γ1+γ2

∂2 log f(x, y)∂ξi∂ξj

f(x, y) dy dx

and (ξ1, ξ2, ξ3, ξ4, ξ5) = (α1, c, α2, γ1, γ2).is given by:

[gij ] =

ψ′(α1) − 1c 0 c

α1−1 0−1

cα1+α2

c2 − 1c 0 −1

0 − 1c ψ′(α2) c

1−α2

cα2−1

c−1+α1

0 c1−α2

c2(

1α1−2 + 1

α2−2

)

c2

2−α2

0 −1 cα2−2

c2

α2−2c2

α2−2

(4.62)

where ψ(αi) = Γ ′(αi)Γ (αi)

(i = 1, 2).

Page 84: Information Geometry: Near Randomness and Near Independence

74 4 Information Geometry of Bivariate Families

4.9 Freund Bivariate Exponential 4-Manifold F

The results in this section were computed in [13] and first reported in [15]. TheFreund [89] bivariate exponential mixture distribution arises from reliabilitymodels. Consider an instrument that has two components A, B with lifetimesX, Y respectively having density functions (when both components are inoperation)

fX(x) = α1 e−α1x

fY (y) = α2 e−α2y

for (α1, α2 > 0;x, y > 0). Then X and Y are dependent by a failure ofeither component changing the parameter of the life distribution of the othercomponent. Explicitly, when A fails, the parameter for Y becomes β2; when Bfails, the parameter for X becomes β1. There is no other dependence. Hencethe joint density function of X and Y is

f(x, y) =

α1β2e−β2y−(α1+α2−β2)x for 0 < x < y,

α2β1e−β1x−(α1+α2−β1)y for 0 < y < x

(4.63)

where αi, βi > 0 (i = 1, 2).For α1 + α2 = β1, the marginal density function of X is

fX(x) =(

α2

α1 + α2 − β1

)

β1 e−β1x

+(

α1 − β1

α1 + α2 − β1

)

(α1 + α2) e−(α1+α2)x , x ≥ 0 (4.64)

and for α1 + α2 = β2, The marginal density function of Y is

fY (y) =(

α1

α1 + α2 − β2

)

β2 e−β2y

+(

α2 − β2

α1 + α2 − β2

)

(α1 + α2) e−(α1+α2)y , y ≥ 0 (4.65)

We can see that the marginal density functions are not actually exponentialbut mixtures of exponential density functions if αi > βi, that is, they areweighted averages. So these are bivariate mixture exponential density func-tions. The marginal density functions fX(x) and fY (y) are exponential densityfunctions only in the special case αi = βi (i = 1, 2).

For the special case when α1 + α2 = β1 = β2, Freund obtained the jointdensity function as:

f(x, y) =

α1(α1 + α2)e−(α1+α2)y for 0 < x < y,α2(α1 + α2)e−(α1+α2)x for 0 < y < x

(4.66)

with marginal density functions, §1.3:

Page 85: Information Geometry: Near Randomness and Near Independence

4.9 Freund Bivariate Exponential 4-Manifold F 75

fX(x) = (α1 + α2(α1 + α2)x) e−(α1+α2)x x ≥ 0 , (4.67)fY (y) = (α2 + α1(α1 + α2)y) e−(α1+α2)y y ≥ 0 (4.68)

The covariance, §1.3 and correlation coefficient, §1.3 of X and Y are

Cov(X,Y ) =β1 β2 − α1 α2

β1 β2 (α1 + α2)2 , (4.69)

ρ(X,Y ) =β1 β2 − α1 α2

α22 + 2α1 α2 + β1

2√

α12 + 2α1 α2 + β2

2(4.70)

Note that − 13 < ρ(X,Y ) < 1. The correlation coefficient ρ(X,Y ) → 1 when

β1, β2 → ∞, and ρ(X,Y ) → − 13 when α1 = α2 and β1, β2 → 0. In many

applications, βi > αi (i = 1, 2) ( i.e., lifetime tends to be shorter when theother component is out of action); in such cases the correlation is positive.

4.9.1 Freund Fisher Metric

As in the case of the McKay manifold, the multiplicative group action of thereal numbers as a scale-changing process influences the Riemannian geometricproperties.

Proposition 4.24. F , the set of Freund bivariate mixture exponential densityfunctions, (4.63) becomes a 4-manifold with Fisher information metric, §2.0.5§3.1,

gij =∫ ∞

0

∫ ∞

0

∂2 log f(x, y)∂xi∂xj

f(x, y) dx dy

and (x1, x2, x3, x4) = (α1, β1, α2, β2).is given by

[gij ] =∫ ∞

0

∫ ∞

0

∂2 log f(x, y)∂xi∂xj

f(x, y) dx dy

=

1α12+α1 α2

0 0 00 α2

β12 (α1+α2)

0 00 0 1

α22+α1 α20

0 0 0 α1β2

2 (α1+α2)

(4.71)

The inverse matrix [gij ] = [gij ]−1 is given by

[gij ] =

α12 + α1 α2 0 0 0

0 β12 (α1+α2)

α20 0

0 0 α22 + α1 α2 0

0 0 0 β22 (α1+α2)

α1

. (4.72)

The orthogonality property of (α1, β1, α2, β2) coordinates is equivalent to as-ymptotic independence of the maximum likelihood estimates[20, 113, 153].

Page 86: Information Geometry: Near Randomness and Near Independence

76 4 Information Geometry of Bivariate Families

4.10 Freund Natural Coordinates

Leurgans, Tsai, and Crowley [135] pointed out that the Freund density func-tions form an exponential family, §3.2, with natural parameters

(θ1, θ2, θ3, θ4) = (α1 + β1, α2, log(

α1 β2

α2 β1

)

, β2) (4.73)

and potential function

ϕ(θ) = − log(θ1 θ2 θ4

eθ3 θ2 + θ4) = − log(α2 β1). (4.74)

Hence

θ1 = α1 + β1, θ2 = α2, θ3 = log(

α1 β2

α2 β1

)

, θ4 = β2

yield:

α1 =θ1 θ2

eθ3 θ2 + θ4eθ3 , β1 =

θ1 θ4eθ3 θ2 + θ4

, α2 = θ2, β2 = θ4.

so (4.63) in natural coordinates is

f(x, y) =

θ1 θ2 θ4eθ3 θ2+θ4

eθ3e−θ4y−(θ1−θ4)x for 0 < x < yθ1 θ2 θ4

eθ3 θ2+θ4e−θ2x−(θ1−θ2)y for 0 < y < x

=

eθ1(−x)+θ3+θ4(x−y)+log(

θ1 θ2 θ4eθ3 θ2+θ4

)for 0 < x < y

eθ1(y)+θ2(y−x)+log(

θ1 θ2 θ4eθ3 θ2+θ4

)for 0 < y < x

. (4.75)

In natural coordinates (θi) (4.73) the Fisher metric is given by

[

∂2ϕ(θ)∂θi∂θj

]

=

1θ1

2 0 0 0

0θ4 (2 eθ3 θ2+θ4)θ2

2 (eθ3 θ2+θ4)2eθ3 θ4

(eθ3 θ2+θ4)2 − eθ3

(eθ3 θ2+θ4)2

0 eθ3 θ4

(eθ3 θ2+θ4)2eθ3 θ2 θ4

(eθ3 θ2+θ4)2 − eθ3 θ2

(eθ3 θ2+θ4)2 ,

0 − eθ3

(eθ3 θ2+θ4)2 − eθ3 θ2

(eθ3 θ2+θ4)21

θ42 − 1

(eθ3 θ2+θ4)2

=

1(α1+β1)

2 0 0 0

0 β1 (2 α1+β1)

α22 (α1+β1)2

α1 β1α2 (α1+β1)

2 − α1 β1α2 (α1+β1)

2 β2

0 α1 β1α2 (α1+β1)

2α1 β1

(α1+β1)2 − α1 β1

(α1+β1)2 β2

0 − α1 β1α2 (α1+β1)

2 β2− α1 β1

(α1+β1)2 β2

α1 (α1+2 β1)

(α1+β1)2 β2

2

.

(4.76)

Page 87: Information Geometry: Near Randomness and Near Independence

4.11 Freund a-Geometry 77

4.11 Freund a-Geometry

The α-connection §3.3, α-curvature tensor, α-Ricci tensor with its eigenvaluesand eigenvectors, α-scalar curvature, α-sectional curvatures and the α-meancurvatures are more simply reported with respect to the coordinate system(α1, β1, α2, β2).

4.11.1 Freund a-Connection

For each α ∈ R, the α (or ∇(α))-connection is the torsion-free affine connectionwith components

Γ(α)ij,k =

∫ ∞

0

∫ ∞

x

(

∂2 log f1

∂ξi∂ξj

∂ log f1

∂ξk+

1 − α

2∂ log f1

∂ξi

∂ log f1

∂ξj

∂ log f1

∂ξk

)

f1 dy dx

+∫ ∞

0

∫ ∞

x

(

∂2 log f2

∂ξi∂ξj

∂ log f2

∂ξk+

1 − α

2∂ log f2

∂ξi

∂ log f2

∂ξj

∂ log f2

∂ξk

)

f2 dy dx

Proposition 4.25. The nonzero independent components Γ(α)ij,k are

Γ(α)11,1 =

2 (α− 1) α1 − (1 + α) α2

2α12 (α1 + α2)

2 ,

Γ(α)11,3 =

1 + α

2α1 (α1 + α2)2 ,

Γ(α)12,2 =

(α− 1) α2

2 (α1 + α2)2β1

2,

Γ(α)13,3 =

α− 12α2 (α1 + α2)

2 ,

Γ(α)14,4 =

− (α− 1) α2

2 (α1 + α2)2β2

2,

Γ(α)22,2 =

(α− 1) α2

(α1 + α2) β13 ,

Γ(α)22,3 =

− (1 + α) α1

2 (α1 + α2)2β1

2,

Γ(α)33,3 =

− (1 + α) α1 + 2 (α− 1) α2

2α22 (α1 + α2)

2 ,

Γ(α)34,4 =

(α− 1) α1

2 (α1 + α2)2β2

2,

Γ(α)44,4 =

(α− 1) α1

(α1 + α2) β23 . (4.77)

Page 88: Information Geometry: Near Randomness and Near Independence

78 4 Information Geometry of Bivariate Families

We have a symmetric linear connection ∇(α) defined by:

〈∇(α)∂i

∂j , ∂k〉 = Γ(α)ij,k ,

Hence from

Γ(α)ij,k =

4∑

h=1

gkh Γh(α)ij , (k = 1, 2, 3, 4).

we obtain the components of ∇(α).

Proposition 4.26. The nonzero components Γi(α)jk of the ∇(α)-connections

are given by:

Γ(α)111 = −1 + α

2α1+

−1 + 3α2 (α1 + α2)

,

Γ(α)113 = Γ

(α)212 = Γ

(α)313 = Γ

(α)434 =

α− 12 (α1 + α2)

,

Γ(α)122 = −Γ (α)3

22 =(1 + α) α1 α2

2 (α1 + α2) β12 ,

Γ(α)133 = Γ

(α)223 =

(1 + α) α1

2α2 (α1 + α2),

Γ(α)144 = −Γ (α)3

44 =− (1 + α) α1 α2

2 (α1 + α2) β22 ,

Γ(α)311 = Γ

(α)414 =

(1 + α) α2

2α1 (α1 + α2),

Γ(α)333 = −1 + α

2α2+

−1 + 3α2 (α1 + α2)

,

Γ(α)444 =

α− 1β2

. (4.78)

4.11.2 Freund a-Curvatures

Proposition 4.27. The nonzero independent components R(α)ijkl of the

α-curvature tensor are given by:

R(α)1212 =

(

α2 − 1)

α22

4α1 (α1 + α2)3β1

2,

R(α)1223 =

(

α2 − 1)

α2

4 (α1 + α2)3β1

2,

R(α)1414 =

(

α2 − 1)

α2

4 (α1 + α2)3β2

2,

Page 89: Information Geometry: Near Randomness and Near Independence

4.11 Freund a-Geometry 79

R(α)1434 =

−(

α2 − 1)

α1

4 (α1 + α2)3β2

2,

R(α)2323 =

(

α2 − 1)

α1

4 (α1 + α2)3β1

2,

R(α)2424 =

(

α2 − 1)

α1 α2

4 (α1 + α2)2β1

2 β22,

R(α)3434 =

(

α2 − 1)

α12

4α2 (α1 + α2)3β2

2. (4.79)

Contracting R(α)ijkl with gil we obtain the components R

(α)jk of the α-Ricci

tensor.

Proposition 4.28. The α-Ricci tensor R(α) = [R(α)jk ] is given by:

R(α) = [R(α)jk ] =

−(α2−1)α2

2 α1 (α1+α2)2 0 α2−1

2 (α1+α2)2 0

0−(α2−1)α2

2 (α1+α2) β12 0 0

α2−12 (α1+α2)

2 0−(α2−1)α1

2 α2 (α1+α2)2 0

0 0 0−(α2−1)α1

2 (α1+α2) β22

(4.80)

The α-eigenvalues and the α-eigenvectors of the α-Ricci tensor are:

(

α2 − 1)

01

(α1+α2)2 − 1

2 α1 α2−α2

2 (α1+α2) β12

−α12 (α1+α2) β2

2

(4.81)

α1α2

0 1 0−α2

α10 1 0

0 1 0 00 0 0 1

(4.82)

Proposition 4.29. The Freund manifold F has a constant α-scalar curvature

R(α) =32(

1 − α2)

. (4.83)

The Freund manifold has a positive scalar curvature R(0) = 32 when α = 0, so

geometrically it constitutes part of a sphere.

Page 90: Information Geometry: Near Randomness and Near Independence

80 4 Information Geometry of Bivariate Families

Proposition 4.30. The α-sectional curvatures (α)(λ, µ) (λ, µ = 1, 2, 3, 4) aregiven by:

(α)(1, 2) = (α)(1, 4) =

(

1 − α2)

α2

4 (α1 + α2),

(α)(1, 3) = 0,

(α)(2, 3) =

(

1 − α2)

α1

4 (α1 + α2),

(α)(2, 4) =1 − α2

4,

(α)(3, 4) = (2, 3) . (4.84)

Proposition 4.31. The α-mean curvatures (α)(λ) (λ = 1, 2, 3, 4) aregiven by:

(α)(1) =

(

1 − α2)

α2

6 (α1 + α2),

(α)(2) = (4) =1 − α2

6,

(α)(3) =

(

1 − α2)

α1

6 (α1 + α2). (4.85)

4.12 Freund Foliations

Since F is an exponential family, §3.2 a mutually dual coordinate system isgiven by the potential function ϕ(θ) (4.74), §3.2, that is

η1 =∂ϕ(θ)∂θ1

= − 1θ1

= − 1α1 + β1

,

η2 =∂ϕ(θ)∂θ2

= − θ4θ2 (eθ3 θ2 + θ4)

= − β1

α2 (α1 + β1),

η3 =∂ϕ(θ)∂θ3

= 1 − θ4eθ3 θ2 + θ4

=α1

α1 + β1,

η4 =∂ϕ(θ)∂θ4

= − 1θ4

+1

eθ3 θ2 + θ4= − α1

(α1 + β1) β2. (4.86)

Then (θ1, θ2, θ3, θ4) is a 1-affine coordinate system, (η1, η2, η3, η4) is a (−1)-affine coordinate system, and they are mutually dual with respect to the Fisherinformation metric. The (ηi) have a potential function given by:

λ = log(

θ1 θ2 θ4eθ3 θ2 + θ4

)

+eθ3 θ2 θ3eθ3 θ2 + θ4

− 2

= log(α2 β1) +α1

α1 + β1log(

α1 β2

α2 β1

)

− 2. (4.87)

Page 91: Information Geometry: Near Randomness and Near Independence

4.13 Freund Submanifolds 81

Coordinates (θi) and (ηi) are mutually dual so F has dually orthogonalfoliations. With coordinates

(η1, θ2, θ3, θ4) = (− 1θ1, θ2, θ3, θ4)

the Freund density functions are:

f(x, y; η1, θ2, θ3, θ4) =

− θ2 θ4 eθ3

(θ2 eθ3+θ4) η1eθ4 (x−y)+ x

η1 for 0 < x < y

− θ2 θ4

(θ2 eθ3 +θ4) η1eθ2 (y−x)+ y

η1 for 0 < y < x(4.88)

where η1 < 0 and θi > 0 (i = 2, 3, 4).The Fisher metric is

[gij ] =

1η12 0 0 0

0θ4 (2 eθ3 θ2+θ4)θ2

2 (eθ3 θ2+θ4)2eθ3 θ4

(eθ3 θ2+θ4)2 − eθ3

(eθ3 θ2+θ4)2

0 eθ3 θ4

(eθ3 θ2+θ4)2eθ3 θ2 θ4

(eθ3 θ2+θ4)2 − eθ3 θ2

(eθ3 θ2+θ4)2

0 − eθ3

(eθ3 θ2+θ4)2 − eθ3 θ2

(eθ3 θ2+θ4)21

θ42 − 1

(eθ3 θ2+θ4)2

=

(α1 + β1)2 0 0 0

0 β1 (2 α1+β1)

α22 (α1+β1)2

α1 β1α2 (α1+β1)

2 − α1 β1α2 (α1+β1)

2 β2

0 α1 β1α2 (α1+β1)

2α1 β1

(α1+β1)2 − α1 β1

(α1+β1)2 β2

0 − α1 β1α2 (α1+β1)

2 β2− α1 β1

(α1+β1)2 β2

α1 (α1+2 β1)

(α1+β1)2 β2

2

.

(4.89)

It follows that (θi) is a geodesic coordinate system of ∇(1), and (ηi) is ageodesic coordinate system of ∇(−1).

4.13 Freund Submanifolds

Four submanifolds Fi (i = 1, 2, 3, 4), §2.0.5, of the 4-manifold F are of interest,including the case of statistically independent random variables, §1.3. Also oneis the special case of an Absolutely Continuous Bivariate Exponential Distri-bution called ACBED (or ACBVE) by Block and Basu (cf. Hutchinson andLai [105]). We use the coordinate system (α1, β1, α2, β2) for the submanifoldsFi (i = 3), and the coordinate system (λ1, λ12, λ2) for ACBED of the Blockand Basu case.

4.13.1 Independence Submanifold F1

This is defined asF1 ⊂ F : β1 = α1, β2 = α2.

Page 92: Information Geometry: Near Randomness and Near Independence

82 4 Information Geometry of Bivariate Families

The density functions are of form:

f(x, y;α1, α2) = f1(x;α1)f2(y;α2) (4.90)

where fi are the density functions of the univariate exponential density func-tions with the parameters αi > 0 (i = 1, 2). This is the case for statistical in-dependence, §1.3, of X and Y , so the space F1 is the direct product of the twocorresponding Riemannian spaces f1(x;α1) : f1(x;α1) = α1e

−α1x, α1 > 0and f2(y;α2) : f2(y;α2) = −α2e

−α2y, α2 > 0.

Proposition 4.32. The metric tensor [gij ] is as follows:

[gij ] =

[

1α2

10

0 1α2

2

]

. (4.91)

Proposition 4.33. The nonzero independent components of the α-connectionare

Γ(α)11,1 =

α− 1α1

3,

Γ(α)22,2 =

α− 1α2

3,

Γ(α)111 =

α− 1α1

,

Γ(α)222 =

α− 1α2

. (4.92)

Proposition 4.34. The α-curvature tensor, α-Ricci tensor, and α-scalar cur-vature of F1 are zero.

4.13.2 Submanifold F2

This is defined asF2 ⊂ F : α1 = α2, β1 = β2.

The density functions are of form:

f(x, y;α1, β1) =

α1β1 e−β1y−(2 α1−β1)x for 0 < x < y

α1β1 e−β1x−(2 α1−β1)y for 0 < y < x

(4.93)

with parameters α1, β1 > 0. The covariance, correlation coefficient and mar-ginal density functions, of X and Y are given by:

Cov(X,Y ) =14

(

1α1

2− 1

β12

)

, (4.94)

Page 93: Information Geometry: Near Randomness and Near Independence

4.13 Freund Submanifolds 83

ρ(X,Y ) = 1 − 4α12

3α12 + β1

2 , (4.95)

fX(x) =(

α1

2α1 − β1

)

β1 e−β1x +

(

α1 − β1

2α1 − β1

)

(2α1) e−2 α1x , x ≥ 0 ,

(4.96)

fY (y) =(

α1

2α1 − β1

)

β1 e−β1y +

(

α1 − β1

2α1 − β1

)

(2α1) e−2 α1y , y ≥ 0 .

(4.97)

When α1 = β1, ρ(X,Y ) = 0. Submanifold F2 forms an exponential family,with natural parameters (α1, β1) and potential function

ϕ = − log(α1 β1). (4.98)

Proposition 4.35. The submanifold F2 is an isometric isomorph of the man-ifold F1 §3.1.

Proof. Since ϕ = − log(α1 β1) is a potential function, the Fisher metric is theHessian of ϕ, that is,

[gij ] = [∂2ϕ

∂θi∂θj] =

[

1α2

10

0 1β21

]

(4.99)

where (θ1, θ2) = (α1, β1) .

4.13.3 Submanifold F3

This is defined asF3 ⊂ F : β1 = β2 = α1 + α2.

The density functions are of form:

f(x, y;α1, α2, β2) =

α1 (α1 + α2) e−(α1+α2)y for 0 < x < yα2 (α1 + α2) e−(α1+α2)x for 0 < y < x

(4.100)

with parameters α1, α2 > 0. The covariance, correlation coefficient and mar-ginal functions, of X and Y are given by:

Cov(X,Y ) =α1

2 + α1 α2 + α22

(α1 + α2)4 , (4.101)

ρ(X,Y ) =α1

2 + α1 α2 + α22

2 (α1 + α2)2 − α12√

2α12 + 4α1α2 + α2

2, (4.102)

fX(x) = (α2 (α1 + α2)x + α1) e−(α1+α2)x, x ≥ 0 (4.103)fY (y) = (α1 (α1 + α2)y + α2) e−(α1+α2)y, y ≥ 0 (4.104)

Note that the correlation coefficient is positive.

Page 94: Information Geometry: Near Randomness and Near Independence

84 4 Information Geometry of Bivariate Families

Proposition 4.36. The metric tensor on F3 is

[gij ] =

[

α2+2 α1α1 (α1+α2)

21

(α1+α2)2

1(α1+α2)

2α1+2 α2

α2 (α1+α2)2

]

. (4.105)

Proposition 4.37. The nonzero independent components of the α-connectionof F3 are

Γ(α)111 = −1 + α

2α1+

−1 + 3α2 (α1 + α2)

,

Γ(α)112 =

−1 + α

2 (α1 + α2),

Γ(α)122 =

(1 + α) α1

2α2 (α1 + α2),

Γ(α)211 =

(1 + α) α2

2α1 (α1 + α2),

Γ(α)222 = −1 + α

2α2+

−1 + 3α2 (α1 + α2)

. (4.106)

Proposition 4.38. The α-curvature tensor, α-Ricci curvature, and α-scalarcurvature of F3 are zero.

4.13.4 Submanifold F4

This is F4 ⊂ F, ACBED of Block and Basu with the density functions:

f(x, y;λ1, λ12, λ2) =

λ1 λ (λ2+λ12)λ1+λ2

e−λ1 x−(λ2+λ12) y for 0 < x < yλ2 λ (λ1+λ12)

λ1+λ2e−(λ1+λ12) x−λ2 y for 0 < y < x

(4.107)

where the parameters λ1, λ12, λ2 are positive, and λ = λ1 + λ2 + λ12.This distribution originated by omitting the singular part of the Marshall

and Olkin distribution (cf. [122], page [139]); Block and Basu called it theACBED to emphasize that they are the absolutely continuous bivariate expo-nential density functions. Alternatively, it can be derived by Freund’s method(4.63), with

α1 = λ1 +λ1 λ12

(λ1 + λ2),

β1 = λ1 + λ12,

α2 = λ2 +λ2 λ12

(λ1 + λ2),

β2 = λ2 + λ12 .

By substitution we obtain the covariance, correlation coefficient and marginalfunctions:

Page 95: Information Geometry: Near Randomness and Near Independence

4.13 Freund Submanifolds 85

Cov(X,Y ) =(λ1 + λ2)

2 (λ1 + λ12) (λ2 + λ12) − λ2 λ1 λ2

λ2 (λ1 + λ2)2 (λ1 + λ12) (λ2 + λ12)

, (4.108)

ρ(X,Y ) =(λ1 + λ2)

2 (λ1 + λ12) (λ2 + λ12) − λ2 λ1 λ2√

∏2i=1, j =i

(

(λ1 + λ2)2(λi + λ12)

2 + λjλ2 (λj + 2λi))

,

(4.109)

fX(x) =(

−λ12

λ1 + λ2

)

λ e−λ x

+(

λ

λ1 + λ2

)

(λ1 + λ12) e−(λ1+λ12) x, x ≥ 0 (4.110)

fY (y) =(

−λ12

λ1 + λ2

)

λ e−λ y

+(

λ

λ1 + λ2

)

(λ2 + λ12) e−(λ2+λ12) y, y ≥ 0 (4.111)

The correlation coefficient is positive, and the marginal functions are a nega-tive mixture of two exponentials.

Proposition 4.39. The metric tensor in the coordinate system (λ1, λ12, λ2)is [gij ] =⎡

λ2

(

1λ1

+λ1+λ2

(λ1+λ12)2

)

(λ1+λ2)2 + 1

λ2λ2

(λ1+λ2) (λ1+λ12)2 + 1

λ2−1

(λ1+λ2)2 + 1

λ2

λ2(λ1+λ2) (λ1+λ12)

2 + 1λ2

λ2(λ1+λ12)2

+λ1

(λ2+λ12)2

λ1+λ2+ 1

λ2λ1

(λ1+λ2) (λ2+λ12)2 + 1

λ2

−1(λ1+λ2)

2 + 1λ2

λ1(λ1+λ2) (λ2+λ12)

2 + 1λ2

λ1

(

1λ2

+λ1+λ2

(λ2+λ12)2

)

(λ1+λ2)2 + 1

λ2

.

(4.112)

The α-connections and the α-curvatures were computed [13] but are notlisted because they have lengthy expressions. When λ1 = λ2, this family ofdensity functions becomes

f(x, y;λ1, λ12) =

(2 λ1+λ12) (λ1+λ12)2 e−λ1 x−(λ1+λ12) y, 0 < x < y

(2 λ1+λ12) (λ1+λ12)2 e−λ1 y−(λ1+λ12) x 0 < y < x

. (4.113)

That is an exponential family with natural parameters (θ1, θ2) = (λ1, λ12)and potential function

ϕ(θ) = log(2) − log(λ1 + λ12) − log(2λ1 + λ12).

From equations (4.110, 4.111), this family of density functions has identicalmarginal density functions.

Page 96: Information Geometry: Near Randomness and Near Independence

86 4 Information Geometry of Bivariate Families

The metric tensor [gij ] = [ ∂2ϕ∂θi∂θj

] is

[gij ] =

[

1(λ1+λ12)

2 + 4(2 λ1+λ12)

21

(λ1+λ12)2 + 2

(2 λ1+λ12)2

1(λ1+λ12)

2 + 2(2 λ1+λ12)

21

(λ1+λ12)2 + 1

(2 λ1+λ12)2

]

. (4.114)

Then using

Γ(α)ij,k =

1 − α

2∂3ϕ

∂θi∂θj∂θk

Γ(α)11,1 = (1 − α)

(

−1(λ1 + λ12)

3 − 8(2λ1 + λ12)

3

)

,

Γ(α)11,2 = (1 − α)

(

−1(λ1 + λ12)

3 − 4(2λ1 + λ12)

3

)

,

Γ(α)12,2 = (1 − α)

(

−1(λ1 + λ12)

3 − 2(2λ1 + λ12)

3

)

,

Γ(α)22,2 = (1 − α)

(

−1(λ1 + λ12)

3 − 1(2λ1 + λ12)

3

)

. (4.115)

From

Γ(α)ij,k =

3∑

h=1

gkh Γh(α)ij , (k = 1, 2)

the components of ∇(α) are

Γ (α)1 = [Γ (α)1ij ] =

[

1−αλ1+λ12

+ 4 (α−1)2 λ1+λ12

(α−1) λ12(λ1+λ12) (2 λ1+λ12)

(α−1) λ12(λ1+λ12) (2 λ1+λ12)

−(α−1) λ1(λ1+λ12) (2 λ1+λ12)

]

,

Γ (α)2 = [Γ (α)2ij ] =

[ −2 (α−1) λ12(λ1+λ12) (2 λ1+λ12)

2 (α−1) λ1(λ1+λ12) (2 λ1+λ12)

2 (α−1) λ1(λ1+λ12) (2 λ1+λ12)

2 (α−1)λ1+λ12

+ 1−α2 λ1+λ12

]

. (4.116)

The α-curvature tensor, α-Ricci curvature, and α-scalar curvature are zero.Since (λ1, λ12) is a 1-affine coordinate system, then there is a (-1)-affine coor-dinate system

(η1, η2) = (− 1λ1 + λ12

− 1λ1 + 2λ12

,− 1λ1 + λ12

− 12λ1 + λ12

)

with potential function

λ = −2 − log(2) + log(2λ1 + λ12) + log(λ1 + λ12).

Page 97: Information Geometry: Near Randomness and Near Independence

4.15 Freund Bivariate Log-Exponential Manifold 87

4.14 Freund Affine Immersion

Proposition 4.40. Let F be the Freund 4-manifold with the Fisher metricg and the exponential connection ∇(1). Denote by (θi) the natural coordinatesystem (4.73). Then, by §3.4, F can be realized in R

5 by the graph of a potentialfunction, the affine immersion f, ξ:

f : F → R5 :[

θi

]

→[

θi

ϕ(θ)

]

, ξ =

00001

, (4.117)

where ϕ(θ) is the potential function

ϕ(θ) = − log(θ1 θ2 θ4

eθ3 θ2 + θ4) = − log(α2 β1).

4.15 Freund Bivariate Log-Exponential Manifold

The Freund bivariate mixture log-exponential density functions arise fromthe Freund density functions (4.63)for the non-negative random variables x =log 1

n and y = log 1m , or equivalently, n = e−x and m = e−y.

So the Freund log-exponential density functions are given by:

g(n,m) =

α1 β2 m(β2−1) n(α1+α2−β2−1) for 0 < m < n < 1,

α2 β1 n(β1−1) m(α1+α2−β1−1) for 0 < n < m < 1

(4.118)

where αi, βi > 0 (i = 1, 2). The covariance, and marginal density functions,of n and m are given by:

Corollary 4.41.

Cov(n,m) =α2 (− (α1 (2 + α1 + α2)) + β1) + (α1 + (α1 + α2) β1) β2

(1 + α1 + α2)2 (2 + α1 + α2) (1 + β1) (1 + β2)

,

gN (n) =(

α2

α1 + α2 − β1

)

β1nβ1−1

+(

α1 − β1

α1 + α2 − β1

)

(α1 + α2)n(α1+α2)−1 ,

gM (m) =(

α1

α1 + α2 − β2

)

β2mβ2−1

+(

α2 − β2

α1 + α2 − β2

)

(α1 + α2)m(α1+α2)−1. (4.119)

Page 98: Information Geometry: Near Randomness and Near Independence

88 4 Information Geometry of Bivariate Families

The variables n and m are independent if and only if αi = βi (i = 1, 2),and the marginal functions are mixture log-exponential density functions.

Directly from the definition of the Fisher metric we deduce:

Proposition 4.42. The family of Freund bivariate mixture log-exponentialdensity functions for random variables n,m determines a Riemannian4-manifold which is an isometric isomorph of the Freund 4-manifold §3.1.

4.16 Bivariate Gaussian 5-Manifold N

The results in this section were computed in [13] and first reported in [15].The bivariate Gaussian distribution was considered occasionally as early asthe middle of the nineteenth century, and has played a predominant role inthe historical development of statistical theory, so it has made its appearancein various areas of application. The differential geometrical consideration ofthe parameter space of the bivariate Gaussian family of density functionswas considered by Sato et al. [183], who provided the density functions asa Riemannian 5-manifold. They calculated the α = 0 geometry, i.e., the0-connections, 0-Ricci curvature, the 0-scalar curvature etc, and they showedthat the bivariate Gaussian manifold has a negative constant 0-scalar curva-ture and that, if the correlation coefficient vanishes, the space becomes anEinstein space.

We extend these studies by calculating the α-connections, the α-Ricci ten-sor, the α-scalar curvatures etc and we show that this manifold has a constantα-scalar curvature, so geometrically it constitutes part of a pseudosphere whenα2 < 1. We derive mixture coordinates and mutually dual foliations, then weconsider particular submanifolds, including the case of statistically indepen-dent random variables, and discuss their geometrical structure. We considerbivariate log-Gaussian (log-normal) density functions, with log-Gaussian mar-ginal functions. We show that this family of density functions determines aRiemannian 5-manifold, an isometric isomorph of the bivariate Gaussian man-ifold, §3.1.

The probability density function with real random variables x1, x2 and fiveparameters has the form:

f(x, y) =1

2π√σ1 σ2 − σ12

2e−AB, (4.120)

with A =1

2 (σ1 σ2 − σ122)

B =(

σ2(x− µ1)2 − 2 σ12 (x− µ1) (y − µ2) + σ1(y − µ2)

2)

defined on −∞ < x, y < ∞ with parameters (µ1, µ2, σ1, σ12, σ2); where−∞ < µ1, µ2 < ∞, 0 < σ1 , σ2 < ∞ and σ12 is the covariance of X and Y.

Page 99: Information Geometry: Near Randomness and Near Independence

4.17 Bivariate Gaussian Fisher Information Metric 89

The bivariate Gaussian distribution with central means; µ1 = µ2 = 0 andunit variances; σ1 = σ2 = 1, that is

f(x, y) =1

2π√

1 − σ122e− 1

2 (1−σ122) (x2−2 σ12 x y+y2), (4.121)

is called the standardized distribution.

The marginal functions, §1.3, of X and Y are univariate Gaussian densityfunctions NX(µ1, σ2) and NY (µ2, σ2), §1.2.3:

fX(x) = NX(µ1, σ1) =1√

2π σ1e−

(x−µ1)2

2 σ1 , (4.122)

fY (y) = NY (µ2, σ2) =1√

2π σ2e−

(y−µ2)2

2 σ2 . (4.123)

The correlation coefficient is:

ρ(X,Y ) =σ12√σ1 σ2

Since σ122 = σ1 σ2 then −1 < ρ(X,Y ) < 1; so we do not have the case

when Y is a linearly increasing (or decreasing) function of X.

4.17 Bivariate Gaussian Fisher Information Metric

The information geometry of (4.120) has been studied by Sato et al. [183]; themetric tensor, §3.1, takes the following form:

G = [gij ] =

σ2 −σ12

0 0 0−σ12

σ1 0 0 0

0 0 (σ2)2

22 −σ12 σ22

(σ12)2

22

0 0 −σ12 σ22

σ1 σ2+(σ12)2

2 −σ1 σ122

0 0 (σ12)2

22 −σ1 σ122

(σ1)2

22

, (4.124)

where is the determinant

= σ1 σ2 − (σ12)2

The inverse [gij ] of the metric tensor [gij ] defined by the relation

gijgik = δk

j

is given by

G−1 = [gij ] =

σ1 σ12 0 0 0σ12 σ2 0 0 00 0 2 (σ1)2 2σ1 σ12 2 (σ12)2

0 0 2σ1 σ12 σ1 σ2 + (σ12)2 2σ12 σ2

0 0 2 (σ12)2 2σ12 σ2 2 (σ2)2

. (4.125)

Page 100: Information Geometry: Near Randomness and Near Independence

90 4 Information Geometry of Bivariate Families

4.18 Bivariate Gaussian Natural Coordinates

Proposition 4.43. The set of all bivariate Gaussian density functionsforms an exponential family, §3.2, with natural coordinate system, §3.3,(θ1, θ2, θ3, θ4, θ5) =

(µ1 σ2 − µ2 σ12

,µ2 σ1 − µ1 σ12

,−σ2

2 ,σ12

,−σ1

2 ) (4.126)

with corresponding potential function

ϕ(θ) = log(2π√

) +µ2

2 σ1 + µ12 σ2 − 2µ1 µ2 σ12

2= log(2π

) −(

θ22 θ3 − θ1 θ2 θ4 + θ1

2 θ5)

. (4.127)

where = σ1 σ2 − σ12

2 =1

4 θ3 θ5 − θ42 .

Proof.

log f(x, y) = log(

12π

√ e−

12 (σ2(x−µ1)

2−2 σ12 (x−µ1) (y−µ2)+σ1(y−µ2)2))

=µ1 σ2 − µ2 σ12

x

+µ2 σ1 − µ1 σ12

y +−σ2

2 x2 +σ12

x y +−σ1

2 y2 (4.128)

−(

log(2π√

) +µ2

2 σ1 + µ12 σ2 − 2µ1 µ2 σ12

2

)

. (4.129)

Hence we have an exponential family. The line (4.128) implies that

(µ1 σ2 − µ2 σ12

,µ2 σ1 − µ1 σ12

,−σ2

2 ,σ12

,−σ1

2 )

is a natural coordinate system,

(F1(x), F2(x), F3(x), F4(x), F5(x)) = (x, y, x2, x y, y2)

and (4.129) implies that

ϕ(θ) = log(2π√

) +µ2

2 σ1 + µ12 σ2 − 2µ1 µ2 σ12

2

is its potential function. In terms of natural coordinates by solving

θ1 =µ1σ2 − µ2σ12

σ1σ2 − σ122, θ2 =

µ2σ1 − µ1σ12

σ1σ2 − σ122, θ3 =

−σ2

2 (σ1σ2 − σ122),

Page 101: Information Geometry: Near Randomness and Near Independence

4.19 Bivariate Gaussian a-Geometry 91

θ4 =σ12

σ1σ2 − σ122, θ5 =

−σ1

2 (σ1σ2 − σ122)

we obtain:

µ1 =2θ1θ5 − θ2θ4

θ42 − 4θ3θ5

, µ2 =2θ2θ3 − θ1θ4

θ42 − 4θ3 θ5

, σ1 =2θ5

θ42 − 4 θ3θ5

,

σ12 =θ4

4θ3θ5 − θ42 , σ2 =

2θ3θ4

2 − 4θ3θ5.

Thenϕ = log(2π

) −(

θ22 θ3 − θ1 θ2 θ4 + θ1

2 θ5)

,

where = σ1 σ2 − σ122 =

14 θ3 θ5 − θ4

2 .

The normalization function C(X,Y ), required by the definition of the ex-ponential family property §3.2, is zero here.

4.19 Bivariate Gaussian a-Geometry

By direct calculation, §3.3 we provide the α-connections, and variousα-curvature objects of N . The analytic expressions for the α-connectionsand the α-curvature objects are very large if we use the natural coordinatesystem, so we report these components in terms of the coordinate system(µ1, µ2, σ1, σ12, σ2).

4.19.1 a-Connection

For each α ∈ R, the α (or ∇(α))-connection is the torsion-free affine connectionwith components:

Γ(α)ij,k(ξ) =

∫ ∞

−∞

∫ ∞

−∞

(

∂2 log f∂ξi∂ξj

∂ log f∂ξk

+1 − α

2∂ log f∂ξi

∂ log f∂ξj

∂ log f∂ξk

)

f dx dy

Since it is difficult to derive the α-connection components with respect to thelocal coordinates (µ1, µ2, σ1, σ12, σ2) by using these integrations, we derivethem with respect to the natural coordinates (θi), by:

Γ(α)rs,t(θ) =

1 − α

2∂r ∂s ∂tϕ(θ) ,

where ϕ(θ) is the potential function, and ∂i = ∂∂θi

.

Then we change the coordinates to (ξi):

Γ(α)ij,k(ξ) =

(

Γ(α)rs,t(θ)

∂θr

∂ξi

∂θs

∂ξj+

∂2θt

∂ξi ∂ξj

)

∂ξk

∂θt.

Page 102: Information Geometry: Near Randomness and Near Independence

92 4 Information Geometry of Bivariate Families

Proposition 4.44. The functions Γ(α)ij,k from §3.3 are given by:

[Γ(α)ij,1]=

0 0−(1+α) σ2

2

22(1+α) σ2 σ12

2−(1+α) σ12

2

22

0 0(1+α) σ2 σ12

22−(1+α) (σ1 σ2+σ12

2)22

(1+α) σ1 σ1222

−(1+α) σ22

22(1+α) σ2 σ12

22 0 0 0

(1+α) σ2 σ122

−(1+α) (σ1 σ2+σ122)

22 0 0 0

−(1+α) σ122

22(1+α) σ1 σ12

22 0 0 0

[Γ(α)ij,2]=

0 0(1+α) σ2 σ12

22−(1+α) (σ1 σ2+σ12

2)22

(1+α) σ1 σ1222

0 0−(1+α) σ12

2

22(1+α) σ1 σ12

2−(1+α) σ1

2

22

(1+α) σ2 σ1222

−(1+α) σ122

22 0 0 0

−(1+α) (σ1 σ2+σ122)

22(1+α) σ1 σ12

2 0 0 0

(1+α) σ1 σ1222

−(1+α) σ12

22 0 0 0

[Γ(α)ij,3] =

−(α−1) σ22

22(α−1) σ2 σ12

22 0 0 0

(α−1) σ2 σ1222

−(α−1) σ122

22 0 0 0

0 0(1+α) σ2

3

−23(1+α) σ2

2 σ123

(1+α) σ2 σ122

−23

0 0(1+α) σ2

2 σ123

−(1+α) σ2 (σ1 σ2+3 σ122)

23(1+α) σ12 (σ1 σ2+σ12

2)23

0 0(1+α) σ2 σ12

2

−23(1+α) σ12 (σ1 σ2+σ12

2)23

(1+α) σ1 σ122

−23

[Γ(α)ij,4]

(α + 1)=

(α−1)σ2σ12(α+1)2

(α−1)(σ1σ2+σ122)

−2(α+1)2 0 0 0

(α−1)(σ1σ2+σ122)

−2(α+1)2(α−1)σ1σ12(α+1)2 0 0 0

0 0 σ22σ123

σ2(σ1σ2+3σ122)

−23σ12(σ1σ2+σ12

2)23

0 0σ2(σ1σ2+3σ12

2)−23

σ12(3σ1σ2+σ122)

3σ1(σ1σ2+3σ12

2)−23

0 0σ12(σ1σ2+σ12

2)23

σ1(σ1σ2+3σ122)

−23σ1

2σ123

[Γ(α)ij,5] =

−(α−1)σ122

22(α−1)σ1σ12

22 0 0 0(α−1)σ1σ12

22−(α−1)σ1

2

22 0 0 0

0 0 (1+α)σ2σ122

−23(1+α)σ12(σ1σ2+σ12

2)23

(1+α)σ1σ122

−23

0 0(1+α)σ12(σ1σ2+σ12

2)23

−(1+α)σ1(σ1σ2+3σ122)

23(1+α)σ1

2σ123

0 0 (1+α)σ1σ122

23(1+α)σ1

2σ123

(1+α)σ13

−23

.

(4.130)

Page 103: Information Geometry: Near Randomness and Near Independence

4.19 Bivariate Gaussian a-Geometry 93

For each α, we have a symmetric linear connection ∇(α) defined by:

〈∇(α)∂i

∂j , ∂k〉 = Γ(α)ij,k ,

and by solving the equations

Γ(α)ij,k =

3∑

h=1

gkh Γh(α)ij , (k = 1, 2, 3, 4, 5).

we obtain the components of ∇(α), §3.3.

Proposition 4.45. The components Γ(α)ijk of the ∇(α)-connections are

given by:

Γ (α)1 = [Γ (α)1ij ] =

0 0 (1+α) σ2−2

(1+α) σ122 0

0 0 (1+α) σ122

(1+α) σ1−2 0

(1+α) σ2−2

(1+α) σ122 0 0 0

(1+α) σ122

(1+α) σ1−2 0 0 0

0 0 0 0 0

Γ (α)2 = [Γ (α)2ij ] =

0 0 0 (1+α) σ2−2

(1+α) σ122

0 0 0 (1+α) σ122

(1+α) σ1−2

0 0 0 0 0(1+α) σ2−2

(1+α) σ122 0 0 0

(1+α) σ122

(1+α) σ1−2 0 0 0

Γ (α)3 = [Γ (α)3ij ] =

1 − α 0 0 0 00 0 0 0 00 0 (1+α) σ2

−(1+α) σ12

00 0 (1+α) σ12

(1+α) σ1

− 00 0 0 0 0

Γ (α)4 = [Γ (α)4ij ] =

0 1−α2 0 0 0

1−α2 0 0 0 00 0 0 (1+α) σ2

−2(1+α) σ12

20 0 (1+α) σ2

−2(1+α) σ12

(1+α) σ1−2

0 0 (1+α) σ122

(1+α) σ1−2 0

Γ (α)5 = [Γ (α)5ij ] =

0 0 0 0 00 1 − α 0 0 00 0 0 0 00 0 0 (1+α) σ2

−(1+α) σ12

0 0 0 (1+α) σ12

(1+α) σ1

. (4.131)

Page 104: Information Geometry: Near Randomness and Near Independence

94 4 Information Geometry of Bivariate Families

4.19.2 a-Curvatures

Proposition 4.46. The components R(α)ijkl of the α-curvature tensor are

given by:

[R(α)12kl] =

(

α2 − 1)

0 14 0 0 0

−14 0 0 0 00 0 0 −σ2

42σ1242

0 0 σ242 0 −σ1

42

0 0 −σ1242

σ142 0

[R(α)13kl] =

(

α2 − 1)

0 0 σ23

−43σ2

2 σ1223

σ2 σ122

−43

0 0 σ22 σ12

43

−σ2 (σ1 σ2+σ122)

43σ1 σ2 σ12

43

σ23

43σ2

2 σ12−43 0 0 0

σ22 σ12

−23

σ2 (σ1 σ2+σ122)

43 0 0 0σ2 σ12

2

43σ1 σ2 σ12−43 0 0 0

[R(α)14kl]

(α2 − 1)=

0 0 σ22σ12

23σ2(σ1σ2+3σ12

2)−43

σ12(σ1σ2+σ122)

43

0 0 σ2σ122

−23σ12(3σ1σ2+σ12

2)43

σ1(σ1σ2+σ122)

−43

σ22σ12

−23σ2σ12

2

23 0 0 0

σ2(σ1σ2+3σ122)

43σ12(3σ1σ2+σ12

2)−43 0 0 0

σ12(σ1σ2+σ122)

−43σ1(σ1σ2+σ12

2)43 0 0 0

[R(α)15kl] =

(

α2 − 1)

0 0 σ2 σ122

−43

σ12 (σ1 σ2+σ122)

43σ1 σ12

2

−43

0 0 σ123

43σ1 σ12

2

−23σ1

2 σ1243

σ2 σ122

43σ12

3

−43 0 0 0σ12 (σ1 σ2+σ12

2)−43

σ1 σ122

23 0 0 0σ1 σ12

2

43σ1

2 σ12−43 0 0 0

[R(α)23kl] =

(

α2 − 1)

0 0 σ22 σ12

43σ2 σ12

2

−23σ12

3

43

0 0 σ2 σ122

−43

σ12 (σ1 σ2+σ122)

43σ1 σ12

2

−43

σ22 σ12

−43σ2 σ12

2

43 0 0 0σ2 σ12

2

23

σ12 (σ1 σ2+σ122)

−43 0 0 0σ12

3

−43σ1 σ12

2

43 0 0 0

Page 105: Information Geometry: Near Randomness and Near Independence

4.19 Bivariate Gaussian a-Geometry 95

[R(α)24kl]

(α2 − 1)=

0 0σ2(σ1σ2+σ12

2)−43

σ12(3σ1σ2+σ122)

43σ1σ12

2

−23

0 0σ12(σ1σ2+σ12

2)43

σ1(σ1σ2+3σ122)

−43σ1

2σ1223

σ2(σ1σ2+σ122)

43σ12(σ1σ2+σ12

2)−43 0 0 0

σ12(3 σ1σ2+σ122)

−43σ1(σ1σ2+3 σ12

2)43 0 0 0

σ1σ122

23σ1

2σ12−23 0 0 0

[R(α)25kl] =

(

α2 − 1)

0 0 σ1 σ2 σ124 ()3

−σ1 (σ1 σ2+σ122)

43σ1

2 σ1243

0 0 σ1 σ122

−43σ1

2 σ1223

σ13

−43

σ1 σ2 σ12−43

σ1 σ122

43 0 0 0σ1 (σ1 σ2+σ12

2)43

σ12 σ12

−23 0 0 0σ1

2 σ12−43

σ13

43 0 0 0

[R(α)34kl] =

(

α2 − 1)

0 −σ242 0 0 0

σ242 0 0 0 00 0 0 σ2

2

−43σ2 σ1243

0 0 σ22

43 0 σ1 σ2−43

0 0 σ2 σ12−43

σ1 σ243 0

[R(α)35kl] =

(

α2 − 1)

0 σ1242 0 0 0

−σ1242 0 0 0 00 0 0 σ2 σ12

43σ12

2

−43

0 0 σ2 σ12−43 0 σ1 σ12

43

0 0 σ122

43σ1 σ12−43 0

[R(α)45kl] =

(

α2 − 1)

0 −σ142 0 0 0

σ142 0 0 0 00 0 0 σ1 σ2

−43σ1 σ1243

0 0 σ1 σ243 0 σ1

2

−43

0 0 σ1 σ12−43

σ12

43 0

. (4.132)

Proposition 4.47. The components of the α-Ricci tensor are given by thesymmetric matrix R(α) = [R(α)

ij ]:

Page 106: Information Geometry: Near Randomness and Near Independence

96 4 Information Geometry of Bivariate Families

R(α) =(

α2 − 1)

σ22 − σ12

2 0 0 0− σ12

2σ12 0 0 0

0 0 σ22

22 −σ2 σ122

3 σ122−σ1 σ242

0 0 −σ2 σ122

3 σ1 σ2+σ122

22 −σ1 σ122

0 0 3 σ122−σ1 σ242 −σ1 σ12

2σ1

2

22

.

(4.133)

Proposition 4.48. The bivariate Gaussian manifold N has a constantα-scalar curvature R(α):

R(α) =9(

α2 − 1)

2(4.134)

This recovers the known result for the 0-scalar curvature R(0) = − 92 . So geo-

metrically N constitutes part of a pseudosphere, §3.17.

Proposition 4.49. The α-sectional curvatures of N are

(α) = [(α)(i, j)]

=(

α2 − 1)

0 − 14

12

σ1σ2+3σ122

4(σ1σ2+σ122)σ12

2

2σ1σ2

−14 0 σ12

2

2σ1σ2

σ1σ2+3σ122

4(σ1σ2+σ122)12

12

σ122

2σ1σ20 1

2σ12

2

σ1σ2+σ122

σ1σ2+3σ122

4(σ1σ2+σ122)σ1σ2+3σ12

2

4(σ1σ2+σ122)12 0 1

2σ12

2

2σ1σ2

12

σ122

σ1σ2+σ12212 0

The α-sectional curvatures of N can be written as a function of correlationcoefficient ρ only, Figure 4.6:

(α) =(

α2 − 1)

0 − 14

12

1+3 ρ2

4 (1+ρ2)ρ2

2

− 14 0 ρ2

21+3 ρ2

4 (1+ρ2)12

12

ρ2

2 0 12

ρ2

1+ρ2

1+3 ρ2

4 (1+ρ2)1+3 ρ2

4 (1+ρ2)12 0 1

2ρ2

212

ρ2

1+ρ212 0

. (4.135)

Proposition 4.50. The α-mean curvatures (α)(λ) (λ = 1, 2, 3, 4, 5) aregiven by:

(α)(1) = (α)(2) =α2 − 1

8,

(α)(3) = (α)(5) =α2 − 1

4,

Page 107: Information Geometry: Near Randomness and Near Independence

4.19 Bivariate Gaussian a-Geometry 97

-1 -0.5 0.5 1

-0.5

-0.4

-0.3

-0.2

-0.1

0.1

0.2

(0)

(3,5) (0)

(1,5)(0)

(1,4)

(0)

(1,3)

(0)

(1,2)

Correlation ρ(X, Y)

0-Sectional curvature (0)(i, j)

Fig. 4.6. The 0-sectional curvatures (0)(i, j) as a function of correlation ρ(X, Y ) forbivariate normal manifold N where: (0)(1, 3) = (0)(2, 5) = (0)(3, 4) = (0)(4, 5) =− 1

2, (0)(1, 2) = 1

4, (0)(1, 4) = (0)(2, 4) and (0)(1, 5) = (0)(2, 3). Note that

(0)(1, 4), (0)(1, 5) and (0)(3, 5) have limiting value − 12

as ρ → ±1.

-1 -0.5 0.5 1

-0.35

-0.25

-0.2

-0.15(0)(1) = (0)(2)

(0)(4)

(0)(3) = (0)(5)

Correlation ρ(X, Y )

0-Mean curvature (0)(λ)

Fig. 4.7. The 0-mean curvatures (α)(λ) as a function of correlation ρ(X, Y ) forbivariate normal manifold N where; (0)(1) = (0)(2) = − 1

8, (0)(3) = (0)(5) = − 1

4,

and (0)(4) has limiting value − 14

as ρ(X, Y ) → ±1, and − 38

as ρ → 0.

(α)(4) =

(

α2 − 1) (

3σ1 σ2 + σ122)

8 (σ1 σ2 + σ122)

=

(

α2 − 1) (

3 + ρ2)

8 (1 + ρ2). (4.136)

Figure 4.7 shows a plot of the 0-mean curvatures (α) as a function of corre-lation ρ for bivariate normal manifold N .

Page 108: Information Geometry: Near Randomness and Near Independence

98 4 Information Geometry of Bivariate Families

4.20 Bivariate Gaussian Foliations

Since N is an exponential family, §3.2, a mutually dual coordinate system isgiven by the potential function ϕ(θ) (4.129), that is

η1 =∂ϕ

∂θ1=

2 θ1 θ5 − θ2 θ4

θ42 − 4 θ3 θ5

= µ1 ,

η2 =∂ϕ

∂θ2=

2 θ2 θ3 − θ1 θ4

θ42 − 4 θ3 θ5

= µ2 ,

η3 =∂ϕ

∂θ3=

θ22 θ4

2 + 2 θ4 (−2 θ1 θ2 + θ4) θ5 + 4(

θ12 − 2 θ3

)

θ52

(

θ42 − 4 θ3 θ5

)2 =µ12 + σ1 ,

η4 =∂ϕ

∂θ4= −

2θ22 θ3θ4 + θ43 + 2

(

θ12 − 2θ3

)

θ4θ5 − θ1θ2(

θ42 + 4θ3θ5

)

(

θ42 − 4θ3θ5

)2

= µ1µ2 + σ12,

η5 =∂ϕ

∂θ5=

4θ22 θ32 − 4θ1 θ2 θ3 θ4 +

(

θ12 + 2 θ3

)

θ42 − 8θ32 θ5

(

θ42 − 4θ3 θ5

)2 = µ22 + σ2 .

(4.137)

Then (θ1, θ2, θ3, θ4, θ5) is a 1-affine coordinate system, (η1, η2, η3, η4, η5) isa (−1)-affine coordinate system, and they are mutually dual with respect tothe Fisher information metric. The coordinates (ηi) have potential function λ

λ = −(

1 + log(2π√

))

. (4.138)

We obtain dually orthogonal foliations using(η1, η2, θ3, θ4, θ5) =

(µ1, µ2,−σ2

2 (σ1σ2 − σ122),

σ12

σ1σ2 − σ122,

−σ1

2 (σ1σ2 − σ122)

)

as a coordinate system for N , then the density functions take the form:f(x, y; η1, η2, θ3, θ4, θ5) =

4 θ3 θ5 − θ42

2πeθ3 (x−µ1)

2+θ4 (x−µ1) (y−µ2)+θ5 (y−µ2)2

(4.139)

and the Fisher metric is

[gij ] =

σ1 σ12 0 0 0σ12 σ2 0 0 00 0 σ2

2

22 −σ2 σ122

σ122

22

0 0 −σ2 σ122

σ1 σ2+σ122

2 −σ1 σ122

0 0 σ122

22 −σ1 σ122

σ12

22

. (4.140)

It follows that (θi) is a geodesic coordinate system of ∇(1), and (ηi) is ageodesic coordinate system of ∇(−1).

Page 109: Information Geometry: Near Randomness and Near Independence

4.21 Bivariate Gaussian Submanifolds 99

4.21 Bivariate Gaussian Submanifolds

We study three submanifolds, §2.0.5.

4.21.1 Independence Submanifold N1

This has definitionN1 ⊂ N : σ12 = 0.

The density functions are of form:

f(x, y;µ1, µ2, σ1, σ2) = NX(µ1, σ1).NY (µ2, σ2) (4.141)

This is the case for statistical independence of X and Y , by §1.3, so the spaceN1 is the direct product of two Riemannian spaces

NX(µ1, σ1), µ1 ∈ R, σ1 ∈ R+ and NY (µ2, σ2), µ2 ∈ R, σ2 ∈ R

+.

We report expressions for the metric, the α-connections and theα-curvature objects using the natural coordinate system

(θ1, θ2, θ3, θ4) = (µ1

σ1,µ2

σ2,− 1

2σ1,− 1

2σ2)

and potential function ϕ=log(2π√)−

(

θ22θ3 + θ1

2θ4)

=log(2π√σ1σ2)

+ µ12

2σ1+ µ2

2

2σ2; where = 1

4θ3θ4.

Proposition 4.51. The metric tensor is:

[gij ] =

σ1 0 2µ1 σ1 00 σ2 0 2µ2 σ2

2µ1 σ1 0 2σ1

(

2µ12 + σ1

)

00 2µ2 σ2 0 2σ2

(

2µ22 + σ2

)

. (4.142)

Proposition 4.52. The nonzero independent components of the α-connectionare

Γ(α)13,1 = − (α− 1) σ1

2 ,

Γ(α)33,1 = −4 (α− 1) µ1 σ1

2 ,

Γ(α)24,2 = − (α− 1) σ2

2 ,

Γ(α)44,2 = −4 (α− 1) µ2 σ2

2 ,

Γ(α)33,3 = −4 (α− 1) σ1

2(

3µ12 + σ1

)

,

Γ(α)111 = −Γ (α)3

13 = (α− 1) µ1 ,

Γ(α)131 = (α− 1)

(

2µ12 − σ1

)

,

Page 110: Information Geometry: Near Randomness and Near Independence

100 4 Information Geometry of Bivariate Families

Γ(α)133 = 4 (α− 1) µ1

3 ,

Γ(α)311 = Γ

(α)422 =

1 − α

2,

Γ(α)333 = −2 (α− 1)

(

µ12 + σ1

)

,

Γ(α)222 = −Γ (α)4

24 = (α− 1) µ2 ,

Γ(α)242 = (α− 1)

(

2µ22 − σ2

)

,

Γ(α)244 = 4 (α− 1) µ2

3 ,

Γ(α)444 = −2 (α− 1)

(

µ22 + σ2

)

. (4.143)

Proposition 4.53. The α-curvature tensor is

R(α)1313 = −

(

α2 − 1)

σ13, R

(α)2424 = −

(

α2 − 1)

σ23 (4.144)

while the other independent components are zero. By contraction we obtain:The α-Ricci tensor:

R(α) =(

α2 − 1)

σ12 0 µ1 σ1 00 σ2

2 0 µ2 σ2

µ1 σ1 0 σ1

(

2µ12 + σ1

)

00 µ2 σ2 0 σ2

(

2µ22 + σ2

)

. (4.145)

The α-eigenvalues of the α-Ricci tensor are:

(

α2 − 1)

σ14

+ µ12 σ1 + σ1

4

(

16 µ14 + (1 − 2 σ1)

2 + 8 µ12 (1 + 2 σ1) + 2

σ1

)

σ14

+ µ12 σ1 − σ1

4

(

16 µ14 + (1 − 2 σ1)

2 + 8 µ12 (1 + 2 σ1) − 2

σ1

)

σ24

+ µ22 σ2 + σ2

4

(

16 µ24 + (1 − 2 σ2)

2 + 8 µ22 (1 + 2 σ2) + 2

σ2

)

σ24

+ µ22 σ2 − σ2

2

4

(

16 µ24 + (1 − 2 σ2)

2 + 8 µ22 (1 + 2 σ2) − 2

σ2

)

.

(4.146)

The α-scalar curvature:

R(α) = 2(

α2 − 1)

(4.147)

The α-sectional curvatures:

(α) =

(

α2 − 1)

2

0 0 1 00 0 0 11 0 0 00 1 0 0

(4.148)

Page 111: Information Geometry: Near Randomness and Near Independence

4.21 Bivariate Gaussian Submanifolds 101

The α-mean curvatures:

(α)(1) = (α)(2) = (α)(3) = (α)(4) =α2 − 1

6. (4.149)

In N1, the α-scalar curvature, the α-sectional curvatures and the α-mean cur-vatures are constant; they are negative when α = 0.

Proposition 4.54. The submanifold N1 is an Einstein space.

Proof. By comparison of the metric tensor (4.142) with the Ricci tensor(4.145), we deduce

R(0)ij =

R(0)

kgij

where k is the dimension of the space. Then the submanifold N1 with statis-tically independent random variables is an Einstein space.

4.21.2 Identical Marginals Submanifold N2

This isN2 ⊂ N : σ1 = σ2 = σ, µ1 = µ2 = µ.

The density functions are of form f(x, y;µ, σ, σ12) =

12π

√σ2 − σ12

2e− 1

2(σ2−σ122) (σ(x−µ)2−2σ12(x−µ)(y−µ)+σ(y−µ)2) (4.150)

The marginal functions are fX = fY ≡ N(µ, σ) with correlation coefficientρ(X,Y ) = σ12

σ .We report the expressions for the metric, the α-connections and the

α-curvature objects using the natural coordinate system

(θ1, θ2, θ3) = (µ

σ + σ12,

−σ2 (σ2 − σ12

2),

σ12

(σ2 − σ122)

)

and the potential function

ϕ = − θ12

2 θ2 + θ3+ log(2π) − 1

2log(4 θ22 − θ3

2)

=µ2

σ + σ12+ log(2π) +

12

log(σ2 − σ122) . (4.151)

Proposition 4.55. The metric tensor is[gij ] =⎡

2 (σ + σ12) 4µ (σ + σ12) 2µ (σ + σ12)4µ (σ + σ12) 4

(

σ(

2µ2 + σ)

+ 2µ2σ12 + σ122)

4(

µ2σ +(

µ2 + σ)

σ12

)

2µ (σ + σ12) 4(

µ2σ +(

µ2 + σ)

σ12

)

σ(

2µ2 + σ)

+ 2µ2σ12 + σ122

⎦.

(4.152)

Page 112: Information Geometry: Near Randomness and Near Independence

102 4 Information Geometry of Bivariate Families

Proposition 4.56. The components of the α-connection are

[Γ (α)1ij ] = (α− 1)

2µ 4µ2 − σ − σ124 µ2−σ−σ12

24µ2 − σ − σ12 8µ3 4 µ3

4 µ2−σ−σ122 4µ3 2µ3

[Γ (α)2ij ] = (α− 1)

−12 −µ −µ

2−µ −2

(

µ2 + σ)

−(

µ2 + σ12

)

−µ2 −

(

µ2 + σ12

) −(µ2+σ)2

[Γ (α)3ij ] = (α− 1)

−1 −2µ −µ−2µ −4

(

µ2 + σ12

)

−2(

µ2 + σ)

−µ −2(

µ2 + σ)

−(

µ2 + σ12

)

⎦ (4.153)

The analytic expressions for the functions Γ (α)ij,k are known [13] but rather long,

so we do not report them here. Proposition 4.57. By direct calculation we have the α-curvature tensorof N2

R(α)12kl =

(

α2 − 1)

0 −2 (σ + σ12)3 − (σ + σ12)

3

2 (σ + σ12)3 0 0

(σ + σ12)3 0 0

R(α)13kl =

(

α2 − 1)

0 − (σ + σ12)3 −(σ+σ12)

3

2

(σ + σ12)3 0 0

(σ+σ12)3

2 0 0

(4.154)

while the other independent components are zero.By contraction we obtain the

α-Ricci tensor R(α) =

(

α2 − 1)

(σ + σ12) 2µ (σ + σ12) µ (σ + σ12)

2µ (σ + σ12) (σ + σ12)(

4µ2 + σ + σ12

) (σ+σ12) (4 µ2+σ+σ12)2

µ (σ + σ12)(σ+σ12) (4 µ2+σ+σ12)

2

(σ+σ12) (4 µ2+σ+σ12)4

(4.155)

The α-eigenvalues of the α-Ricci tensor are⎛

010(α2−1)(σ+σ12)

2

4(1+5µ2)+5−√

400µ4+(4−5σ)2+40µ2(4+5σ)+5σ12 (−8+40 µ2+10σ+5 σ12)10(α2−1)(σ+σ12)

2

4(1+5µ2)+5+√

400 µ4+(4−5 σ)2+40µ2 (4+5σ)+5σ12(−8+40 µ2+10σ+5 σ12)

(4.156)

Page 113: Information Geometry: Near Randomness and Near Independence

4.21 Bivariate Gaussian Submanifolds 103

The α-scalar curvature:

R(α) =(

α2 − 1)

(4.157)

The α-sectional curvatures:

(α) =(

α2 − 1)

0 (σ+σ12)2

4 (σ2+σ122)(σ+σ12)

2

4 (σ2+σ122)(σ+σ12)

2

4 (σ2+σ122) 0 0(σ+σ12)

2

4 (σ2+σ122) 0 0

(4.158)

The α-mean curvatures:

(α)(1) =14(

α2 − 1)

,

(α)(2) = (α)(3) =

(

α2 − 1)

(σ + σ12)(

4µ2 + σ + σ12

)

8 (σ (2µ2 + σ) + 2µ2 σ12 + σ122)

. (4.159)

4.21.3 Central Mean Submanifold N3

This is defined asN3 ⊂ N : µ1 = µ2 = 0.

The density functions aref(x, y;σ1, σ2, σ12) =

12π

√σ1 σ2 − σ12

2e− 1

2 (σ1 σ2−σ122) (σ2x2−2 σ12 x y+σ1y2) (4.160)

with marginal functions

fX(x) = NX(0, σ1), and fY (y) = NY (0, σ2)

and correlation coefficient

ρ(X,Y ) =σ12√σ1 σ2

.

We report the expressions for the metric, the α-connections and the α-curvature objects using the natural coordinate system

(θ1, θ2, θ3) = (− σ2

2 (σ1 σ2 − σ122),

σ12

σ1 σ2 − σ122,− σ1

2 (σ1 σ2 − σ122)

)

and the potential function

ϕ = log(2π) − 12

log(√

4 θ1 θ3 − θ42) = log(2π) +

12

log(σ1 σ2 − σ122) .

Page 114: Information Geometry: Near Randomness and Near Independence

104 4 Information Geometry of Bivariate Families

Proposition 4.58. The metric tensor is as follows:

[gij ] =

2σ12 2σ1 σ12 2σ12

2

2σ1 σ12 σ1 σ2 + σ122 2σ2 σ12

2σ122 2σ2 σ12 2σ2

2

⎦ . (4.161)

Proposition 4.59. The components of the α-connection are

[Γ (α)ij,1 ] = (α− 1)

−4σ13 −4σ1

2 σ12 −4σ1 σ122

−4σ12 σ12 −σ1

(

σ1 σ2 + 3σ122)

−2(

σ1 σ2 σ12 + σ123)

−4σ1 σ122 −2

(

σ1 σ2 σ12 + σ123)

−4σ2 σ122

[Γ(α)ij,2 ]

(α − 1)=

−4 σ12 σ12 −σ1

(

σ1 σ2 + 3 σ122)

−2(

σ1 σ2 σ12 + σ123)

−σ1

(

σ1 σ2 + 3 σ122)

−3 σ1 σ2 σ12 − σ123 −σ2

(

σ1 σ2 + 3 σ122)

−2(

σ1 σ2 σ12 + σ123)

−σ2

(

σ1 σ2 + 3 σ122)

−4 σ22 σ12

[Γ (α)ij,3 ] = (α− 1)

−4σ1 σ122 −2

(

σ1 σ2 σ12 + σ123)

−4σ2 σ122

−2(

σ1 σ2 σ12 + σ123)

−σ2

(

σ1 σ2 − 3σ122)

−4σ22 σ12

−4σ2 σ122 −4σ2

2 σ12 −4σ23

[Γ (α)1ij ] = (α− 1)

−2σ1 −σ12 0−σ12

−σ22 0

0 0 0

[Γ (α)2ij ] = (α− 1)

0 −σ1 −2σ12

−σ1 −σ12 −σ2

−2σ12 −σ2 0

[Γ (α)3ij ] = (α− 1)

0 0 00 −σ1

2 −σ12

0 −σ12 −2σ2

⎦ . (4.162)

Proposition 4.60. By direct calculation we have the nonzero independentcomponents of the α-curvature tensor of N3

R(α)12kl =

(

α2 − 1)

0 −σ12 −2σ1 σ12

σ12 0 −σ1 σ2

2σ1 σ12 σ1 σ2 0

R(α)13kl =

(

α2 − 1)

0 −2σ1 σ12 −4σ122

2σ1 σ12 0 −2σ2 σ12 4σ12

2 2σ2 σ12 0

Page 115: Information Geometry: Near Randomness and Near Independence

4.21 Bivariate Gaussian Submanifolds 105

R(α)23kl =

(

α2 − 1)

0 −σ1 σ2 −2σ2 σ12 σ1 σ2 0 −σ2

2 2σ2 σ12 σ2

2 0

⎦ . (4.163)

By contraction:The α- Ricci tensor:

R(α) =(

α2 − 1)

σ12 σ1 σ12 2σ12

2 − σ1 σ2

σ1 σ12 σ1 σ2 σ2 σ12

2σ122 − σ1 σ2 σ2 σ12 σ2

2

⎦ . (4.164)

The α-eigenvalues of the α-Ricci tensor are given by:

(

α2 − 1)

2

0σ1

2 + σ1 σ2 + σ22 − S

σ12 + σ1 σ2 + σ2

2 + S

⎠ (4.165)

where

S =√

(σ12 − σ1 σ2 + σ2

2)2 + 4 (σ12 − 4σ1 σ2 + σ2

2) σ122 + 16σ12

4.

The α-scalar curvature:

R(α) = 2(

α2 − 1)

(4.166)

The α-sectional curvatures:

(α) =(

α2 − 1)

0 12

ρ2

1+ρ2

12 0 1

2ρ2

1+ρ212 0

⎦(4.167)

The α-mean curvatures:

(α)(1) = (α)(3) =14(

α2 − 1)

,

(α)(2) =

(

α2 − 1)

σ1 σ2

2 (σ1 σ2 + σ122)

=

(

α2 − 1)

2 (1 + ρ2). (4.168)

For N3 the α-mean curvatures have limiting value (α2−1)4 as ρ2 → 1.

4.21.4 Affine Immersion

Proposition 4.61. Let N be the bivariate Gaussian manifold with the Fishermetric g and the exponential connection ∇(1). Denote by (θi) the natural coor-dinate system (4.128). Then N can be realized in R

6 by the graph of a potentialfunction, §3.4, via the affine immersion f, ξ:

Page 116: Information Geometry: Near Randomness and Near Independence

106 4 Information Geometry of Bivariate Families

f : N → R6 :[

θi

]

→[

θi

ϕ(θ)

]

, ξ =

000001

, (4.169)

where ϕ(θ) is the potential function

ϕ(θ) = log(2π√

) −(

θ22 θ3 − θ1 θ2 θ4 + θ1

2 θ5)

.

The submanifold consisting of the independent case with zero means andidentical standard deviations is represented by the curve:

(−∞, 0) → R3 : (θ1) → (θ1, 0, log(−4π θ1)), ξ = (0, 0, 1)

: (− 12σ

) → (− 12σ

, 0, log(2π σ)), ξ = (0, 0, 1).

4.22 Bivariate Log-Gaussian Manifold

The bivariate log-Gaussian distribution has log-Gaussian marginal functions.These arise from the bivariate Gaussian density functions (4.120) for the non-negative random variables x = log 1

n and y = log 1m , or equivalently, n = e−x

and m = e−y.Their probability density functions are given for n ,m > 0 by:

g(n,m) =1

2πnm√σ1σ2 − σ12

2e−VW (4.170)

V =1

2 (σ1σ2 − σ122)

W =(

σ2(log(n) + µ1)2 − 2σ12 (log(n) + µ1) (log(m) + µ2)

+σ1(log(m) + µ2)2)

.

Corollary 4.62. The covariance, correlation coefficient, and marginal den-sity functions are:

Covg(n,m) = (eσ12 − 1) e−(µ1+µ2)+12 (σ1+σ2) ,

ρg(n,m) =(eσ12 − 1)√

σ1 σ2e−(µ1+µ2)+

12 (σ1+σ2) ,

gn(n) =1

n√

2π σ1e−

(log(n)+µ1)2

2 σ1 ,

gm(m) =1

m√

2π σ2e−

(log(m)+µ2)2

2 σ2 . (4.171)

Note that the variables n and m are independent if and only if σ12 = 0, andthe marginal functions are log-Gaussian density functions.

Page 117: Information Geometry: Near Randomness and Near Independence

4.22 Bivariate Log-Gaussian Manifold 107

Directly from the definition of the Fisher metric we deduce:

Proposition 4.63. The family of bivariate log-Gaussian density functions forrandom variables n,m determines a Riemannian 5-manifold which is an iso-metric isomorph of the bivariate Gaussian 5-manifold, §3.1.

Page 118: Information Geometry: Near Randomness and Near Independence
Page 119: Information Geometry: Near Randomness and Near Independence

5

Neighbourhoods of Poisson Randomness,Independence, and Uniformity

As we have mentioned before, colloquially in applications, it is very commonto encounter the usage of ‘random’ to mean the specific case of a Poissonprocess §1.1.3 whereas formally in statistics, the term random has a moregeneral meaning: probabilistic, that is dependent on random variables. Whenwe speak of neighbourhoods of randomness we shall mean neighbourhoodsof a Poisson process and then the neighbourhoods contain perturbations ofthe Poisson process. Similarly, we consider processes that are perturbationsof a process controlled by a uniform distribution on a finite interval, yield-ing neighbourhoods of uniformity. The third situation of interest is when wehave a bivariate process controlled by independent exponential, gamma orGaussian distributions; then perturbations are contained in neighbourhoodsof independence. These neighbourhoods all have well-defined metric structuresdetermined by information theoretic maximum likelihood methods. This al-lows trajectories in the space of processes, commonly arising in practice byaltering input conditions, to be studied unambiguously with geometric toolsand to present a background on which to describe the output features ofinterest of processes and products during changes.

The results here augment our information geometric measures for distancesin smooth spaces of probability density functions, §1.2, by providing explicitgeometric representations with distance measures of neighbourhoods for eachof these important states of statistical processes:

• (Poisson) randomness, §1.1.3• independence, §1.3• uniformity, §1.2.1.

Such results are significant theoretically because they are very general, andpractically because they are topological and so therefore stable under pertur-bations.

K. Arwini, C.T.J. Dodson, Information Geometry. 109Lecture Notes in Mathematics 1953,c© Springer-Verlag Berlin Heidelberg 2008

Page 120: Information Geometry: Near Randomness and Near Independence

110 5 Neighbourhoods of Poisson Randomness, Independence, and Uniformity

5.1 Gamma Manifold G and Neighbourhoodsof Randomness

The univariate gamma density function, §1.4.1, is widely used to modelprocesses involving a continuous positive random variable. It has importantuniqueness properties [106]. Its information geometry is known and has beenapplied recently to represent and metrize departures from Poisson of, for ex-ample, the processes that allocate gaps between occurrences of each aminoacid along a protein chain within the Saccharomyces cerevisiae genome, seeCai et al [34], clustering of galaxies and communications, and cryptographicattacks Dodson [63, 64, 61, 77]. We have made precise the statement thataround every Poisson random process there is a neighbourhood of processessubordinate to the gamma distribution, so gamma density functions can ap-proximate any small enough departure from Poisson randomness.

Theorem 5.1. Every neighbourhood of a Poisson random process contains aneighbourhood of processes subordinate to gamma density functions.

Proof. Dodson and Matsuzoe [68] have provided an affine immersion inEuclidean R

3 for G, the manifold of gamma density functions, §3.5 with Fisherinformation metric, §3.1. The coordinates (ν, κ) form a natural coordinatesystem (cf Amari and Nagaoka [11]) for the gamma manifold G of densityfunctions (3.15):

p(x; ν, κ) = νκ xκ−1 e−xν

Γ (κ).

Then G can be realized in Euclidean R3 as the graph of the affine immersion,

§3.4, §3.5.5, h, ξ where ξ is a transversal vector field along h [11, 68]:

h : G → R3 :(

νκ

)

νκ

logΓ (κ) − κ log ν

⎠ , ξ =

001

⎠ .

The submanifold, §2.0.5, of exponential density functions, §1.2.2 is representedby the curve

(0,∞) → R3 : ν → ν, 1, log

and for this curve, a tubular neighbourhood in R3 such as that bounded by

the surface

ν − 0.6 cos θ√1 + ν2

, 1 − 0.6 sin θ,−0.6 ν cos θ√

1 + ν2− log ν θ ∈ [0, 2π)

(5.1)

will contain all immersions for small enough perturbations of exponential den-sity functions. In Figure 5.1 this is depicted in natural coordinates ν, κ. Thetubular neighbourhood (5.1) intersects with the gamma manifold immersionto yield the required neighbourhood in the manifold of gamma density func-tions, which completes our proof.

Page 121: Information Geometry: Near Randomness and Near Independence

5.2 Log-Gamma Manifold L and Neighbourhoods of Uniformity 111

Fig. 5.1. Tubular neighbourhood of univariate Poisson random processes. Affineimmersion in natural coordinates ν, κ as a surface in R

3 for the gamma manifold G;the tubular neighbourhood surrounds all exponential density functions—these lieon the curve κ = 1 in the surface. Since the log-gamma manifold L is an isometricisomorph of G, this figure represents also a tubular neighbourhood in R

3 of theuniform density function from the log-gamma manifold.

5.2 Log-Gamma Manifold L and Neighbourhoodsof Uniformity

The family of log-gamma density functions discussed in §3.6 has probabilitydensity functions for random variable N ∈ (0, 1] given by equation (3.30):

q(N, ν, τ) =1N

1− τν ( τ

ν )τ (log 1N )τ−1

Γ (τ)for ν > 0 and τ > 0 .

This family has the uniform density function, §1.2.1, as a limit

limν,κ→1

q(N, ν, κ) = q(N, 1, 1) = 1 .

Figure 5.2 shows some log-gamma density functions around the uniform den-sity function and Figure 3.3 shows part of the continuous family from whichthese are drawn.

Page 122: Information Geometry: Near Randomness and Near Independence

112 5 Neighbourhoods of Poisson Randomness, Independence, and Uniformity

0.2 0.4 0.6 0.8 1

0.5

1

1.5

2

2.5q(N; ν, κ)

κ = 5

κ = 2

κ = 1

κ = 12

N

Fig. 5.2. Log-gamma probability density functions q(N ; ν, κ), N ∈ (0, 1], with cen-tral mean N = 1

2, and κ = 1

2, 1, 2, 5. The case κ = 1 is the uniform density

function, complementary to the exponential case for the gamma density function.The regime κ < 1 corresponds in gamma distributions to clustering in an underly-ing spatial process; conversely, κ > 1 corresponds to dispersion and greater evennessthan random.

From §3.6, the log-gamma manifold L has information metric (3.36), iso-metric with the gamma manifold, G by Proposition 3.9. Hence, from the resultof Dodson and Matsuzoe [68] the immersion of G in R

3, Figure 5.1, representsalso the log-gamma manifold L. Then, since the isometry, §2.0.5 sends the ex-ponential density function to the uniform density function on [0, 1], we obtaina general deduction

Theorem 5.2. Every neighbourhood of the uniform density function containsa neighbourhood of log-gamma density functions. Equivalently,

Theorem 5.3. Every neighbourhood of a uniform process contains a neigh-bourhood of processes subordinate to log-gamma density functions.

5.3 Freund Manifold F and Neighbourhoodsof Independence

Let F be the manifold of Freund bivariate mixture exponential density func-tions (4.63), §4.9, so with positive parameters αi, βi,

F ≡ f |f(x, y;α1, β1, α2, β2) =

α1β2e−β2y−(α1+α2−β2)x for 0 ≤ x < y

α2β1e−β1x−(α1+α2−β1)y for 0 ≤ y ≤ x

.

(5.2)

Page 123: Information Geometry: Near Randomness and Near Independence

5.3 Freund Manifold F and Neighbourhoods of Independence 113

5.3.1 Freund Submanifold F2

This isF2 ⊂ F : α1 = α2, β1 = β2.

The densities are of form:

f(x, y;α1, β1) =

α1β1 e−β1y−(2 α1−β1)x for 0 < x < y

α1β1 e−β1x−(2 α1−β1)y for 0 < y < x

(5.3)

with parameters α1, β1 > 0. The covariance, correlation coefficient and mar-ginal density functions, of X and Y are given by:

Cov(X,Y ) =14

(

1α1

2− 1

β12

)

, (5.4)

ρ(X,Y ) = 1 − 4α12

3α12 + β1

2 , (5.5)

fX(x) =(

α1

2α1 − β1

)

β1 e−β1x

+(

α1 − β1

2α1 − β1

)

(2α1) e−2 α1x , x ≥ 0 , (5.6)

fY (y) =(

α1

2α1 − β1

)

β1 e−β1y

+(

α1 − β1

2α1 − β1

)

(2α1) e−2 α1y , y ≥ 0 . (5.7)

Proposition 5.4. F2 forms an exponential family, §3.2, with parameters(α1, β1) and potential function

ψ = − log(α1 β1) (5.8)

Proposition 5.5. Cov(X,Y ) = ρ(X,Y ) = 0 if and only if α1 = β1 and inthis case the density functions are of form

f(x, y;α1, α1) = α21 eα1|y−x| = fX(x)fY (y) (5.9)

so that here we do have independence of these exponentials if and only if thecovariance is zero.

Neighbourhoods of Independence in F2

An important practical application of the Freund submanifold F2 is therepresentation of a bivariate proces for which the marginals are identical ex-ponentials. The next result is important because it provides topological neigh-bourhoods of that subspace W in F2 consisting of the bivariate processes thathave zero covariance: we obtain neighbourhoods of independence for Poissonrandom (ie exponentially distributed §1.2.2) processes.

Page 124: Information Geometry: Near Randomness and Near Independence

114 5 Neighbourhoods of Poisson Randomness, Independence, and Uniformity

Theorem 5.6. Every neighbourhood of an independent pair of identical ran-dom processes contains a neighbourhood of bivariate processes subordinate toFreund density functions.

Proof. Let F2, g,∇(1),∇(−1) be the manifold F2 with Fisher metric g andexponential connection ∇(1). Then F2 can be realized in Euclidean R

3 by thegraph of a potential function, via the affine immersion

h : G → R3 :(

α1

β1

)

α1

β1

− log(α1 β1)

⎠ .

In F2, the submanifold W consisting of the independent case (α1 = β1) isrepresented by the curve

W : (0,∞) → R3 : (α1) → (α1, α1,−2 logα1). (5.10)

which has tubular neighbourhoods of form⎧

t− r cos(θ)√

1 + 1t2

1t4 t

3, t− r sin(θ)

1t4 t

2,− r cos(θ)√

1 + 1t2

1t4 t

2− 2 log(t)

This is illustrated in Figure 5.3 which shows an affine embedding of F2 as asurface in R

3, and an R3-tubular neighbourhood of W , the curve α1 = β1

in the surface. This curve W represents all bivariate density functions havingidentical exponential marginals and zero covariance; its tubular neighourhoodrepresents all small enough departures from independence.

5.4 Neighbourhoods of Independence for Gaussians

The bivariate Gaussian density function, §4.16 has the form:

f(x, y)=1

2π√

σ1 σ2 − σ122

e− 1

2(σ1 σ2−σ122) (σ2(x−µ1)2−2σ12(x−µ1)(y−µ2)+σ1(y−µ2)

2),

(5.11)

defined on −∞ < x , y < ∞ with parameters (µ1, µ2, σ1, σ12, σ2); where−∞ < µ1 , µ2 < ∞, 0 < σ1 , σ2 < ∞ and σ12 is the covariance of X and Y.

The marginal functions, of X and Y are univariate Gaussian density func-tions, §1.2.3:

fX(x, µ1, σ1) =1√

2π σ1e−

(x−µ1)2

2 σ1 , (5.12)

fY (y, µ2, σ2) =1√

2π σ2e−

(y−µ2)2

2 σ2 . (5.13)

Page 125: Information Geometry: Near Randomness and Near Independence

5.4 Neighbourhoods of Independence for Gaussians 115

Fig. 5.3. Tubular neighbourhood of independent Poisson random processes. Anaffine immersion in natural coordinates (α1, β1) as a surface in R

3 for the Freundsubmanifold F2; the tubular neighbourhood surrounds the curve (α1 = β1 in thesurface) consisting of all bivariate density functions having identical exponentialmarginals and zero covariance.

The correlation coefficient is, §1.3:

ρ(X,Y ) =σ12√σ1 σ2

Since σ122 < σ1 σ2 then −1 < ρ(X,Y ) < 1; so we do not have the case

when Y is a linearly increasing (or decreasing) function of X. The space ofbivariate Gaussians becomes a Riemannian 5-manifold N , §4.17, with Fisherinformation metric, §3.1.

Gaussian Independence Submanifold N1

This is N1 ⊂ N : σ12 = 0. The density functions are of form:

f(x, y;µ1, µ2, σ1, σ2) = fX(x, µ1, σ1).fY (y, µ2, σ2) (5.14)

Page 126: Information Geometry: Near Randomness and Near Independence

116 5 Neighbourhoods of Poisson Randomness, Independence, and Uniformity

This is the case for statistical independence, §1.3 of X and Y , so the spaceN1 is the direct product of two Riemannian spaces

fX(x, µ1, σ1), µ1 ∈ R, σ1 ∈ R+ and fY (y, µ2, σ2), µ2 ∈ R, σ2 ∈ R

+.

Gaussian Identical Marginals Submanifold N2

This isN2 ⊂ N : σ1 = σ2 = σ, µ1 = µ2 = µ.

The density functions are of form:

f(x, y;µ, σ, σ12)=1

2π√σ2 − σ12

2e− 1

2(σ2−σ122) (σ(x−µ)2−2σ12(x−µ)(y−µ)+σ(y−µ)2).

(5.15)

The marginal functions are fX = fY ≡ N(µ, σ), with correlation coefficientρ(X,Y ) = σ12

σ .

Central Mean Submanifold N3

This isN3 ⊂ N : µ1 = µ2 = 0.

The density functions are of form:

f(x, y;σ1, σ2, σ12) =1

2π√σ1 σ2 − σ12

2e− 1

2 (σ1 σ2−σ122) (σ2x2−2 σ12 x y+σ1y2).

(5.16)

The marginal functions are fX(x, 0, σ1) and fY (y, 0, σ2), with correlation co-efficient ρ(X,Y ) = σ12√

σ1 σ2.

By similar methods to that used for Freund density functions, the fol-lowing results are obtained [15] for the case of Gaussian marginal densityfunctions [15].

Theorem 5.7. The bivariate Gaussian 5-manifold admits a 2-dimensionalsubmanifold through which can be provided a neighbourhood of independencefor bivariate Gaussian processes.

Corollary 5.8. Via the Central Limit Theorem, by continuity the tubularneighbourhoods of the curve of zero covariance will contain all immersionsof limiting bivariate processes sufficiently close to the independence case forall processes with marginals that converge in density function to Gaussians.

Page 127: Information Geometry: Near Randomness and Near Independence

5.4 Neighbourhoods of Independence for Gaussians 117

Fig. 5.4. Continuous image, as a surface in R3 using standard coordinates, of an

affine immersion for the bivariate Gaussian density functions with zero means andidentical standard deviation σ. The tubular neighbourhood surrounds the curve ofindependence cases (σ12 = 0) in the surface.

Figure 5.4 shows explicitly a tubular neighbourhood⎧

t− r cos(θ)√

1 + 1t2

1t4 t

3,−r sin(θ)√

1t4 t

2, log

(

π√t2

)

− r cos(θ)√

1 + 1t2

1t4 t

2

for the curve of zero covariance processes (σ12 = 0,) in the submanifold ofbivariate Gaussian density functions with zero means and identical standarddeviation σ.

Page 128: Information Geometry: Near Randomness and Near Independence
Page 129: Information Geometry: Near Randomness and Near Independence

6

Cosmological Voids and Galactic Clustering

For a general account of large-scale structures in the universe, see, for ex-ample, Peebles [162] and Fairall [82], the latter providing a comprehensiveatlas. See also Cappi et al [39], Coles [42], Labini et al. [128, 129], Vogeleyet al. [208] and van der Weygaert [202] for further recent discussion of largestructures. The Las Campanas Redshift Survey was a deep survey, providingsome 26,000 data points in a slice out to 500h−1Mpc. Doroshkevich et al. [79](cf. also Fairall [82] §5.4 and his Figure 5.5) revealed a rich texture of fila-ments, clusters and voids and suggested that it resembled a composite of threePoisson processes, §1.1.3, consisting of sheets and filaments:

• Superlarge-scale sheets:60 percent of galaxies, characteristic separation about 77h−1Mpc

• Rich filaments:20 percent of galaxies, characteristic separation about 30h−1Mpc

• Sparse filaments:20 percent of galaxies, characteristic separation about 13h−1Mpc.

Most recently, the data from the 2-degree field Galaxy Redshift Survey (2dF-GRS), cf Croton et al. [49, 50] can provide improved statistics of counts incells and void volumes.

In this chapter we outline some methods whereby such statistical prop-erties may be viewed in an information geometric way. First we look atPoisson processes of extended objects then at coupled processes that relatevoid and density statistics, somewhat heuristically but intended to reveal theway the information geometry can be used to represent such near-Poissonspatial processes. The applications to cosmology here are based on the publi-cations [63, 62, 64, 65].

K. Arwini, C.T.J. Dodson, Information Geometry. 119Lecture Notes in Mathematics 1953,c© Springer-Verlag Berlin Heidelberg 2008

Page 130: Information Geometry: Near Randomness and Near Independence

120 6 Cosmological Voids and Galactic Clustering

6.1 Spatial Stochastic Processes

There is a body of theory that provides the means to calculate the varianceof density in planar Poisson processes of arbitrary rectangular elements, usingarbitrary finite cells of inspection [58]. We provide details of this method. Inprinciple, it may be used to interpret the survey data by finding a best fitfor filament and sheet sizes—and perhaps their size distributions—and fordetecting departures from Poisson processes. For analyses using ‘counts incells’ in other surveys, see Efstathiou [81] and Szapudi et al. [197]. A hierachyof N -point correlation functions needed to represent clustering of galaxies in acomplete sense was derived by White [210] and he provided explicit formulae,including their continuous limit.

The basic random model, §1.1.3, for spatial stochastic processes represent-ing the distribution of galaxies in space is that arising from a Poisson processof mean density N galaxies per unit volume in a large box—the region coveredby the catalogue data to be studied. Then, the probability of finding exactlym galaxies in a given sample region of volume V, is

Pm =(NV )m

m!e−NV for m = 0, 1, 2, . . . (6.1)

The Poisson probability distribution (6.1) has mean equal to its variance,m = V ar(m) = NV, and this is used as a reference case for comparison withobservational data. Complete sampling of the available space using cells ofvolume V will reveal clustering if the variance of local density over the cellsexceeds N. Moreover, the covariance, §1.3, of density between cells encodescorrelation information about the spatial process being observed. The corre-lation function, §1.3 cf. Peebles [162], is the ratio of the covariance of densityof galaxies in cells separated by distance r, divided by the variance of densityfor the chosen cells

ξ(r) =Cov(r)Cov(0)

=< m(r0)m(r0 + r) >

m2− 1 (6.2)

In the absence of correlation, we expect ξ(r) to decay rapidly to zero withthe separation distance r. In practice, we find that, for r not too large, ξ(r)resembles an exponential decay e−r/d with d of the order of the smallestdimension of the characteristic structural feature.

Another way to detect clustering is to use an increasing sequence V0 <V1 < V2 < . . . of sampling cell volumes; in the absence of correlation we ex-pect that the variance of numbers found using these cells will be the averagenumbers of galaxies in them, NV0 < NV1 < NV2 < . . . , respectively. Sup-pose that a sampling cell of volume V1 contains exactly k sampling cells ofvolume V0, then V ar1, the variance of density of galaxies found using a cellof volume V1, is expressible as

V ar1 =1kV ar0 +

k − 1k

Cov0,1 (6.3)

Page 131: Information Geometry: Near Randomness and Near Independence

6.2 Galactic Cluster Spatial Processes 121

where V ar0 is the variance found using the smaller cells and Cov0,1 is theaverage covariance among the smaller cells in the larger cells. As k → ∞,so 1

k V ar0 → 0 and V ar1 tends to the mean covariance among points insidethe V1 cells. Now, the mean covariance among points inside V1 cells is theexpectation of the covariance between pairs of points separated by distance r,taken over all possible values for r inside a V1 cell. Explicitly

V ar1 =∫ D

0

Cov(r) b(r) dr (6.4)

where b is the probability density function, §1.2, for the distance r betweenpairs of points chosen independently and at random in a V1 cell and D is thediameter or maximum dimension of such a cell.

Ghosh [91] gave examples of different functions b and some analytic resultsare known for covariance functions arising from spatial point processes—byrepresenting the clusters as ‘smoothed out’ lumps of matter—see [58] for thecase of arbitrary rectangles in planar situations. It is convenient to normalizeequation (6.4) by division through by Cov(0) = V ar(0), which is known for aPoisson process; this gives the ‘between cell’ variance for complete samplingusing v1 cells. Then we obtain

V ar1 = V ar(0)∫ D

0

a(r) b(r) dr (6.5)

where a is the point autocorrelation function for the particular type of lumpsof matter being used to represent a cluster of galaxies; typically, a(r) ≈ e−r/d,for ‘small’ r and d is of the order of the smallest dimension of a cluster. Since itinvolves finite cells, V ar1 is in principle measurable so (6.5) can be comparedwith observational data, once the type of sampling cell and representativeextended matter object are chosen. We return to this in the sequel and provideexamples for a two dimensional model. From Labini et al. [129], we note thatexperimentally for clusters of galaxies

ξ(r) ≈(

25r

)1.7

, with r in h−1Mpc (6.6)

which for 2 < r < 10 resembles e−r/d for suitable d near 1.8.

6.2 Galactic Cluster Spatial Processes

From the atlases shown in Fairall [82] and surveys discussed by Labiniet al. [129], one may estimate in a planar slice a representative galactic ‘wall’filament thickness of about 5h−1Mpc and a wall ‘thickness-to-length’ aspectratio A in the range 10 < A < 50. Then, in order to represent galactic clus-tering as a Poisson process of wall filaments of length λ and width ω, we need

Page 132: Information Geometry: Near Randomness and Near Independence

122 6 Cosmological Voids and Galactic Clustering

the point autocorrelation function a for such filaments. In two dimensions itwas shown in [58] that the function a is given in three parts for rectangles oflength λ and width ω by the following.

For 0 < r ≤ ω

a1(r) = 1 − 2π

(

r

λ+

r

ω− r2

2ωλ

)

. (6.7)

For ω < r ≤ λ

a2(r) =2π

(

arcsinω

r− ω

2λ− r

ω+

(r2

ω2− 1)

)

. (6.8)

For λ < r ≤√

(λ2 + ω2)

a3(r) =2

π

(

arcsinω

r− arccos

λ

r− ω

2λ− λ

2ω− r2

2λω+

(r2

λ2− 1) +

(r2

ω2− 1)

)

(6.9)

For small r, as expected even in three dimensions, a(r) ≈ e−2r/πω.Note that for Poisson random squares of side length s, ω = λ = s and we

have only two cases:For 0 < r ≤ s

a1(r) = 1 − 2π

(

2rs

− r2

2s2

)

. (6.10)

For s < r ≤√

(2s2)

a3(r) =2π

(

arcsins

r− arccos

s

r− 1 − r2

2s2+ 2

(r2

s2− 1)

)

(6.11)

This case may be used to represent in two dimensions clusters of galaxies asa Poisson process of smoothed out squares of matter—the sheet-like elementsof Doroshkevich et al. [79].

Next we need b, the probability density function for the distance r betweenpairs of points chosen independently and at random in a suitable inspectioncell. From [91], for square inspection cells of side length x,for 0 ≤ r ≤ x

b(r, x) =4rx4

(

πx2

2− 2rx +

r2

2

)

. (6.12)

For x ≤ r ≤ D =√

2x

b(r, x)=4rx4

(

x2(

arcsin(x

r

)

− arccos(x

r

))

+ 2x√

(r2 − x2) − 12(r2 + 2x2)

)

.

(6.13)

A plot of this function is given in Figure 6.1.

Page 133: Information Geometry: Near Randomness and Near Independence

6.2 Galactic Cluster Spatial Processes 123

0.2 0.4 0.6 0.8 1 1.2 1.4

0.2

0.4

0.6

0.8

1

1.2

1.4

b(r,1)

r

Fig. 6.1. Probability density function b(r, 1) for the distance r between two pointschosen independently and at random in a unit square.

Ghosh [91] gave also the form of b for other types of cells; for arbitraryrectangular cells those expressions can be found in [58]. It is of interest tonote that for small values of r, so r D, the formulae for plane convex cellsof area A and perimeter P all reduce to

b(r,A, P ) =2πrA

− 2Pr2

A2

which would be appropriate to use when the filaments are short compared withthe dimensions of the cell. The filaments are supposed to be placed indepen-dently by a Poisson process in the plane and hence their variance contributionscan be summed in a cell to give the variance for zonal averages—that is thebetween cell variance for complete sampling schemes. So, the variance betweencells is the expectation of the covariance function, taken over all possible pairsof points in the cell, as given in (6.5). We re-write this for square cells of sidelength x as

V ar(x) = V ar(0)∫

√2x

0

a(r) b(r, x) dr (6.14)

Using this equation, in Figure 6.2 we plot V ar(x)/V ar(0) against inspectioncell size xh−1Mpc for the case of filaments with width ω = 5h−1Mpc andlength λ = 100h−1Mpc. Note that V ar(x) is expressible also as an integral ofthe (point) power spectrum over wavelength interval [x,∞) and that Landyet al. [133] detected evidence of a strong peak at 100h−1Mpc in the powerspectrum of the Las Campanas Redshift Survey, cf. also Lin et al. [136].

Page 134: Information Geometry: Near Randomness and Near Independence

124 6 Cosmological Voids and Galactic Clustering

10 20 30 40 50 60 70

0.1

0.2

0.3

0.4V ar(x)

V ar(0)

x h−1 Mpc

Fig. 6.2. Relative between cell variance for a planar Poisson process of filamentswith width ω = 5 h−1Mpc and length λ = 100 h−1Mpc for complete sampling usingsquare cells of side length x h−1Mpc from equation (6.14).

These spatial statistical models may be used in two distinct ways. If ob-servational data is available for V ar(x) for a range of x values, for exampleby digitizing catalogue data on 2-dimensional slices, then attempts may bemade to find the best fit for λ and ω. That would give statistical estimates offilament sizes on the presumption that the underlying process of filaments isPoisson. On the other hand, given observed V arobs(x) for a range of x, thevariance ratio of this to (6.14)

V R(x) =V arobs(x)V ar(x)

(6.15)

will be an increasing function of x if there is a tendency of the filaments tocluster.

According to the Las Campanas Redshift Survey, some 40 percent of galax-ies out to 500h−1Mpc are contained in filaments and the remainder in ‘sheets’,which we may interpret perhaps as rectangles and squares, respectively, bothapparently following a Poisson process. Such a composite spatial structuremay be represented easily with our model, if the individual Poisson processesare independent; then the net variance for any choice of inspection cells is theweighted sum of the variances for the individual processes. So the between cellvariance (6.14) becomes a weighted sum of integrals, using the appropriate afunctions for the constituent representative lumps of matter—perhaps squaresfor sheets and two kinds of rectangles for filaments, dense and light.

Page 135: Information Geometry: Near Randomness and Near Independence

6.3 Cosmological Voids 125

6.3 Cosmological Voids

A number of recent studies have estimated the inter-galactic void probabil-ity function and investigated its departure from various statistical models.We study a family of parametric statistical models based on gamma distri-butions, which do give realistic descriptions for other stochastic porous me-dia. Gamma distributions, §1.4.1, contain as a special case the exponentialdistributions, §1.2.2, which correspond to the ‘random’ void size probabilityarising from Poisson processes, §1.1.3. The space of parameters is a surfacewith a natural Riemannian metric structure, §2.0.5, §3.1. This surface con-tains the Poisson processes as an isometric embedding, §2.0.5, and our recenttheorem [14] cf. §5.1, shows that it contains neighbourhoods of all departuresfrom Poisson randomness. The method provides thereby a geometric settingfor quantifying such departures and on which may be formulated cosmolog-ical evolutionary dynamics for galactic clustering and for the concomitantdevelopment of the void size distribution.

Several years ago, the second author presented an information geometricapproach to modelling a space of perturbations of the Poisson random statefor galactic clustering and cosmological void statistics [62]. Here we updatesomewhat and draw attention to this approach as a possible contribution tothe interpretation of the data from the 2-degree field Galaxy Redshift Survey(2dFGRS), cf Croton et al. [49, 50]. The new 2dFGRS data offers the possi-bility of more detailed investigation of this approach than was possible whenit was originally suggested [62, 63, 64] and some parameter estimations aregiven.

The classical random model is that arising from a Poisson process (cf.§1.1.3) of mean density N galaxies per unit volume in a large box. Then, in aregion of volume V, the probability of finding exactly m galaxies is given byequation (6.1). So the probability that the given region is devoid of galaxies isP0 = e−NV . It follows that the probability density function for the continuousrandom variable V in the Poisson case is

prandom(V ) = N e−NV (6.16)

In practice of course, measurements will depend on algorithms that specifythreshold values for density ranges of galaxies in cells and the lowest range willrepresent the ‘underdense’ regions which include the voids; Benson et al. [22]discuss this.

A hierarchy of N -point correlation functions needed to represent clusteringof galaxies in a complete sense was devised by White [210] and he providedexplicit formulae, including their continuous limit. In particular, he made adetailed study of the probability that a sphere of radius R is empty and showedthat formally it is symmetrically dependent on the whole hierarchy of corre-lation functions. However, White concentrated his applications on the casewhen the underlying galaxy distribution was a Poisson process, the starting

Page 136: Information Geometry: Near Randomness and Near Independence

126 6 Cosmological Voids and Galactic Clustering

point for the present approach which is concerned with geometrizing the pa-rameter space of departures from a Poisson process. Croton et al. [50] foundthat the negative binomial model for galaxy clustering gave a very good ap-proximation to the 2dFGRS, pointing out that this model is a discrete versionof the gamma distribution, §1.4.1.

6.4 Modelling Statistics of Cosmological Void Sizes

For a general account of large-scale structures in the universe, see Fairall [82].Kauffmann and Fairall [114] developed a catalogue search algorithm for largernearly spherical regions devoid of bright galaxies and obtained a spectrum forradii of significant voids. This indicated a peak radius near 4 h−1Mpc, a longtail stretching at least to 32 h−1Mpc, and is compatible with the recent ex-trapolation models of Baccigalupi et al [18] which yield an upper bound onvoid radii of about 50 h−1Mpc. This data has of course omitted the expectedvery large numerical contribution of smaller voids. More recent work, notablyof Croton et al. [50] provide much larger samples with improved estimates ofvoid size statistics and Benson et al. [22] gave a theoretical analysis in antici-pation of the 2dFGRS survey data, including the evaluation of the void andunderdense probability functions. Hoyle and Vogeley [104] provided detailedresults for the statistics of voids larger than 10 h−1Mpc in the 2dFGRS surveydata; they concluded that such voids constitute some 40% of the universe andhave a mean radius of about 15 h−1Mpc.

The count density N(V ) of galaxies observed in zones using a range ofsampling schemes each with a fixed zone volume V results in a decreasingvariance V ar(N(V )) of count density with increasing zone size, roughly ofthe form

V ar(N(V )) ≈ V (0) e−V/Vk as V → 0 (6.17)

where Vk is some characteristic scaling parameter. This monotonic decay ofvariance with zone size is a natural consequence of the monotonic decay of thecovariance function, roughly isotropically and of the form

Cov(r) ≈ e−r/rk as r → 0 (6.18)

where rk is some characteristic scaling parameter of the order of magnitudeof the diameter of filament structures; this was discussed in [63]. Then

V ar(N(V )) ≈∫ ∞

0

Cov(r) b(r) dr (6.19)

where b(r) is the probability density of finding two points separated by dis-tance r independently and at random in a zone of volume V. The powerspectrum using, say, cubical cells of side lengths R is given by the family ofintegrals

Page 137: Information Geometry: Near Randomness and Near Independence

6.4 Modelling Statistics of Cosmological Void Sizes 127

Pow(N(R)) ≈∫ ∞

R

Cov(r) b(r) dr. (6.20)

Fairall [82] (page 124) reported a value σ2 = 0.25 for the ratio of varianceV ar(N(1)) to mean squared N

2for counts of galaxies in cubical cells of unit

side length. In other words, the coefficient of variation, §1.2, for sampling withcells of unit volume is

cv(N(1)) =

V ar(N(1))N

= 0.5 (6.21)

and this is dimensionless.We choose a family of parametric statistical models for void volumes that

includes the Poisson random model (6.16) as a special case. There are of coursemany such families, but we take one that our recent theorem [14], cf. §5.1, hasshown contains neighbourhoods of all departures from Poisson randomnessand it has been successful in modelling void size distributions in terrestrialstochastic porous media with similar departures from randomness [72, 73].Also, the complementary logarithmic version has been used in the represen-tation of clustering of galaxies [63, 64].

The family of gamma distributions has event space Ω = R+, parameters

µ, α ∈ R+ and probability density functions given by, Figure 6.3,

f(V ;µ, α) =(

α

µ

)αV α−1

Γ (α)e−V α/µ (6.22)

0.25 0.5 0.75 1 1.25 1.5 1.75 2

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

f(V; 1, α)

α = 12

α = 1

α = 5

α = 2

Void volume V

Fig. 6.3. Gamma probability density functions, f(V ; µ, α), from (6.22) representingthe inter-galactic void volumes V with unit mean µ = 1, and α = 1

2, 1, 2, 5. The

case α = 1 corresponds to the ‘random’ case from an underlying Poisson process ofgalaxies; α < 1 corresponds to clustering and α > 1 corresponds to dispersion.

Page 138: Information Geometry: Near Randomness and Near Independence

128 6 Cosmological Voids and Galactic Clustering

Then V = µ and V ar(V ) = µ2/α and we see that µ controls the mean of thedistribution while the spread and shape is controlled by 1/α, the square ofthe coefficient of variation, §1.2.

The special case α = 1 corresponds to the situation when V representsthe Poisson process in (6.16). The family of gamma distributions can modela range of statistical processes corresponding to non-independent ‘clumped’events, for α < 1, and dispersed events, for α > 1, as well as the Poisson ran-dom case α = 1 (cf. [14, 72, 73]). Thus, if we think of this range of processesas corresponding to the possible distributions of centroids of extended ob-jects such as galaxies that are initially distributed according to a Poissonprocess with α = 1, then the three possibilities are:

Chaotic—Poisson random structure: no interactions among constitu-ents, α = 1;

Clustered structure: mutually attractive type interactions, α < 1;Dispersed structure: mutually repulsive type interactions, α > 1.

For our gamma-based void model we consider the radius R of a sphericalvoid with volume V = 4

3πR3 having distribution (6.22). Then the probability

density function for R is given by

p(R;µ, α) =4πR2

Γ (α)

(

α

µ

)α (4πR3

3

)α−1

e−4πR3α

3µ (6.23)

The mean R, variance V ar(R) and coefficient of variation cv(R) of R aregiven, respectively, by

R =(

3µ4πα

) 13 Γ (α + 1

3 )Γ (α)

(6.24)

V ar(R) =(

3µ4πα

) 23 Γ (α) Γ (α + 2

3 ) − Γ (α + 13 )2

Γ (α)2(6.25)

cv(R) =

V ar(R)R

=

Γ (α) Γ(

α + 23

)

Γ(

α + 13

) − 1 (6.26)

The fact that the coefficient of variation (6.26), §1.2, depends only on αgives a rapid parameter fitting of data to the probability density function forvoid radii (6.23). Numerical fitting to (6.26) gives α; this substituted in (6.24)yields an estimate of µ to fit a given observational mean.

However, there is a complication: necessarily in order to have a physicallymeaningful definition for voids, observational measurements introduce a min-imum threshold size for voids. For example, Hoyle and Vogeley [104] usedalgorithms to obtain statistics on 2dFGRS voids with radius R > Rmin =10 h−1Mpc; for voids above this threshold they found their mean size is about


Fig. 6.4. Probability that a void will have radius R > 10 h⁻¹Mpc as a function of parameters µ, α from equation (6.27). The range α < 1 corresponds to clustering regimes. The plane at level P_{R>10} = 0.4 corresponds to the fraction 40% of the universe filled by voids, as reported by Hoyle and Vogeley [104].

This of course is not directly comparable with the above distribution for R in equation (6.23), since the latter has domain R > 0. Now, from (6.23), the probability that a void has radius R > A is, Figure 6.4,

P_A = \frac{\Gamma(\alpha, \frac{4A^3\pi\alpha}{3\mu})}{\Gamma(\alpha)} \qquad (6.27)

where

\Gamma(\alpha, A) = \int_A^{\infty} t^{\alpha-1}\, e^{-t}\, dt

is the incomplete gamma function, with Γ(α) = Γ(α, 0). Hence the mean, variance and coefficient of variation, §1.2, for the void distribution with R > A become:

\bar{R}_{>A} = \left(\frac{3\mu}{4\pi\alpha}\right)^{1/3} \frac{\Gamma(\alpha+\frac{1}{3}, \frac{4A^3\pi\alpha}{3\mu})}{\Gamma(\alpha, \frac{4A^3\pi\alpha}{3\mu})} \qquad (6.28)

\mathrm{Var}(R_{>A}) = \left(\frac{3\mu}{4\pi\alpha}\right)^{2/3} \frac{\Gamma(\alpha, \frac{4A^3\pi\alpha}{3\mu})\,\Gamma(\alpha+\frac{2}{3}, \frac{4A^3\pi\alpha}{3\mu}) - \Gamma(\alpha+\frac{1}{3}, \frac{4A^3\pi\alpha}{3\mu})^2}{\Gamma(\alpha, \frac{4A^3\pi\alpha}{3\mu})^2} \qquad (6.29)

cv(R_{>A}) = \frac{\sqrt{\mathrm{Var}(R_{>A})}}{\bar{R}_{>A}} = \sqrt{\frac{\Gamma(\alpha, \frac{4A^3\pi\alpha}{3\mu})\,\Gamma(\alpha+\frac{2}{3}, \frac{4A^3\pi\alpha}{3\mu})}{\Gamma(\alpha+\frac{1}{3}, \frac{4A^3\pi\alpha}{3\mu})^2} - 1} \qquad (6.30)

Summarizing from Hoyle and Vogeley [104]: A = 10 h⁻¹Mpc, P_A ≈ 0.4, R̄_{>A} ≈ 15 h⁻¹Mpc, Var(R_{>A}) ≈ 8.1 (h⁻¹Mpc)², so cv(R_{>A}) ≈ 0.19.
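These truncated statistics are straightforward to evaluate numerically. A sketch (our own helper names, using scipy's regularized upper incomplete gamma) computes P_A, R̄_{>A} and cv(R_{>A}) from equations (6.27)-(6.30):

    import numpy as np
    from scipy.special import gamma, gammaincc

    def Gamma_upper(a, x):
        # Unregularized upper incomplete gamma Gamma(a, x);
        # gammaincc gives the regularized ratio Gamma(a, x)/Gamma(a).
        return gammaincc(a, x) * gamma(a)

    def truncated_void_stats(mu, alpha, A):
        x0 = 4 * np.pi * A**3 * alpha / (3 * mu)   # common argument in (6.27)-(6.30)
        P_A = gammaincc(alpha, x0)                 # equation (6.27)
        s = 3 * mu / (4 * np.pi * alpha)
        m1 = s**(1/3) * Gamma_upper(alpha + 1/3, x0) / Gamma_upper(alpha, x0)
        m2 = s**(2/3) * Gamma_upper(alpha + 2/3, x0) / Gamma_upper(alpha, x0)
        cv = np.sqrt(m2 - m1**2) / m1              # equations (6.28)-(6.30)
        return P_A, m1, cv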

6.5 Coupling Galaxy Clustering and Void Sizes

Next we follow the methodology introduced in [63, 64] to provide a simple model that links the number counts in cells and the void probability function and which contains perturbations of the Poisson random case. This exploits the central role of the gamma distribution (6.22) in providing neighbourhoods of Poisson randomness [14], cf. §5.1, that contain all maximum likelihood nearby perturbations of the Poisson case, and it allows a direct use of the linked information geometries for the coupled processes of voids and galaxies.

Observationally, in a region where large voids are found we would expect a lower local density of galaxies, and vice versa. Then the two random variables, local void volume V and local density of galaxies N, are presumably inversely related. Many choices are possible; we take a simple functional form using an exponential and normalize the local density of galaxies to be bounded above by 1. Denoting the random variable representing this normalized local density by N, we put:

N(V) = e^{-V} \qquad (6.31)

This model was explored in [63, 64] and it is easy to show that the probability density function for N is given by the log-gamma distribution from equation (3.30) with ν = α/µ and τ = α:

g(N;\mu,\alpha) = \left(\frac{\alpha}{\mu}\right)^{\alpha} \frac{N^{\alpha/\mu - 1}}{\Gamma(\alpha)}\, |\log N|^{\alpha-1} \qquad (6.32)

This distribution (cf. Figure 3.3) for local galactic number density has mean N̄, variance Var(N) and coefficient of variation, §1.2, cv(N) = √Var(N)/N̄ given by

\bar{N} = \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \qquad (6.33)

\mathrm{Var}(N) = \left(\frac{\alpha}{\alpha+2\mu}\right)^{\alpha} - \left(\frac{\alpha}{\alpha+\mu}\right)^{2\alpha} \qquad (6.34)

cv(N) = \frac{\sqrt{\mathrm{Var}(N)}}{\bar{N}} = \sqrt{\left(\frac{\alpha}{\alpha+\mu}\right)^{-2\alpha} \left(\frac{\alpha}{\alpha+2\mu}\right)^{\alpha} - 1}. \qquad (6.35)


Fig. 6.5. Log-gamma probability density functions g(N; µ, α) from (6.32), representing the normalized local density of galaxies, N ∈ (0, 1], with central mean N̄ = 1/2 and α = 1/2, 1, 2, 5. The case α = 1 is the uniform distribution. The cases α < 1 correspond to clustering in the underlying spatial process of galaxies, so there are probability density peaks of high and low density; conversely, α > 1 corresponds to dispersion.

Figure 6.5 shows the distribution (6.32) for mean normalized density N̄ = 1/2 and α = 1/2, 1, 2, 5. Note that as α → 1 the distribution (6.32) tends to the uniform distribution. For α < 1 we have clustering in the underlying process, with the result that the population has high and low density peaks. Other choices of functional relationship between local void volume and local density of galaxies would lead to different distributions; for example, N(V) = e^{-V^k} for k = 2, 3, ..., would serve. However, the persisting qualitative observational feature that would discriminate among the parameters is the prominence of a central modal value, indicating a smoothed or dispersed structure, or the prominence of high and low density peaks, indicating clustering.

Using the reported value cv(N(1)) = 0.5 from Fairall [82] (page 124) for cubical volumes with side length R = 1 h⁻¹Mpc, the curve so defined in the parameter space for the log-gamma distributions (6.32) has maximum clustering for α ≈ 0.6, µ ≈ 0.72.

From the 2-degree field Galaxy Redshift Survey (2dFGRS), Croton et al. [49] in their Figure 2 reported the decay of normalised variance ξ̄₂ = cv(N(R))² with scale radius R, and the associated departure from Poisson randomness in the form

\chi = -\frac{\log_{10} P_0(R)}{\bar{N}},


Fig. 6.6. Coefficient of variation of counts in cells N for the log-gamma distribution, equation (6.32). The range α < 1 corresponds to clustering regimes. The three planes show the levels cv(N) = 1, √6, √10 as reported by Fairall [82] and Croton et al. [49, 50].

where P₀(R) is the probability of finding zero galaxies in a spherical region of radius R when the mean number is N̄. From Figure 2 in [49] we see that, for the data of the Volume Limited Catalogue with magnitude range −20 to −21 and N̄(1) = 1.46: cv(N(1))² ≈ 6 and χ ≈ 0.9 at R ≈ 1; also cv(N(7))² ≈ 1 and χ ≈ 0.4 at R ≈ 7.

Croton et al. [50] in Table 1 reported N̄ values for cubical volumes with side length R = 1 h⁻¹Mpc in the range 0.11 ≤ N̄ ≤ 11. From Figure 3 in that paper we see that, at the scale R = 1 h⁻¹Mpc, log₁₀ ξ̄₂ ≈ 1, which gives a coefficient of variation cv(N(1)) ≈ √10.

The above-mentioned observations cv(N) = 1, √6, √10 are shown as horizontal planes in Figure 6.6, a plot of the coefficient of variation, §1.2, for the number counts in cells from the log-gamma family of distributions, equation (6.32). The range α < 1 corresponds to clustering regimes.
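For a given observed cv(N), the corresponding curve in the (µ, α) parameter space of (6.32) can be traced numerically. A sketch, with bracketing values of our own choosing for illustration:

    import numpy as np
    from scipy.optimize import brentq

    def cv_N(mu, alpha):
        # Coefficient of variation of N from equations (6.33)-(6.35).
        m1 = (alpha / (alpha + mu))**alpha       # mean, equation (6.33)
        m2 = (alpha / (alpha + 2*mu))**alpha     # second moment of N
        return np.sqrt(m2 - m1**2) / m1

    def mu_for_cv(alpha, cv_obs, lo=1e-4, hi=1e3):
        # cv_N increases monotonically with mu at fixed alpha.
        return brentq(lambda m: cv_N(m, alpha) - cv_obs, lo, hi)

    # Trace the curve on which cv(N) = 1, one of the observed levels:
    curve = [(mu_for_cv(a, 1.0), a) for a in np.linspace(0.2, 2.0, 10)]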

6.6 Representation of Cosmic Evolution

Theoretical models for the evolution of galactic clustering and consequential void statistics can be represented crudely as curves on the gamma manifold of parameters with the information metric (3.19). So we have a means of interpreting the parameter changes with time through an evolving process subordinate to the log-gamma distribution (6.32). The coupling with the void probability function controlled by the gamma distribution (6.22) allows the corresponding void evolution to be represented. It is of course very unlikely that this simple model is realistic in all respects, but it may provide a convenient model with qualitatively stable properties that are realistic. Moreover, given a different family of distributions, the necessary information geometry can be computed for the representation of evolutionary processes.

The entropy for the gamma probability density function (6.22) was given in equation (1.57) and shown in Figure 1.4. At unit mean, the maximum entropy (or maximum uncertainty) occurs at α = 1, which is the Poisson random case, and then S_f(µ, 1) = 1 + log µ.

The ‘maximum likelihood’ estimates µ̂, α̂ of µ, α can be expressed in terms of the mean and mean logarithm of a set of independent observations X = X₁, X₂, ..., Xₙ. These estimates are obtained in terms of the properties of X by maximizing the ‘log-likelihood’ function

l_X(\mu,\alpha) = \log \mathrm{lik}_X(\mu,\alpha) = \log\left(\prod_{i=1}^{n} p(X_i;\mu,\alpha)\right)

with the following result

\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (6.36)

\log\hat{\alpha} - \psi(\hat{\alpha}) = \log\bar{X} - \overline{\log X} \qquad (6.37)

where \overline{\log X} = \frac{1}{n}\sum_{i=1}^{n}\log X_i and ψ(α) = Γ′(α)/Γ(α) is the digamma function, the logarithmic derivative of the gamma function.
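A minimal numerical version of this fit (our own sketch; the implicit equation (6.37) has no closed-form solution, so α̂ is found by a bracketed root search):

    import numpy as np
    from scipy.optimize import brentq
    from scipy.special import digamma

    def gamma_mle(x):
        # x: array of positive observations X_1, ..., X_n.
        mu_hat = x.mean()                            # equation (6.36)
        s = np.log(x.mean()) - np.log(x).mean()      # right side of (6.37)
        # log(a) - digamma(a) decreases monotonically to 0, so bracket widely.
        alpha_hat = brentq(lambda a: np.log(a) - digamma(a) - s, 1e-3, 1e3)
        return mu_hat, alpha_hat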

The Riemannian information metric on the 2-dimensional parameter space

G = \{(\mu, \alpha) \in \mathbb{R}^+ \times \mathbb{R}^+\}

has arc length function given by

ds^2 = \frac{\alpha}{\mu^2}\, d\mu^2 + \left(\psi'(\alpha) - \frac{1}{\alpha}\right) d\alpha^2 \quad \text{for } \mu, \alpha \in \mathbb{R}^+. \qquad (6.38)

Moreover, as we have seen above in Proposition 3.9, the manifold of log-gamma density functions has the same information metric as the gamma manifold.

The 1-dimensional subspace parametrized by α = 1 corresponds to the available Poisson processes. A path through the parameter space G of gamma models determines a curve

c : [a, b] \to G : t \mapsto (c_1(t), c_2(t)) \qquad (6.39)

with tangent vector \dot{c}(t) = (\dot{c}_1(t), \dot{c}_2(t)) and norm \|\dot{c}\| given via (6.38) by


Fig. 6.7. Geodesic sprays radiating from the points with unit mean µ = 1 and α = 0.5, 1, 2. The case α = 1 corresponds to an exponential distribution from an underlying Poisson process of galaxies; α < 1 corresponds to clustering and α increasing above 1 corresponds to dispersion and greater uniformity.


\|\dot{c}(t)\|^2 = \frac{c_2(t)}{c_1(t)^2}\, \dot{c}_1(t)^2 + \left(\psi'(c_2(t)) - \frac{1}{c_2(t)}\right) \dot{c}_2(t)^2. \qquad (6.40)

The information length of the curve is

L_c(a, b) = \int_a^b \|\dot{c}(t)\|\, dt \qquad (6.41)

and the curve corresponding to an underlying Poisson process has c(t) = (t, 1), so t = µ and α = 1 = constant, and the information length is log(b/a).

Presumably, at early times the universe was dense, so N̄ → 1 and hence µ was small, and essentially chaotic and Poisson, so α → 1; in the current epoch we observe clustering of galaxies, so α < 1. What is missing is some cosmological physics to prescribe the dynamics of progress from early times to the present. In the absence of such physical insight, there is a distinguished candidate trajectory from an information geometric viewpoint: that of a geodesic, satisfying the condition (2.7). Some examples of sprays of geodesics are drawn in Figure 6.7 from numerical solutions to the equation ∇_ċ ċ = 0.
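Such sprays can be generated by integrating the geodesic equation numerically. The sketch below is ours: the Christoffel symbols are derived from the metric (6.38), the initial direction is illustrative, and a standard ODE solver does the integration.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.special import polygamma

    def geodesic_rhs(t, y):
        # y = (mu, alpha, dmu, dalpha); polygamma(1, .) is the trigamma psi'.
        mu, alpha, dmu, dalpha = y
        g22 = polygamma(1, alpha) - 1/alpha
        ddmu = dmu**2/mu - dmu*dalpha/alpha          # from the metric (6.38)
        ddalpha = (dmu**2/mu**2
                   - (polygamma(2, alpha) + 1/alpha**2) * dalpha**2) / (2*g22)
        return [dmu, dalpha, ddmu, ddalpha]

    # One geodesic leaving (mu, alpha) = (1, 1) in an illustrative direction:
    sol = solve_ivp(geodesic_rhs, (0.0, 1.0), [1.0, 1.0, 0.5, -0.3],
                    dense_output=True)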

In Figure 6.8 we show two sets of three geodesics, passing through (µ, α) = (1, 1) and through (µ, α) = (0.5, 1) respectively; for each set we have maximally extended the geodesics in the parameter space R⁺ × R⁺.

In Figure 6.9 the horizontal geodesic through (µ, α) = (0.1, 1) begins in a Poisson random state A at high mean density, N̄ ≈ 1 from equation (6.33), and evolves with decreasing α to lower mean density B through increasingly clustered states.

Fig. 6.8. Examples of geodesics passing through the points (µ, α) = (1, 1) and (µ, α) = (0.5, 1), then maximally extended to the edge of the available space.


Fig. 6.9. A maximally extended horizontal geodesic through (µ, α) = (0.1, 1) begins at A with high mean density N̄ ≈ 1 in a Poisson state and evolves to lower mean density through increasingly clustered states until B, where α ≈ 0.6, after which N̄ increases again along the geodesic, as shown in Figure 6.10.

Fig. 6.10. Plot of the mean density N̄ along the horizontal geodesic through (µ, α) = (0.1, 1) in Figure 6.9. It begins at high mean density in a Poisson state A and evolves to lower mean density through increasingly clustered states until B, where α ≈ 0.6, after which N̄ increases again along the geodesic.


Fig. 6.11. Mean density N̄ = (α/(α+µ))^α of counts in cells, from equation (6.33), for the log-gamma distribution (6.32).

Being horizontal as it passes through (µ, α) = (0.1, 1) means that the geodesic is directed towards future Poisson random states; but we see that in fact it curves down towards clustered states as a result of the information geometric curvature.

Figure 6.10 shows an approximate numerical plot of the mean density N̄ along the horizontal geodesic through (µ, α) = (0.1, 1) in Figure 6.9, and Figure 6.11 shows a surface plot of N̄ as a function of (µ, α).


7

Amino Acid Clustering
With A.J. Doig

In molecular biology a fundamental problem is that of relating functional effects to structural features of the arrangement of amino acids in protein chains. Clearly, there are some features that have localized deterministic origin from the geometrical organization of the helices; other features seem to be of a more stochastic character, with a degree of stability persisting over long sequences that approximates to stationarity. These latter features were the subject of our recent study [34], which we outline in this chapter. We make use of gamma distributions to model the spacings between occurrences of each amino acid; this is an approximation because the molecular process is of course discrete. However, the long protein chains and the large amount of data lead us to believe that the approximation is justified, particularly in light of the clear qualitative features of our results.

7.1 Spacings of Amino Acids

We analysed for each of the 20 amino acids X the statistics of spacings between consecutive occurrences of X within the Saccharomyces cerevisiae genome, which has been well characterised elsewhere [95]. These occurrences of amino acids may exhibit near Poisson random, clustered or smoothed out behaviour, like 1-dimensional spatial statistical processes along the protein chain. If amino acids are distributed independently and with uniform probability within a sequence then they follow a Poisson process, and a histogram of the number of observations of each gap size would asymptotically follow a negative exponential distribution. The question that arises then is how 20 different approximately Poisson processes constrained in finite intervals can be arranged along a protein. We used differential geometric methods to quantify information on sequencing structures of amino acids and groups of amino acids, via the sequences of intervals between their occurrences. The differential geometry arises from the information-theoretic distance function on the 2-dimensional space of processes subordinate to gamma distributions, which include the Poisson random process as a special case.


Table 7.1. Experimental data from over 3 million amino acid occurrences in 6294 protein chains of the Saccharomyces cerevisiae genome, with sequence lengths up to n = 4092. Also shown are relative abundance, mean spacing, variance, and maximum likelihood gamma parameter for each amino acid, fitting the interval distribution data to equation (7.3). The grand mean relative abundance was p̄ ≈ 0.05 and the grand mean interval separation was µ̄ ≈ 18.

Amino Acid i        Occurrences   Abundance p_i   Mean separation µ_i   Variance σ²_i   α_i
A Alanine              163376        0.055              17                  374         0.81
C Cysteine              38955        0.013              55                 5103         0.59
D Aspartate            172519        0.058              16                  346         0.77
E Glutamate            192841        0.065              15                  292         0.73
F Phenylalanine        133737        0.045              21                  554         0.78
G Glycine              147416        0.049              19                  487         0.74
H Histidine             64993        0.022              39                 1948         0.79
I Isoleucine           195690        0.066              15                  222         0.95
K Lysine               217315        0.073              14                  240         0.74
L Leucine              284652        0.095              10                  122         0.85
M Methionine            62144        0.021              46                 2461         0.85
N Asparagine           182314        0.061              16                  277         0.87
P Proline              130844        0.044              21                  587         0.77
Q Glutamine            116976        0.039              24                  691         0.81
R Arginine             132789        0.045              21                  565         0.78
S Serine               269987        0.091              11                  136         0.85
T Threonine            176558        0.059              16                  307         0.85
V Valine               166092        0.056              17                  315         0.93
W Tryptophan            31058        0.010              62                 6594         0.58
Y Tyrosine             100748        0.034              27                  897         0.79


Table 7.1 summarizes some 3 million experimentally observed occurrences of the 20 different amino acids within the Saccharomyces cerevisiae genome, from the analysis of 6294 protein chains with sequence lengths up to n = 4092. Listed also for each amino acid are the relative abundances p_i and mean separations µ_i; the grand mean relative abundance was p̄ ≈ 0.05 and the grand mean interval separation was µ̄ ≈ 18. We found that maximum likelihood estimates of parametric statistics showed that all 20 amino acids tend to cluster, some substantially. In other words, the frequencies of short gap lengths tend to be higher and the variance of the gap lengths is greater than expected by chance. Our information geometric approach allowed quantitative comparisons to be made. The results contribute to the characterisation of whole amino acid sequences by extracting and quantifying stable statistical features; further information may be found in [34] and references therein.


7.2 Poisson Spaced Sequences

First we consider a simple reference model by computing the statistics of the separation between consecutive occurrences of each amino acid X along a protein chain, for a Poisson random disposition of the amino acids.

For example, in the sequence fragment AKLMAATWPFDA, for amino acid A (denoting Ala or Alanine) there are gaps of 4, 1 and 6, since the successive Ala residues are spaced i,i+4, i,i+1 and i,i+6, respectively. In the random case, of haphazard allocation of events along a line, the result is an exponential distribution of inter-event gaps when the line is infinite. For finite length processes it is more involved, and we analyse this first in order to provide our reference structure.

Consider a protein chain simply as a sequence of amino acids among which we distinguish one, represented by the letter X, while all others are represented by ?. The relative abundance of X is given by the probability p that an arbitrarily chosen location has an occurrence of X. Then (1 − p) is the probability that the location contains a different amino acid from X. All locations are occupied by some amino acid. If the X locations are chosen with uniform probability subject to the constraint that the net density of X in the chain is p, then either X happens or it does not; so we have a binomial process.

Then in a sequence of n amino acids, the mean or expected number of occurrences of X is np and its variance is np(1 − p), but the distribution of lengths of spaces between consecutive occurrences of X is less clear. The distribution of such lengths r, measured in units of one location length, also is controlled by the underlying binomial distribution.

We seek the probability of finding in a sequence of n amino acids a subsequence of the form

\cdots\, ?\, X\, \overbrace{?\, \cdots\, ?}^{r}\, X\, ?\, \cdots

where the overbrace encompasses precisely r amino acids that are not X, within the whole sequence of n amino acids. That is, in a sequence of n locations filled by amino acids we seek the probability of finding a subsequence containing two X's separated by exactly r non-X ?'s: the occurrence of an inter-X space of length r.

The probability distribution function P(r, p, n) for inter-X space length r reduces to the first expression below (7.1), which is a geometric distribution, and simplifies to (7.2):

P(r, p, n) = \frac{p^2 (1-p)^r (n-r-2)}{\sum_{r=0}^{n-2} p^2 (1-p)^r (n-r-2)} \qquad (7.1)

= \frac{(1-p)^{1+r}\, p^2\, (n-r-2)}{-1 + (1-p)^n + p\,(n + p - np)}, \qquad (7.2)

for r = 0, 1, ..., (n − 2).
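A direct numerical check of (7.2) takes a few lines; this is a sketch with our own names, and the computed mean agrees with the value quoted in Figure 7.1:

    import numpy as np

    def P_interval(r, p, n):
        # Probability of an inter-X gap of length r in a sequence of n sites,
        # equation (7.2).
        return ((1 - p)**(1 + r) * p**2 * (n - r - 2)
                / (-1 + (1 - p)**n + p * (n + p - n * p)))

    r = np.arange(0, 999)                  # r = 0, ..., n-2 for n = 1000
    probs = P_interval(r, 0.05, 1000)
    mean_r = np.sum(r * probs)             # about 18.6, as in Figure 7.1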


Fig. 7.1. Sample probability distributions P(r, p, n) from (7.2) of interval length r between occurrences of amino acid X, shown for 0 ≤ r ≤ 100 in Poisson random sequences of length n = 1000: P(r, 0.1, 1000) with r̄ = 8.9, σ²_r = 70.3; P(r, 0.05, 1000) with r̄ = 18.6, σ²_r = 377.0; P(r, 0.01, 1000) with r̄ = 88.0, σ²_r = 478.9.

Three sample distributions are shown in Figure 7.1, for a sequence of n = 1000 amino acids in which X has mean abundance probability values p = 0.01, 0.05, 0.10. The mean r̄ and standard deviation σ_r of the distribution (7.2) are known analytically for r = 0, 1, ..., (n − 2) and their expressions may be found in [34]. As n → ∞, r̄ → 1/p − 1 and α = r̄²/σ_r² → (1 − p).

The main variables of interest are the number n of amino acids in the sequence and the relative abundance probability p of occurrence of X, for each amino acid X. Their effects on the statistics of the distribution of intervals between consecutive occurrences of X are illustrated in Figure 7.2 and Figure 7.3, respectively. As might be expected for a Poisson random process, the standard deviation is approximately equal to the mean. For we know that the Poisson distribution is a good approximation to the binomial distribution when n is large and p is small, and in the case of a Poisson process along a line the distribution of interval lengths is exponential, with standard deviation equal to the mean.

7.3 Non-Poisson Sequences as Gamma Processes

Here we model the spacings between occurrences of amino acids (viewed as a renewal process of occurrences [150]) by supposing that the spacing distribution is of gamma type; this model includes the Poisson case and perturbations of that case.


Fig. 7.2. Finite Poisson random sequences. Effect of sequence length 10 ≤ n ≤ 4000 in steps of 10 on standard deviation σ_r versus mean r̄ for inter-X interval distributions (7.2) with abundance probabilities p = 0.01, 0.05 and 0.1, corresponding to the cases in Figure 7.1. The standard deviation is roughly equal to the mean; mean and standard deviation increase monotonically with increasing n.

The family of gamma distributions, §1.4.1, with event space R⁺ and parameters µ, α ∈ R⁺ has probability density functions given by

f(t; \mu, \alpha) = \left(\frac{\alpha}{\mu}\right)^{\alpha} \frac{t^{\alpha-1}}{\Gamma(\alpha)}\, e^{-t\alpha/\mu}. \qquad (7.3)

Then µ is the mean and Var(t) = µ²/α is the variance, so the coefficient of variation, §1.2, √Var(t)/µ = 1/√α is independent of the mean. As mentioned before, this latter property characterizes gamma distributions, as shown recently by Hwang and Hu [106] (cf. their concluding remark): for independent positive random variables x₁, x₂, ..., xₙ with a common continuous probability density function h, independence of the sample mean x̄ and sample coefficient of variation cv = S/x̄ is equivalent to h being a gamma distribution.

The special case α = 1 corresponds to the situation of the random or Poisson process with mean inter-event interval µ. In fact, for integer α = 1, 2, ..., equation (7.3) models a process that is Poisson but with (α − 1) intermediate events removed to leave only every αth.


Fig. 7.3. Finite Poisson random sequences. Effect of relative abundance probability 0.01 ≤ p ≤ 0.1 in steps of 0.01 on standard deviation σ_r versus mean r̄ for inter-X intervals in Poisson random sequences of length n = 100 (lower points) and length n = 1000 (upper points). The standard deviation is roughly equal to the mean; both decrease monotonically with increasing p.

Gamma distributions can model a range of statistical processes corresponding to non-independent clustered events, for α < 1, and dispersed or smoothed events, for α > 1, as well as the Poisson random case α = 1. Figure 7.4 shows sample gamma distributions, all of unit mean, representing clustering, Poisson and dispersed spacing distributions, respectively, with α = 0.4, 0.6, 0.8, 1, 1.2, 2.

From §3.4, the Riemannian information metric on the parameter space G = {(µ, α) ∈ R⁺ × R⁺} for the gamma distributions (7.3) is given by the arc length function from equation (3.19)

ds^2_G = \frac{\alpha}{\mu^2}\, d\mu^2 + \left(\psi'(\alpha) - \frac{1}{\alpha}\right) d\alpha^2 \quad \text{for } \mu, \alpha \in \mathbb{R}^+, \qquad (7.4)

where ψ(α) = Γ′(α)/Γ(α) is the logarithmic derivative of the gamma function. The 1-dimensional subspace parametrized by α = 1 corresponds to all possible ‘random’ (Poisson) processes, or equivalently, exponential distributions.


Fig. 7.4. Gamma probability density functions f(t; µ, α), (7.3), for inter-event intervals t with unit mean µ = 1 and, descending from the top left, α = 0.4, 0.6, 0.8, 1, 1.2, 2. The case α = 1 corresponds to randomness via an exponential distribution from an underlying Poisson process; α ≠ 1 represents some non-Poisson process, clustering or dispersion.

7.3.1 Local Geodesic Distance Approximations

A path through the parameter space G of gamma models determines a curve, parametrized by t in some interval a ≤ t ≤ b, given by

c : [a, b] \to G : t \mapsto (c_1(t), c_2(t)) \qquad (7.5)

and its tangent vector \dot{c}(t) = (\dot{c}_1(t), \dot{c}_2(t)) has norm \|\dot{c}\| given via (7.4) by

\|\dot{c}(t)\|^2 = \frac{c_2(t)}{c_1(t)^2}\, \dot{c}_1(t)^2 + \left(\psi'(c_2(t)) - \frac{1}{c_2(t)}\right) \dot{c}_2(t)^2 \qquad (7.6)

and the information length of the curve is

L_c(a, b) = \int_a^b \|\dot{c}(t)\|\, dt \quad \text{for } a \le b. \qquad (7.7)

For example, the curve c(t) = (t, 1), which passes through processes with t = µ and α = 1 = constant, has information length log(b/a). Locally, minimal paths in G are given by the geodesics [70] from equation (2.7) using the Levi-Civita connection ∇ (2.5) induced by the Riemannian metric (7.4).

In a neighbourhood of a given point we can obtain a locally bilinear approximation to distances in the space of gamma models. From (7.4), for small variations ∆µ, ∆α near (µ₀, α₀) ∈ G, the distance is approximated by


\Delta s_G \approx \sqrt{\frac{\alpha_0}{\mu_0^2}\, \Delta\mu^2 + \left(\psi'(\alpha_0) - \frac{1}{\alpha_0}\right) \Delta\alpha^2}\,. \qquad (7.8)

As α₀ increases from 1, the factor (ψ′(α₀) − 1/α₀) decreases monotonically from π²/6 − 1. So, in the information metric, the difference ∆µ has increasing prominence over ∆α as the standard deviation reduces with increasing α₀, corresponding to increased spatial smoothing of occurrences.

In particular, near the exponential distribution, where (µ₀, α₀) = (1, 1), (7.8) is approximated by

\Delta s_G \approx \sqrt{\Delta\mu^2 + \left(\frac{\pi^2}{6} - 1\right) \Delta\alpha^2}\,. \qquad (7.9)

For a practical implementation we need to obtain rapid estimates of distances in larger regions than can be represented by quadratics in incremental coordinates. This can be achieved using the result of Dodson and Matsuzoe [68] that established geodesic foliations for the gamma manifold. Now, a geodesic curve is locally minimal, and so a network of two non-parallel sets of geodesics provides a mesh of upper bounds on distances by using the triangle inequality about any point. Distance bounds using such a geodesic mesh are shown in Figure 7.8, using the geodesic curves µ = α and α = constant, which foliate G [68].

Explicitly, the arc length along the geodesic curves µ = α from (µ₀, α₀) to (α, α) is

\left|\frac{d^2\log\Gamma}{d\alpha^2}(\alpha) - \frac{d^2\log\Gamma}{d\alpha^2}(\alpha_0)\right|

and the distance along curves of constant α = α₀ from (µ₀, α₀) to (µ, α₀) is

\sqrt{\alpha_0}\,\left|\log\frac{\mu_0}{\mu}\right|.

In Figure 7.8 we use the base point (µ₀, α₀) = (18, 1) ∈ G and combine the above two arc lengths of the geodesics to obtain an upper bound on distances from (µ₀, α₀) as

\mathrm{Distance}[(\mu_0, \alpha_0), (\mu, \alpha)] \le \left|\frac{d^2\log\Gamma}{d\alpha^2}(\alpha) - \frac{d^2\log\Gamma}{d\alpha^2}(\alpha_0)\right| + \sqrt{\alpha_0}\,\left|\log\frac{\mu_0}{\mu}\right|. \qquad (7.10)
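The bound (7.10) is cheap to evaluate. A sketch (our own function names; the trigamma function supplies d²logΓ/dα²):

    import numpy as np
    from scipy.special import polygamma

    def mesh_distance(mu0, alpha0, mu, alpha):
        # polygamma(1, a) is the trigamma function d^2 log Gamma / da^2.
        along_mu_eq_alpha = abs(polygamma(1, alpha) - polygamma(1, alpha0))
        along_alpha_const = np.sqrt(alpha0) * abs(np.log(mu0 / mu))
        return along_mu_eq_alpha + along_alpha_const   # upper bound (7.10)

    # Bound on the distance of Alanine (Table 7.1) from the Poisson point:
    d_ala = mesh_distance(18, 1, 17, 0.81)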

The gamma distribution fitted the experimental data quite well, and Figure 7.7 shows the histogram for the first 30 data points for all 20 amino acids, and their residuals. Figure 7.5 shows that the expected values for the gamma parameter α would exceed 0.97 in Poisson random sequences, whereas all 20 amino acids had maximum likelihood fits of the gamma parameter in the range 0.59 ≤ α ≤ 0.95.


Fig. 7.5. Finite Poisson random sequences. Effect of sequence length n on gamma parameter α = (r̄/σ_r)² for relative abundances p = 0.01, 0.05, 0.1. The 20 maximum likelihood fits of the gamma parameter had 0.59 ≤ α ≤ 0.95.

Fig. 7.6. Finite Poisson random sequences. Effect of sequence length 10 ≤ n ≤ 4000 in steps of 10 on gamma parameter α from (7.3) versus mean r̄ for inter-X interval distributions (7.2). The mean probabilities for the occurrences of X are p = 0.1 (left), p = 0.05 (centre) and p = 0.01 (right), corresponding to the cases in Figures 7.1 and 7.2.


Fig. 7.7. Histograms of the first 30 data points for all 20 amino acids, and residuals compared to maximum likelihood gamma distribution fitting.

7.4 Results

Table 7.1 gives the values of the total number, mean, variance and α for each amino acid. There is a large variation in mean gap size µ, ranging from 10.75 (Ser) to 61.82 (Trp), with an overall average of 25. This is attributed largely to amino acid frequency; the gap to the next amino acid will tend to be smaller if the amino acid is more abundant. Similarly, rare amino acids, such as Cys, His, Met and Trp, will be more widely spaced. There is therefore a negative correlation between mean gap size and amino acid frequency. Clustering is revealed by the gamma distribution analysis. In all cases we see that α < 1; hence every amino acid tends to cluster with itself. There is some variation, with Cys and Trp most clustered and Ile and Val very close to a Poisson random process in their spatial statistics.

Figure 7.8 shows a plot of all 20 amino acids as points on a surface over the space of (µ, α) values; the height of the surface represents the information-theoretic distance using (7.10) from the point marked ⊗, which is the case of Poisson randomly distributed amino acids with mean µ = 18 and α = 1.


Fig. 7.8. Distances in the space of gamma models, using a geodesic mesh. The surface height represents upper bounds on distances from the grand mean point (µ, α) = (18, 1), the Poisson random case with mean µ = 18, marked with ⊗. Depicted also are the 20 data points for the amino acid sequences from Table 7.1. All amino acids show clustering to differing degrees by lying to the left of the Poisson random line α = 1, some substantially so.


If an exponential distribution gave the maximum likelihood fit then it would yield α ≈ 1. This is arguably within experimental tolerance for I, N and V, but unlikely in the other cases, which have maximum likelihood gamma parameter α ≤ 0.85. However, we find no case of α > 0.95, and the analytic results for the case of finite Poisson random sequences did not yield α < 0.97 in the regime of interest.


Thus we conclude that our methods reveal an important qualitative property: universal self-clustering for these amino acids, stable over long sequences. Moreover, the information-theoretic geometry allows us to provide quantitative measurements of departures from Poisson randomness, as illustrated graphically in Figure 7.8; such depictions of the space of gap distributions could prove useful in the representation of trajectories for evolutionary or other structurally modifying processes.

For comparison with Figure 7.8, Figure 7.9 shows 20 data points from simulations of Poisson random amino acid sequences of length n = 10000 for an amino acid with abundance probability p = 0.05, using the Mathematica [215] pseudorandom number generator.

Fig. 7.9. Distances in the space of gamma models, using a geodesic mesh. The surface height represents upper bounds on distances from the nominal target point (µ, α) = (20, 1), for 20 data points from simulations of Poisson random amino acid sequences of length n = 10,000 for an amino acid with abundance probability p = 0.05. Whereas the observations of real sequences, Figure 7.8, showed variation 0.59 ≤ α ≤ 0.95, for these Poisson random simulations we find 0.95 ≤ α ≤ 1.23.


Whereas the observations of real sequences in Figure 7.8 showed variation 0.59 ≤ α ≤ 0.95, we find for these Poisson random simulations that 0.95 ≤ α ≤ 1.23.
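Each simulated point can be produced as follows: a sketch using numpy's generator rather than Mathematica's, with the moment estimate α = (r̄/σ_r)² that labels Figure 7.5.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 10_000, 0.05
    sites = rng.random(n) < p                   # Poisson random placement of X
    gaps = np.diff(np.flatnonzero(sites))       # inter-X intervals
    alpha_hat = gaps.mean()**2 / gaps.var()     # alpha = (rbar/sigma_r)^2
    mu_hat = gaps.mean()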

7.5 Why Would Amino Acids Cluster?

Clustering could arise from secondary structure preferences. α-Helices are typically 4-15 and β-strands 2-8 amino acids in length [163]. For any secondary structural element to form, most amino acids within its sequence must have a high propensity for that structure. Identical amino acids will therefore cluster over these length ranges, as this will favour a sequence with a high preference for forming one particular secondary structure. For example, Ala has a high preference for the α-helix; hence evolution will select sequences where Alanines are clustered in order to favour α-helix formation. If amino acids were Poisson distributed, the probability that a stretch of amino acids would contain a high preference for a secondary structural element would be decreased. A second possibility is that amino acids of similar hydrophobicity cluster in order to produce a hydrophobic membrane-spanning sequence or water-exposed polar loop.

Some deterministic effects arise from the preferred spatial configurations, as visible in Figure 7.7. Gap sizes of 1 or 2 are disfavoured, 1 strongly so. The only exceptions to this are Gln and Ser, which strongly favour short gaps of 1, 2 or 3. Poly(Gln) sequences give a high frequency of gaps of 1 and are a well known feature of a number of proteins, implicated in several diseases, including Huntington's disease [115]. Gaps of 3-12 are generally favoured, perhaps because this is the usual length of secondary structure. There are also local preferences for gaps of 4 and 7 that can be attributed to α-helices. Side chains spaced i,i+4 and i,i+7 are on the same side of an α-helix so can bond to one another. Sequences are favoured that have identical side chains close in space in the α-helix. In particular, a favoured gap of 7 for Leu can be attributed to coiled-coils, which are characterised by pairs of α-helices held together by hydrophobic faces with Leu spaced i,i+7 [132], [137], [160].

Clearly, the maximum likelihood gamma distributions fit only statistical features, and in that respect view the data as exhibiting transient behaviour at small gap sizes (we recall from Table 7.1 that the overall mean interval is about 18); other methods are available for interpretation of deterministic features. We concentrate here on the whole sequences by extracting and quantifying stable statistical features, and we find that all 20 amino acids tend to self-cluster in protein sequences.


8

Cryptographic Attacks and Signal Clustering

Typical public-key encryption methods involve variations on the RSA procedure devised by Rivest, Shamir and Adleman [174]. This employs modular arithmetic with a very large modulus in the following manner. We compute

R \equiv y^e \pmod{m} \quad \text{or} \quad R \equiv y^d \pmod{m} \qquad (8.1)

depending respectively on whether we are encoding or decoding a message y. The (very large) modulus m and the encryption key e are made public; the decryption key d is kept private. The modulus m is chosen to be the product of two large prime numbers p, q which are also kept secret, and we choose d, e such that

ed \equiv 1 \pmod{(p-1)(q-1)}. \qquad (8.2)

8.1 Cryptographic Attacks

It is evident that both encoding and decoding will involve repeated exponentiation procedures. Then some knowledge of the design of an implementation, and information on the timing or power consumption during the various stages, could yield clues to the decryption key d. Canvel and Dodson [38, 37] have shown how timing analyses of the modular exponentiation algorithm quickly reveal the private key, regardless of its length. In principle, an incorporation of obscuring procedures could mask the timing information, but that may not be straightforward for some devices. Nevertheless, it is important to be able to assess departures from Poisson randomness of underlying or overlying procedures that are inherent in devices used for encryption or decryption, and here we outline some information geometric methods to add to the standard tests [179].
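To see where such timing signals can come from, here is a minimal sketch of the square-and-multiply exponentiation commonly used to implement (8.1); the extra multiply performed only when a key bit is set is data-dependent work of the kind that timing and power analyses exploit. (The code is ours, for illustration only, not the implementation analysed in [38, 37].)

    def mod_exp(y: int, d: int, m: int) -> int:
        # Right-to-left square-and-multiply for y^d mod m.
        result, base = 1, y % m
        while d:
            if d & 1:                         # key-bit-dependent branch:
                result = (result * base) % m  # extra, measurable work
            base = (base * base) % m
            d >>= 1
        return result

    assert mod_exp(7, 13, 61) == pow(7, 13, 61)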

In a review, Kocher et al. [119] showed the effectiveness of Differential Power Analysis (DPA) in breaking encryption procedures using correlations between power consumption and data bit values during processing, claiming


that most smartcards reveal their keys using fewer than 15 power traces. Power consumption information can be extracted from even noisy recordings using inductive probes external to the device.

Chari et al. [41] provided a probabilistic encoding (secret sharing) scheme for effectively secure computation. They obtained lower bounds on the number of power traces needed to distinguish distributions statistically, under certain assumptions about Gaussian noise functions. DPA attacks depend on the assumption that power consumption in a given clock cycle will have a distribution depending on the initial state; the attacker needs to distinguish between different ‘nearby’ distributions in the presence of noise. Zero-Knowledge proofs allow verification of secret-based actions without revealing the secrets. Goldreich et al. [94] discussed the class of promise problems in which interaction may give additional information in the context of Statistical Zero-Knowledge (SZK). They invoked two types of difference between distributions: the ‘statistical difference’ and the ‘entropy difference’ of two random variables. In this context, typically, one of the distributions is the uniform distribution.

Thus, in the contexts of DPA and SZK tests, it is necessary to compare two nearby distributions on bounded domains. This involves discrimination between noisy samples drawn from pairs of closely similar distributions. In some cases the distributions resemble truncated Gaussians; sometimes one distribution is uniform. Dodson and Thompson [77] have shown that information geometry can help in evaluating devices by providing a metric on a suitable space of distributions.

8.2 Information Geometry of the Log-gamma Manifold

The log-gamma family of probability density functions, §3.6, provides a 2-dimensional metric space of distributions with compact support on [0, 1], ranging from the uniform distribution to symmetric unimodal distributions of arbitrarily small variance, as may be seen in Figure 3.3 and Figure 3.4.

Information geometry provided the metric for a discrimination procedure reported by Dodson and Thompson [77], exploiting the geometry of the manifold of log-gamma distributions, which we have seen above has these useful properties:
• it contains the uniform distribution
• it contains approximations to truncated Gaussian distributions
• as a Riemannian 2-manifold it is an isometric isomorph of the manifold of gamma distributions.

The log-gamma probability density functions discussed in §3.6 for random variable N ∈ (0, 1] were given in equation (3.38), Figure 8.1,

g(N; \gamma, \tau) = \frac{1}{\Gamma(\tau)} \left(\frac{\tau}{\gamma}\right)^{\tau} N^{\tau/\gamma - 1} \left(\log\frac{1}{N}\right)^{\tau-1} \quad \text{for } \gamma > 0 \text{ and } \tau > 0. \qquad (8.3)

These coordinates (γ, τ) are actually orthogonal for the Fisher information metric on the parameter space L = {(γ, τ) ∈ (0, ∞) × (0, ∞)}. Its arc length


Fig. 8.1. Mean value N̄ = (τ/(τ+γ))^τ as a surface, with a horizontal section at the central value N̄ = 1/2, which intersects the N̄ surface in the curve γ = τ(2^{1/τ} − 1).

function is given from equation (3.39) by

ds^2 = \sum_{ij} g_{ij}\, dx^i dx^j = \frac{\tau}{\gamma^2}\, d\gamma^2 + \left(\frac{d^2}{d\tau^2}\log\Gamma(\tau) - \frac{1}{\tau}\right) d\tau^2. \qquad (8.4)

In fact, (8.3) arises from the gamma family, §1.4.1,

f(x; \gamma, \tau) = \frac{x^{\tau-1}}{\Gamma(\tau)} \left(\frac{\tau}{\gamma}\right)^{\tau} e^{-x\tau/\gamma} \qquad (8.5)

for the non-negative random variable x = log(1/N) with mean x̄ = γ. It is known that the gamma family (8.5) has also the information metric (8.4), so the identity map on the space of coordinates (γ, τ) is not only a diffeomorphism but also an isometry of Riemannian manifolds.

8.3 Distinguishing Nearby Unimodal Distributions

Log-gamma examples of unimodal distributions resembling truncated Gaussians are shown on the right of Figure 8.3. Such kinds of distributions can arise in practical situations for bounded random variables. A measure of


information distance between nearby distributions is obtained from (8.4) for small variations ∆γ, ∆τ near (γ₀, τ₀) ∈ L; it is approximated by

\Delta s_L \approx \sqrt{\frac{\tau_0}{\gamma_0^2}\, \Delta\gamma^2 + \left(\frac{d^2}{d\tau^2}\log\Gamma\Big|_{\tau_0} - \frac{1}{\tau_0}\right) \Delta\tau^2}\,. \qquad (8.6)

Note that, as τ₀ increases from 1, the factor in brackets in the second part of the sum under the square root decreases monotonically from π²/6 − 1. So, in the information metric, the difference ∆γ has increasing prominence over ∆τ as the standard deviation (cf. Figure 8.2) reduces with increasing τ₀, as we see in the table below.

τ₀      d²/dτ² log Γ|_{τ₀} − 1/τ₀     cv_N(τ₀)†
1       0.6449340                     0.577350
2       0.1449340                     0.443258
3       0.0616007                     0.373322
4       0.0338230                     0.328638
5       0.0213230                     0.296931
6       0.0146563                     0.272930
7       0.0106880                     0.253946
8       0.0081370                     0.238442
9       0.0064009                     0.225472
10      0.0051663                     0.214411
† At N̄ = 1/2

Fig. 8.2. Coefficient of variation cv_N = σ_N/N̄ for the log-gamma distribution, as a smooth surface with a hatched surface at the central mean case N̄ = 1/2.


Fig. 8.3. Examples from the log-gamma family of probability densities with central mean N̄ = 1/2. Left: τ = 1, 1.2, 1.4, 1.6, 1.8. Right: τ = 4, 6, 8, 10.

For example, some data on power measurements from a smartcard leaking information during processing of a ‘0’ and a ‘1’, at a specific point in process time, yielded two data sets C, D. These had maximum likelihood parameters (γ_C = 0.7246, τ_C = 1.816) and (γ_D = 0.3881, τ_D = 1.757). We see that here the dominant parameter in the information metric is γ. In terms of the underlying gamma distribution, from which the log-gamma is obtained, γ is the mean.
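Evaluating (8.6) at these maximum likelihood fits makes the point concrete; a sketch, taking C as the base point:

    import numpy as np
    from scipy.special import polygamma

    def delta_s(gamma0, tau0, gamma1, tau1):
        # Approximate information distance (8.6) near (gamma0, tau0).
        dg, dt = gamma1 - gamma0, tau1 - tau0
        return np.sqrt(tau0 / gamma0**2 * dg**2
                       + (polygamma(1, tau0) - 1/tau0) * dt**2)

    d_CD = delta_s(0.7246, 1.816, 0.3881, 1.757)   # the gamma term dominates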

8.4 Difference From a Uniform Distribution

The situation near to the uniform distribution τ = 1 is shown on the left in Figure 8.3. In this case we have (γ₀, τ₀) = (1, 1) and, for nearby distributions, (8.6) is approximated by

\Delta s_L \approx \sqrt{\Delta\gamma^2 + \left(\frac{\pi^2}{6} - 1\right) \Delta\tau^2}\,. \qquad (8.7)

We see from (8.7) that, in the information metric, ∆τ is given about 80% of the weight of ∆γ near the uniform distribution.

The information-theoretic metric and these approximations may be an improvement on the areal-difference comparator used in some recent SZK studies [57, 94], and an alternative in testing the security of devices like smartcards.

8.5 Gamma Distribution Neighbourhoods of Randomness

In a variety of contexts in cryptology, for encoding, decoding or for obscuring procedures, sequences of pseudorandom numbers are generated. Tests for randomness of such sequences have been studied extensively, and the NIST Suite of tests [179] for cryptological purposes is widely employed.


Fig. 8.4. Maximum likelihood gamma parameter τ fitted to separation statistics for simulations of Poisson random sequences of length 100000 for an element with expected parameters (γ, τ) = (511, 1). These simulations used the pseudorandom number generator in Mathematica [215].

Information theoretic methods also are used; see for example Grzegorzewski and Wieczorkowski [101], also Ryabko and Monarev [180], and references therein for recent work. Here we show how pseudorandom sequences may be tested using information geometry, by using distances in the gamma manifold to compare maximum likelihood parameters for separation statistics of sequence elements.

Mathematica [215] simulations were made of Poisson random sequences with length n = 100000, and spacing statistics were computed for an element with abundance probability p = 0.00195 in the sequence. Figure 8.4 shows maximum likelihood gamma parameter τ data points from such simulations. In the data from 500 simulations the ranges of maximum likelihood gamma distribution parameters were 419 ≤ γ ≤ 643 and 0.62 ≤ τ ≤ 1.56.

The surface height in Figure 8.5 represents upper bounds on information geometric distances from (γ, τ) = (511, 1) in the gamma manifold. This employs the geodesic mesh function we developed in the previous chapter, (7.10):

\mathrm{Distance}[(511, 1), (\gamma, \tau)] \le \left|\frac{d^2\log\Gamma}{d\tau^2}(\tau) - \frac{d^2\log\Gamma}{d\tau^2}(1)\right| + \left|\log\frac{511}{\gamma}\right|. \qquad (8.8)

Also shown in Figure 8.5 are data points from the Mathematica simulations of Poisson random sequences of length 100000 for an element with expected separation γ = 511.


Fig. 8.5. Distances in the space of gamma models, using a geodesic mesh. The surface height represents upper bounds on distances from (γ, τ) = (511, 1) from equation (8.8). Also shown are data points from simulations of Poisson random sequences of length 100000 for an element with expected separation γ = 511. In the limit as the sequence length tends to infinity and the element abundance tends to zero, we expect the gamma parameter τ to tend to 1.

In the limit, as the sequence length tends to infinity and the abundance of the element tends to zero, we expect the gamma parameter τ to tend to 1. However, finite sequences must be used in real applications, and then the provision of a metric structure allows us, for example, to compare real sequence generating procedures against an ideal Poisson random model.
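In outline, such a comparison reduces to a few lines. A sketch with our own names, scoring the separations of a rare element against the Poisson reference point by the mesh bound (8.8), with τ estimated by the moment ratio used earlier:

    import numpy as np
    from scipy.special import polygamma

    def randomness_score(separations, gamma_ref=511.0):
        g = separations.mean()                      # fitted mean separation
        tau = g**2 / separations.var()              # moment estimate of tau
        return (abs(polygamma(1, tau) - polygamma(1, 1.0))
                + abs(np.log(gamma_ref / g)))       # bound from equation (8.8)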


9

Stochastic Fibre Networks
With W.W. Sampson

There is considerable interest in the materials science community in the structure of stochastic fibrous materials and the influence of structure on their mechanical, optical and transport properties. We have common experience of such materials in the form of paper, filters, insulating layers and supporting matrices for composites. The reference model for such stochastic fibre networks is the 2-dimensional array of line segments with centres following a Poisson process in the plane and axis orientations following a uniform process; that structure is commonly called a random fibre network, and we study this before considering departures from it.

9.1 Random Fibre Networks

Micrographs of four stochastic fibrous materials are shown in Figure 9.1. The carbon fibre network on the top left of Figure 9.1 is used in fuel cell applications and provides the backbone of the electromagnetic shielding used in stealth aerospace technologies; the glass fibre network on the top right is of the type used in laboratory and industrial filtration applications; the network on the bottom right is a sample of paper formed from softwood fibres; on the left of the bottom row is an electrospun nylon nanofibrous network. Examples of the latter type are the focus of worldwide research activity, since such materials have great potential for application as cell culture scaffolds in tissue engineering, see e.g. [172, 165, 32]. Although the micrographs in Figure 9.1 are manifestly different from each other, it is equally evident that they exhibit strikingly similar structural characteristics.

A classical reference structure for modelling is an isotropic planar network of infinite random lines. So the angles of lines relative to a given fixed direction are uniformly distributed and, on each line, the locations of the intersections with other lines in the network form a Poisson point process. A graphical representation of part of an infinite line network is shown on the left of Figure 9.2; the graphic on the right of this figure shows a network having the same total line length per unit area but made from lines of finite length.


Fig. 9.1. Micrographs of four stochastic fibrous materials. Top left: nonwoven carbon fibre mat; Top right: glass fibre filter; Bottom left: electrospun nylon nanofibrous network (Courtesy S.J. Eichhorn and D.J. Scurr); Bottom right: paper.

Fig. 9.2. Graphical representations of planar random networks of lines with infinite and finite length; both have the same total length of lines.


We discuss networks of these types in detail in the sequel, but for now we observe qualitative similarity among the networks in Figures 9.1 and 9.2, particularly in the sizes and shapes of the polygons enclosed by the intersections of local groups of lines.

The polygons generated by the intersections of lines have been studied by many workers, and several analytic results are known. There are results of Miles [147, 148] and Tanner [198] (cf. also Stoyan et al. [196]) for random lines in a plane, for example:

• Expected number of sides per polygon:

\bar{n} = 4.

• Variance of the number of sides per polygon:

\sigma^2(n) = \frac{\pi^2 + 24}{2}.

• The perimeter P of polygons with n sides has a χ² distribution with 2(n − 2) degrees of freedom and probability density function q given by

q(P, n) = \frac{P^{n-3}\, e^{-P/2}}{2^{n-2}\, \Gamma(n-2)}, \quad n = 3, 4, \ldots \qquad (9.1)

where P is given as a multiple of the mean polygon side length, and the case n = 3, for the perimeter of triangles, coincides with an exponential distribution.

• Probability of triangles:

p_3 = 2 - \frac{\pi^2}{6} \approx 0.355.

• Probability of quadrilaterals:

p_4 = \frac{1}{3} - \frac{7\pi^2}{36} + 4\int_0^{\pi/2} x^2 \cot x\, dx \approx 0.381.

Stoyan et al. [196] p. 325 collect further results from Monte Carlo methods:

p_5 \approx 0.192, \quad p_6 \approx 0.059, \quad p_7 \approx 0.013, \quad p_8 \approx 0.002 \qquad (9.2)

and mention the empirical approximation obtained by Crain and Miles for the distribution of the number of sides per random polygon

p_n \approx \frac{e^{-1}}{(n-3)!}. \qquad (9.3)

In Dodson and Sampson [75] we develop this a little further by providing new analytic approximations to the distributions of the areas and local line densities for random polygons, and we compute various limiting properties of random polygons.


9.2 Random Networks of Rectangular Fibres

Most real fibrous materials consist of fibres of finite width and length with distributed morphologies. However, we note also the important result of Miles [147] that the distribution of the diameters of inscribed circles is exponential and unaltered by changing the infinite random lines to infinite random rectangles, for arbitrary distributions of width. This characteristic arises from the fact that as the lines change to rectangles with increasing width, so the area of polygons decreases and some small polygons disappear; accordingly, we expect the same independence of width for the distances between adjacent fibre crossings on a given line and in the polygon area distribution.

Although the modelling of fibres as lines of infinite length is convenient when considering the statistics of the void structure and porous properties of stochastic fibrous materials, there is an influence of fibre length on network uniformity and other properties that must be considered. For such properties, our reference structure for modelling is a two dimensional random fibre network where fibres are considered to be rectangles of given aspect ratio, with their centroids distributed according to a planar Poisson point process and the orientations of their major axes to any fixed direction uniformly distributed; such a network is represented graphically on the right of Figure 9.2. As an example for a familiar material, the typical range of aspect ratios for natural cellulose fibres in paper is from about 20 to perhaps 100, as may be seen by tearing the edge of a sheet of writing paper. Figure 9.3 shows areal density radiographs for three wood fibre networks made with the same mean areal density but with different spatial distributions of fibres, from Oba [156].

In the case of fibres of finite width, we call the number of fibres covering a point in the plane the coverage, c. The coverage is distributed according to a Poisson distribution, so has probability function

P(c) = \frac{\bar{c}^{\,c}\, e^{-\bar{c}}}{c!} \quad \text{where } c = 0, 1, 2, \ldots \qquad (9.4)

and c̄ is the expected coverage.

Referring to Figure 9.2, we observe that if we partitioned the network into square zones of side length d, then the total fibre area in each zone, and hence the local average coverage c there, would vary from zone to zone. For a random network of fibres of uniform length λ and uniform width ω, the variance of the local average coverage, σ²_d(c), for such random fibre networks was derived by Dodson [58]:

\sigma^2_d(c) = \overline{(c - \bar{c})^2} = \bar{c} \int_0^{\sqrt{2}\,d} a(r, \omega, \lambda)\, b(r, d)\, dr. \qquad (9.5)

Here a(r, ω, λ) is the point autocorrelation function for coverage at points separated by a distance r, and b(r, d) is the probability density function for the distance r between two points chosen independently and at random within the zone.


Fig. 9.3. Density maps of coverage for three wood fibre networks with constant mean coverage, c̄ ≈ 20 fibres, but different distributions of fibres. Each image represents a square region of side length 5 cm; darker regions correspond to higher coverage. Top: cv₁mm = 0.08; centre: cv₁mm = 0.11; bottom: cv₁mm = 0.15. The top image is similar to that expected for a Poisson process of the same fibres.


The point autocorrelation function was derived for arbitrary rectangular zones [58] and, for square zones of side length d, it is given by

a(r,\omega,\lambda) =
\begin{cases}
1 - \dfrac{2}{\pi}\left(\dfrac{r}{\lambda} + \dfrac{r}{\omega} - \dfrac{r^2}{2\,\omega\lambda}\right) & \text{for } 0 < r \le \omega \\[6pt]
\dfrac{2}{\pi}\left(\arcsin\dfrac{\omega}{r} - \dfrac{\omega}{2\lambda} - \dfrac{r}{\omega} + \sqrt{\dfrac{r^2}{\omega^2} - 1}\right) & \text{for } \omega < r \le \lambda \\[6pt]
\dfrac{2}{\pi}\left(\arcsin\dfrac{\omega}{r} - \arccos\dfrac{\lambda}{r} - \dfrac{\omega}{2\lambda} - \dfrac{\lambda}{2\omega} - \dfrac{r^2}{2\lambda\omega} + \sqrt{\dfrac{r^2}{\lambda^2} - 1} + \sqrt{\dfrac{r^2}{\omega^2} - 1}\right) & \text{for } \lambda < r \le \sqrt{\lambda^2 + \omega^2} \\[6pt]
0 & \text{for } r > \sqrt{\lambda^2 + \omega^2}
\end{cases} \qquad (9.6)

Also, Ghosh [91] had provided

b(r,d) =
\begin{cases}
\dfrac{4r}{d^4}\left(\dfrac{\pi d^2}{2} - 2rd + \dfrac{r^2}{2}\right) & \text{for } 0 \le r \le d \\[6pt]
\dfrac{4r}{d^4}\left(d^2\left(\arcsin\dfrac{d}{r} - \arccos\dfrac{d}{r}\right) + 2d\sqrt{r^2 - d^2} - \dfrac{1}{2}\left(r^2 + 2d^2\right)\right) & \text{for } d \le r \le \sqrt{2}\,d \\[6pt]
0 & \text{for } r > \sqrt{2}\,d.
\end{cases} \qquad (9.7)

The integral term in equation (9.5) is the fractional between-zones variance and is plotted in Figure 9.4. We observe that it increases with increasing fibre length and width, and decreases with increasing zone size. The actual distribution of local zonal averages of coverage for a random network of rectangles would, by the Central Limit Theorem, be a (truncated) Gaussian, being the result of a large number of independent Poisson events.
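The integral in (9.5) has no convenient closed form, but it is easily evaluated by quadrature; a sketch with our own function names, assuming ω ≤ λ in (9.6) and illustrative dimensions:

    import numpy as np
    from scipy.integrate import quad

    def a_point(r, w, lam):
        # Point autocorrelation of coverage, equation (9.6), for w <= lam.
        if r <= w:
            return 1 - 2/np.pi * (r/lam + r/w - r**2/(2*w*lam))
        if r <= lam:
            return 2/np.pi * (np.arcsin(w/r) - w/(2*lam) - r/w
                              + np.sqrt(r**2/w**2 - 1))
        if r <= np.hypot(lam, w):
            return 2/np.pi * (np.arcsin(w/r) - np.arccos(lam/r)
                              - w/(2*lam) - lam/(2*w) - r**2/(2*lam*w)
                              + np.sqrt(r**2/lam**2 - 1)
                              + np.sqrt(r**2/w**2 - 1))
        return 0.0

    def b_dist(r, d):
        # Density of the distance between two uniform points in a d x d
        # square zone, equation (9.7).
        if r <= d:
            return 4*r/d**4 * (np.pi*d**2/2 - 2*r*d + r**2/2)
        if r <= np.sqrt(2)*d:
            return 4*r/d**4 * (d**2*(np.arcsin(d/r) - np.arccos(d/r))
                               + 2*d*np.sqrt(r**2 - d**2) - (r**2 + 2*d**2)/2)
        return 0.0

    def fractional_between_zones_variance(w, lam, d):
        # The integral term of equation (9.5).
        val, _ = quad(lambda r: a_point(r, w, lam) * b_dist(r, d),
                      0, np.sqrt(2)*d, limit=200)
        return val

    # Illustrative values in mm: 1 mm fibres, 20 micron width, 1 mm zones.
    frac = fractional_between_zones_variance(0.02, 1.0, 1.0)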

Knowing the fractional between-zones variance allows us to compare the measured distribution of mass of a real fibre network with that of a random fibre network formed from the same constituent fibres. There is a large archive

Fig. 9.4. Dependence of the fractional between-zones variance on fibre length λ, fibre width ω and side length x of square inspection zones, equation (9.5).


of such analyses from radiographic imaging, particularly for paper—arguably the most common and familiar stochastic fibrous material used in society and industry. Such data reveal that industrially formed networks invariably exhibit a higher variance of local coverage, at all scales above a few fibre widths, than the corresponding random structure [56, 182, 155].

In characterizing fibre networks, the random case is taken as a well-defined reference structure and then we proceed to consider how the effects of fibre orientation and fibre clumping combine to yield structures with non-random features—clustering of fibre centres and preferential orientation of fibre axes. These departures from randomness represent, in practical applications, the component of variability within the structure which has the potential to be influenced through intervention and control in manufacturing processes. In later sections we model the effects of non-randomness in fibre networks on the distribution of in-plane pore dimensions, which is important for fluid transfer properties. We deal first with the case of clustering of fibres in an isotropic network, then proceed to the case of clustering and anisotropy.

Typically, the coefficient of variation, §1.2, of local areal density at the one millimetre scale for papers varies in the range 0.02 < cv(c)1mm < 0.20, the lower end corresponding to very fine papers and the upper end corresponding to papers with a very clumpy, contrasty appearance when viewed in transmitted light. Distributions of such types are shown in Figure 9.5, with A corresponding to cv(c)1mm ≈ 0.02 and B corresponding to cv(c)1mm ≈ 0.20.

Fig. 9.5. The log-gamma family of probability densities (9.10) for c1mm ∈ (0, 1], representing the local areal density at the 1mm scale observed in paper, normalized with central mean c̄ = 1/2. Here, cv(c)1mm ranges from about 0.02 at A to about 0.20 at B; about half way between these two points corresponds to a structure with areal density map like the central image in Figure 9.3.


A larger variance than for the random case for the same fibres corresponds to a higher degree of fibre clumping or clustering (also called ‘flocculation’ by paper scientists) in real networks than in random networks. This arises from non-independence in the deposition of fibres through interactions during manufacturing. In continuous manufacturing processes of papermaking type, including non-woven textiles, this is largely because fibres are flexible and can entangle during mixing in the aqueous suspensions from which the network is made by forced filtration. These interactions cause departure from the Poisson assumption that the locations of fibre centres are independent of each other.

9.3 Log-Gamma Information Geometry for Fibre Clustering

Using the log-gamma densities as approximations to the truncated Gaussians from Proposition 3.9, we have approximate probability density functions for c given by

\[ p(c; \mu, \alpha) \approx \frac{1}{\Gamma(\alpha)} \left(\frac{1}{c}\right)^{1 - \frac{\alpha}{\mu}} \left(\frac{\alpha}{\mu}\right)^{\alpha} \log^{\alpha - 1}\left(\frac{1}{c}\right). \tag{9.8} \]

Figure 9.5 shows the log-gamma probability density functions (9.8) with central mean from (9.10), over a range which approximates the probability density functions for the range of values 0.02 < cv(c)1mm < 0.20. Hence, in the regime of practical interest for paper-like networks, normalising at fixed mean coverage c̄ = 1/2, the parameters for the appropriate log-gamma densities have µ ≈ 0.7 and 50 < α < 120, as shown in Figure 9.6—cf. equation (9.11) below. Figure 9.3 shows areal density radiographs for three wood fibre networks made with the same mean coverage and from the same fibres but with different spatial distributions of fibres, from Oba [156]; an electron micrograph of the surfaces would resemble that shown at the bottom right in Figure 9.1. In Figure 9.3 the actual mean number of fibres covering a point was about 20, corresponding to a writing paper grade, and the fibres were on average 2 mm long and 38 µm wide. The radiograph at the top resembles that expected for a random network, that is, a Poisson process of fibres.

Normalizing (9.8) to a central mean c̄ = 1/2, we have

\[ \mu = \left(2^{1/\alpha} - 1\right)\alpha, \tag{9.9} \]

so (9.8) reduces to

\[ p\left(c; \left(2^{1/\alpha} - 1\right)\alpha, \alpha\right) \approx \frac{1}{\Gamma(\alpha)} \left(\frac{1}{-1 + 2^{1/\alpha}}\right)^{\alpha} \left(\frac{1}{c}\right)^{1 + \frac{1}{1 - 2^{1/\alpha}}} \log^{\alpha - 1}\left(\frac{1}{c}\right). \tag{9.10} \]


Fig. 9.6. Coefficient of variation cv(c)1mm when c̄ = 1/2 for the log-gamma density functions (9.10) as shown in Figure 9.5. A random structure would have µ ≈ 0.7 and cv(c)1mm ≈ 0.07, with areal density map rather like that at the top of Figure 9.3.

We observe that from (9.9)

\[ \lim_{\alpha \to \infty} \left(2^{1/\alpha} - 1\right)\alpha = \log 2 \approx 0.7. \tag{9.11} \]

Figure 9.7 gives a surface plot of cv(c)1mm for the regime of practical interest, with the curve (9.9) passing through the points having c̄ = 1/2. Essentially, this curve represents the range of fibre network structures that can typically be manufactured by a forced filtration process. A strong brown bag paper could correspond to a point towards B, with areal density map rather like that at the bottom of Figure 9.3. A very fine glass fibre filter could correspond to a point towards A, and a random structure would be between these, having (µ, α) ≈ (0.7, 100) and cv(c)1mm ≈ 0.07, with areal density map rather like that at the top of Figure 9.3.

Some geodesics in the log-gamma manifold are shown in Figure 9.8 passing through (µ, α) = (0.7, 50), and in Figure 9.9 passing through the point (µ, α) = (0.7, 100). Both sets of geodesics have initial directions around the α = constant direction.

9.4 Bivariate Gamma Distributions for Anisotropy

Intuitively, the degree of clumping in the network will influence also the distribution of inter-crossing distances in the network and hence the polygon size distribution. Most processes for the manufacture of stochastic fibrous


Fig. 9.7. Coefficient of variation of the log-gamma family of probability densities (9.8) approximating the range of distributions of local areal density at the 1mm scale observed for paper. The curve in the surface is given by (9.9) and passes through points having central mean c̄ = 1/2, with cv(c)1mm ranging from about 0.02 at A to about 0.20 at B. A random structure would be between these with (µ, α) ≈ (0.7, 100) and cv(c)1mm ≈ 0.07.

materials are continuous and yield a web suitable for reeling. Accordingly, the processes tend to impart some directionality to the structure since fibres exhibit a preferential orientation along the direction of manufacture [185, 45]. Several probability densities have been used to model the fibre orientation distribution, including von Mises [139] and wrapped-Cauchy distributions [185]. However, in practice a simple one-parameter cosine distribution is sufficient to represent the orientation distribution for most industrially formed networks; this has probability density

\[ f(\theta) = \frac{1}{\pi} - \nu \cos(2\theta) \tag{9.12} \]

where 0 ≤ ν ≤ 1/π is a free parameter controlling the extent of orientation, such that when ν = 0, θ has a uniform distribution. Equation (9.12) is plotted in Figure 9.10 for most of the applicable range of ν; for most machine made papers 0.1 ≤ ν ≤ 0.2.


Fig. 9.8. Examples of geodesics in the log-gamma manifold passing through the point (µ, α) = (0.7, 50) with initial directions around the α = constant direction.

9.5 Independent Polygon Sides

For a two-dimensional random network of lines, the distribution of inter-crossing distances g can be considered as the distribution of intervals between Poisson events on a line and so has an exponential distribution with probability density

\[ f(g) = \frac{1}{\bar{g}}\, e^{-g/\bar{g}}, \tag{9.13} \]

with coefficient of variation, §1.2, cv(g) = σ_g/ḡ = 1.

We expect that the effect of clumping over and above that observed for a random process will be to increase the variance of these inter-crossing distances without significantly affecting the mean. Conversely, we might expect preferential orientation of lines to reduce the number of crossings between lines and hence increase the mean inter-crossing distance and to increase or decrease the variance. A convenient distribution to characterise the inter-crossing distances in such near-random networks is the gamma distribution, as suggested by Deng and Dodson [56].


Fig. 9.9. Examples of geodesics in the log-gamma manifold passing through (µ, α) = (0.7, 100) with initial directions around the α = constant direction.

Fig. 9.10. Probability densities for fibre orientation according to the 1-parameter cosine distribution as given by equation (9.12), for ν = 0, 0.1, 0.2, 0.3; orientations are measured relative to the direction of manufacture.


The gamma distribution, §1.4.1, has probability density

\[ f(g) = \left(\frac{\alpha}{\mu}\right)^{\alpha} \frac{g^{\alpha - 1}}{\Gamma(\alpha)}\, e^{-\alpha g/\mu} \tag{9.14} \]

with mean ḡ = µ and coefficient of variation, §1.2, cv(g) = 1/√α. When α = 1 we recover the probability density function for the exponential distribution as given by equation (9.13).

The inter-crossing distances in the network generate the perimeters of the polygonal voids and, from the work of Miles [147], the expected number of sides per polygon is four. This led Corte and Lloyd [48] to model the polygon area distribution for a random line network as a system of rectangular pores with sides given by independent and identical exponential distributions. Interestingly, that model for the void structure of a random fibre network predated by three years the solution in 1968 [58] of the corresponding problem for the matter distribution; so, for paper-like materials, there was an analytic model for the statistics of where matter was not, before there was one for where matter was. Some thirty years later, we extended the treatment given by Corte and Lloyd and derived the probability density of pore area for rectangular pores with sides given by independent and identical gamma distributions [71, 72], where the free parameters of the gamma distribution are assumed to represent the influence of fibre clumping and fibre orientation.

Once again, the uniqueness property of the gamma distribution that was proved by Hwang and Hu [106] (cf. their concluding remark), given as Theorem 1.1 above, is relevant here. Now, it is commonly found in experimental measurements of pore size distributions for papers and non-woven textiles made under different conditions that the standard deviation is proportional to the mean. This encourages the view that the gamma distribution is an appropriate generalization of the exponential distribution for adjacent polygon side lengths in near-random fibre networks. Miles [147] had proved that for planar random lines the distribution of the diameters of inscribed circles in polygons is exponential, and unaltered by changing the infinite random lines to infinite random rectangles with arbitrary distributions of width. This suggests that gamma distributions may well be appropriate to model the inscribed circles in non-random cases.

We seek the probability density for the areas a of rectangles with sides g_x and g_y such that a = g_x g_y, where the probability densities of g_x and g_y are given by equation (9.14). The probability density of a is given by

\[ g(a) = \int_0^{\infty} f(g_x)\, f(g_y)\, dg_x \tag{9.15} \]
\[ = \int_0^{\infty} \frac{1}{g_x}\, f(g_x)\, f(a/g_x)\, dg_x \tag{9.16} \]
\[ = \frac{2}{\Gamma(\alpha)^2}\, a^{\alpha - 1} \alpha^{2\alpha} \mu^{-2\alpha} K_0(\zeta) \quad \text{where } \zeta = \frac{2\alpha}{\mu}\sqrt{a}, \tag{9.17} \]

and K_0(ζ) is the zeroth order modified Bessel function of the second kind.


The mean, variance and coefficient of variation, §1.2, of rectangle area are

\[ \bar{a} = \mu^2 \tag{9.18} \]
\[ \sigma^2(a) = \frac{1 + 2\alpha}{\alpha^2}\, \mu^4 \tag{9.19} \]
\[ cv(a) = \frac{\sqrt{1 + 2\alpha}}{\alpha} \tag{9.20} \]

respectively. Then it follows that we can solve for α in terms of cv(a):

\[ \alpha = \frac{1 \pm \sqrt{1 + cv(a)^2}}{cv(a)^2}, \tag{9.21} \]

so we take the positive root here and note that we recover the random case, α = 1, precisely if cv(a) = √3.

Experimental analyses of porous materials using mercury porosimetry or fluid displacement porometry typically infer pore sizes and their distributions in terms of an equivalent pore diameter. Such measures are convenient as they provide an accessible measure of, for example, the sizes of particles that might be captured by a fibrous filter. Following Corte and Lloyd [48], we define the equivalent radius r_p of a rectangular pore as the radius of a circle with the same area, such that

\[ r_p = \sqrt{\frac{a}{\pi}}. \tag{9.22} \]

From equation (9.17) we have

\[ p(r_p) = 2\pi r_p\, g(\pi r_p^2) \tag{9.23} \]
\[ = \frac{4 \pi^{\alpha} \alpha^{2\alpha} \mu^{-2\alpha} r_p^{2\alpha - 1} K_0(\zeta)}{\Gamma(\alpha)^2} \quad \text{where } \zeta = 2\sqrt{\pi}\, r_p\, \alpha/\mu, \tag{9.24} \]

and the mean, variance and coefficient of variation, §1.2, of pore radius are

\[ \bar{r}_p = \frac{\mu}{\sqrt{\pi}}\, \frac{\Gamma(\alpha + 1/2)^2}{\alpha\, \Gamma(\alpha)^2} \tag{9.25} \]
\[ \sigma^2(r_p) = \frac{\mu^2}{\pi} \left(1 - \frac{\Gamma(\alpha + 1/2)^4}{\alpha^2\, \Gamma(\alpha)^4}\right) \tag{9.26} \]
\[ cv(r_p) = \frac{\alpha\, \Gamma(\alpha)^2}{\Gamma(\alpha + 1/2)^2} \sqrt{1 - \frac{\Gamma(\alpha + 1/2)^4}{\alpha^2\, \Gamma(\alpha)^4}}. \tag{9.27} \]

We see that here the coefficient of variation is independent of the mean, a property that characterises the gamma distribution, and it is easily shown that the probability density function (9.24) for pore radii is well-approximated by a gamma distribution of the same mean and variance [73].

Recall that the mean of the underlying gamma distributions representing the polygon side lengths is µ and the coefficient of variation is 1/√α,


Fig. 9.11. Probability densities of pore radii at and near to the random case, as given by equation (9.24): clumped (cv(g) = 1.2), random (cv(g) = 1) and disperse (cv(g) = 0.8).

so we observe that the expected pore radius is proportional to the expected polygon side length, or inter-crossing distance, and the constant of proportionality depends only on the parameter α, hence on the coefficient of variation, §1.2, of inter-crossing distances, which can be considered a measure of the non-uniformity of the structure—in radiographs of real networks, increased coefficient of variation corresponds to increased contrast in the images. The coefficient of variation of equivalent pore radius is plotted against the parameter α in Figure 9.12.

Note also that for random networks we have α = 1 and

\[ \bar{r}_p^{\,\mathrm{random}} = \frac{\sqrt{\pi}}{4}\, \mu \tag{9.28} \]
\[ \sigma^2(r_p^{\,\mathrm{random}}) = \left(\frac{1}{\pi} - \frac{\pi}{16}\right) \mu^2 \tag{9.29} \]
\[ cv(r_p^{\,\mathrm{random}}) = \frac{\sqrt{16 - \pi^2}}{\pi} \approx 0.788. \tag{9.30} \]

The probability density of pore radii, as given by equation (9.24), is plotted in Figure 9.11 for the random case and for networks with higher and lower degrees of uniformity as quantified by their coefficients of variation of rectangle side lengths; these are labelled ‘clumped’ and ‘disperse’ respectively. The influence of network uniformity on the mean and coefficient of variation of pore radii is shown in Figure 9.13.
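Both (9.24) and (9.27) involve only standard special functions (K0 and Γ), so they can be evaluated directly; a Python sketch with illustrative names follows. For α = 1 it recovers cv ≈ 0.788 of equation (9.30), and the quadrature check confirms that (9.24) integrates to unity.

import numpy as np
from scipy.special import k0, gamma
from scipy.integrate import quad

def pore_radius_pdf(r, alpha, mu):
    # equation (9.24): density of the equivalent pore radius
    zeta = 2.0*np.sqrt(np.pi)*r*alpha/mu
    return (4.0*np.pi**alpha*alpha**(2*alpha)*mu**(-2*alpha)
            * r**(2*alpha - 1.0)*k0(zeta)/gamma(alpha)**2)

def cv_pore_radius(alpha):
    # equation (9.27): cv of pore radius, independent of mu
    g = gamma(alpha + 0.5)**2/(alpha*gamma(alpha)**2)
    return np.sqrt(1.0 - g*g)/g

print(cv_pore_radius(1.0))                               # ~0.788, equation (9.30)
print(quad(pore_radius_pdf, 0, np.inf, args=(1.0, 1.0))[0])   # ~1: the density is normalised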

9.5.1 Multiplanar Networks

In practice, real fibre networks are 3-dimensional, though the hydrodynamics of filtering suspensions of fibres with relatively high aspect ratios at high speed yields layered structures—fibres tend to penetrate only a few fibre diameters


Fig. 9.12. Coefficient of variation of equivalent pore radius cv(r_p), given by equation (9.27), as a function of the gamma distribution parameter α. The random case has α = 1.

Fig. 9.13. Effect of network uniformity, plotted against cv(g) = α^{-1/2}, on the mean (left) and coefficient of variation of pore radii (right) as given by equations (9.25) and (9.27) respectively.

up or down through the network, as established by Radvan et al. [170]. However, sometimes the mean coverage of fibre networks is such that a significant proportion of fibres exist in planes several fibre diameters from the plane of support of the network. In such cases, the porous structure can be modelled by considering the superposition of several 2-dimensional layers.

As mentioned above, it turns out that the probability density for pore radii given by equation (9.24) is itself well approximated by a gamma distribution with the same mean and variance. Moreover, the distribution of a sum of independent gamma distributed random variables is itself a gamma distribution. Consider then a layered structure of circular voids with gamma distributed


radii. The probability density function f and cumulative distribution function F, §1.2, for pore radii in a single layer are given by

\[ f(r_p) = \left(\frac{\alpha}{\mu}\right)^{\alpha} \frac{r_p^{\alpha - 1}}{\Gamma(\alpha)}\, e^{-\alpha r_p/\mu}, \tag{9.31} \]
\[ F(r_p) = 1 - \frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}, \tag{9.32} \]

where Γ(a, z) is the incomplete gamma function.

A second layer with an independent and identical distribution of pore radii is placed over the first layer such that the centres of pairs of voids in the two layers are aligned. For such a structure, we assign to each pair of pores the radius of the smaller pore, such that we consider effectively the radii of the most constricted part of a path through the network. The probability density of these radii is given by

\[ f(r_p, 2) = 2\, (1 - F(r_p))\, f(r_p) = 2\, \frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}\, f(r_p). \tag{9.33} \]

The cumulative distribution function, §1.2, for two layers is

\[ F(r_p, 2) = 1 - \left(\frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}\right)^2. \tag{9.34} \]

Applying the same notation, the addition of further layers gives iteratively

\[ f(r_p, 3) = 3\, (1 - F(r_p, 2))\, f(r_p) = 3 \left(\frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}\right)^2 f(r_p), \]
\[ F(r_p, 3) = 1 - \left(\frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}\right)^3, \]
\[ f(r_p, 4) = 4\, (1 - F(r_p, 3))\, f(r_p) = 4 \left(\frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}\right)^3 f(r_p), \]
\[ F(r_p, 4) = 1 - \left(\frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}\right)^4, \quad \text{etc.} \]

So, for a structure composed of n layers we have the general expressions for the probability density and cumulative distribution functions of pore radii respectively:

\[ f(r_p, n) = n \left(\frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}\right)^{n-1} f(r_p), \tag{9.35} \]
\[ F(r_p, n) = 1 - \left(\frac{\Gamma(\alpha, r_p \alpha/\mu)}{\Gamma(\alpha)}\right)^n. \tag{9.36} \]


To use equation (9.35) we need closed expressions giving the number of layers, n, and the expected pore radius µ of a monolayer network in terms of network and fibre properties.

Consider a network with mean coverage c̄ and fractional void volume, or porosity, ε. An elemental plane of this network can be represented as a two-dimensional structure with fractional open area ε. From the Poisson statistics for coverage of this two-dimensional structure, as given by equation (9.4), we have

\[ \varepsilon = P(0) = e^{-\bar{c}_\varepsilon}, \tag{9.37} \]

where c̄_ε is the expected coverage of a two-dimensional network with fractional open area ε. It follows that

\[ \bar{c}_\varepsilon = \log(1/\varepsilon), \tag{9.38} \]

so the number of layers n is given by

\[ n = \frac{\bar{c}}{\bar{c}_\varepsilon} = \frac{\bar{c}}{\log(1/\varepsilon)}. \tag{9.39} \]

To determine the expected pore radius for a two-dimensional network with mean coverage c̄_ε we consider first the number of intersections between fibres occurring per unit area; for fibres of width ω this is given by Kallmes et al. [111] as

\[ n_{\mathrm{int}} = \frac{\bar{c}_\varepsilon^{\,2}}{\pi\, \omega^2}. \tag{9.40} \]

Inevitably, the number of polygons n_poly per unit area is approximately the same as the number of fibre intersections per unit area, and since c̄_ε = log(1/ε) this is given by

\[ n_{\mathrm{poly}} = \frac{(\log(1/\varepsilon))^2}{\pi\, \omega^2}. \tag{9.41} \]

It follows that the expected area of a polygonal void is

\[ \bar{a}_\varepsilon = \frac{\varepsilon}{n_{\mathrm{poly}}} = \frac{\pi\, \varepsilon\, \omega^2}{(\log(1/\varepsilon))^2}. \tag{9.42} \]

Again, we define the pore radius as that of a circle with the same area:

\[ \bar{r}_{p,\varepsilon} = \sqrt{\frac{\bar{a}_\varepsilon}{\pi}} = \frac{\omega \sqrt{\varepsilon}}{\log(1/\varepsilon)}. \tag{9.43} \]

Figure 9.14 shows the influence of mean coverage and porosity on the mean pore radius and standard deviation of pore radii as calculated from the


Fig. 9.14. Effect of coverage and porosity (ε = 0.5, 0.7, 0.9) on pore radii as given by equation (9.35) with equations (9.40) and (9.43) and α = π²/(16 − π²). Left: mean pore radius; right: standard deviation of pore radii plotted against the mean.

probability density given by equation (9.35) with equations (9.40) and (9.43), and with α = π²/(16 − π²) such that cv(r_{p,ε}) coincides with equation (9.30) for random networks. The mean pore radius decreases with mean coverage and porosity, and for each case the standard deviation of pore radii is proportional to the mean. This latter property is important since it tells us that the coefficient of variation of pore radii is independent of the mean. There is only a weak dependence of the coefficient of variation on porosity, and this is consistent with measurements reported in the literature.
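A short Python sketch of the multiplanar model follows (illustrative names; gammaincc is SciPy's regularized upper incomplete gamma, exactly the ratio Γ(α, r_pα/µ)/Γ(α) appearing in (9.35); n need not be an integer for evaluation).

import numpy as np
from scipy.special import gammaincc, gamma

def n_layers(cbar, eps):
    # equation (9.39)
    return cbar/np.log(1.0/eps)

def monolayer_mean_radius(omega, eps):
    # equation (9.43), in the same length units as the fibre width omega
    return omega*np.sqrt(eps)/np.log(1.0/eps)

def layered_pore_pdf(r, n, alpha, mu):
    # equations (9.31) and (9.35); gammaincc(a, x) = Gamma(a, x)/Gamma(a)
    beta = alpha/mu
    f1 = beta**alpha*r**(alpha - 1.0)*np.exp(-beta*r)/gamma(alpha)
    return n*gammaincc(alpha, beta*r)**(n - 1.0)*f1

# example: cbar = 20, porosity 0.7, fibre width 38 um (lengths in mm)
eps, cbar, omega = 0.7, 20.0, 0.038
n = n_layers(cbar, eps)                 # ~56 layers
mu = monolayer_mean_radius(omega, eps)  # monolayer mean pore radius
alpha = np.pi**2/(16.0 - np.pi**2)      # matches the random-case cv of (9.30)
r = np.linspace(1e-4, 0.2, 400)
pdf = layered_pore_pdf(r, n, alpha, mu)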

9.6 Correlated Polygon Sides

There is a growing body of evidence confirming the suitability of the gamma distribution to describe the dimensions of voids in stochastic fibrous materials [40, 67, 181], and in general classes of stochastic porous materials [108, 109], but the influence on physical phenomena of the parameters characterising the distribution has yet to be satisfactorily explained. In part this arises because our models for pore areas, and hence for pore radii, so far assume that adjacent polygon sides have independent lengths; in practice they do not. The observation was made by Corte and Lloyd [48] that ‘roundish’, hence near-regular, polygons are more frequent than irregular ones, and that ‘slit-shaped’ polygons are extremely rare [46]. See Miles [149] and Kovalenko [123] for proofs that this regularity is in fact a limiting property for random polygons as the area, perimeter or number of sides per polygon becomes large.

Graphical representations of two stochastic fibre networks are given in Figure 9.15; the histogram beneath each network shows the distribution of orientations of fibres in that network. Each network consists of 1500 fibres of unit length with their centres randomly positioned within a square of side 10 units. In the image on the left, the fibre orientation distribution is uniform; in the image on the right, the locations of the fibre centres are the same but fibre axes are preferentially oriented towards the vertical.


Fig. 9.15. Networks of 1500 unit length fibres with their centres randomly positioned within a square of side length 10 units. On the left, isotropy: fibre axes have a uniform orientation. On the right, anisotropy: fibre axes are preferentially oriented towards the vertical direction. The distributions of orientations over [0, π) are indicated in the histograms. The locations of fibre centres in each image are the same.

In the isotropic network on the left of Figure 9.15, inspection reveals that regions of high density have many short inter-crossing distances, while in regions of low density there are fewer but longer inter-crossing distances. So we have positive correlation between nearby polygonal side lengths, and this tends to yield more regular polygons, simply from the random variations in the local density that arise from the underlying Poisson point process for fibre centres. This means that random isotropy has an inherent ‘ground state’ correlation of adjacent inter-crossing distances, which explains why pores seem mainly ‘roundish’ in real networks.

In the oriented network on the right we still see the effects of the local density of crossings on the lengths of nearby sides, but we observe fewer ‘roundish’ polygons and more polygons of appreciable aspect ratio. It is important to note, however, that even in the oriented example shown here, the correlation between pairs of adjacent polygon sides remains positive; so the effect of random variations in the local density overwhelms the superposed anisotropy. For a significant fraction of polygons to be ‘slit-shaped’, it would


require pairs of adjacent polygon sides to consist of one short side and one long side. That would mean strong negative correlation, §1.3, and this is unlikely in stochastic fibre networks since the Poisson clustering of fibre centres inherent in such structures favours positive correlation. So we have a kind of meta-Theorem:

A Poisson process of lines in a plane has an intrinsic ground state of positive correlation between adjacent polygon edges, and this persists even when the process has substantial anisotropy.

This immediately raises two questions:

• What is the value of this ground state positive correlation for isotropic random networks?

• Is there a level of anisotropy that could reduce the ground state positive correlation to zero?

In [75] we used simulations of random line networks to show that the lengths of adjacent sides in random polygons are correlated, with correlation coefficient ρ ≈ 0.616.

Here we study the influence of the degree of correlation between the lengths of adjacent polygon sides on the statistics describing the distribution of pore sizes in thin fibre networks. We have developed computer routines that generate pairs of random numbers distributed according to the McKay bivariate gamma distribution, §4.1, which allows positive correlation 0 < ρ ≤ 1.

Pores are modelled as ellipses, each characterised by a pair (x, y) that represents its minor and major axes respectively. The eccentricity of each pore is given by

\[ e = \sqrt{1 - \left(\frac{x}{y}\right)^2} \tag{9.44} \]

and as a rule of thumb we can consider pores to be ‘roundish’ if they have eccentricity less than about 0.6. The area and perimeter of an elliptical pore are given by

\[ A = \pi x y, \qquad P = 4 y\, E(e^2) \tag{9.45} \]

where E(e²) is the complete elliptic integral of the second kind. The parameter of primary interest to us here is the equivalent radius of each pore, and this is given by

\[ r_{eq} = \frac{2A}{P}. \tag{9.46} \]
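The quantities (9.44)–(9.46) map directly onto standard library functions; in the Python sketch below (illustrative names; treating x < y as the semi-axes, as in (9.45)), scipy.special.ellipe evaluates the complete elliptic integral of the second kind with parameter m = e².

import numpy as np
from scipy.special import ellipe   # complete elliptic integral of the second kind, E(m)

def pore_geometry(x, y):
    # equations (9.44)-(9.46) for an elliptical pore with minor/major semi-axes x < y
    m = 1.0 - (x/y)**2             # m = e^2, the parameter convention used by ellipe
    A = np.pi*x*y
    P = 4.0*y*ellipe(m)
    return np.sqrt(m), A, P, 2.0*A/P

e, A, P, req = pore_geometry(0.5, 1.0)   # e ~ 0.866: not 'roundish' by the 0.6 rule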

For each generated pair of correlated (x, y) we obtain the eccentricity e, area A, perimeter P and hence the equivalent pore radius r_eq. In the next section we describe a simulator and use it to estimate the statistics describing the distribution of these parameters. First we need a source distribution for the pairs (x, y).


9.6.1 McKay Bivariate Gamma Distribution

The McKay bivariate gamma distribution for correlated x < y has joint probability density

\[ m(x, y; \alpha_1, \sigma_{12}, \alpha_2) = \frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1 + \alpha_2}{2}} x^{\alpha_1 - 1} (y - x)^{\alpha_2 - 1}\, e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\, y}}{\Gamma(\alpha_1)\, \Gamma(\alpha_2)}, \tag{9.47} \]

where 0 < x < y < ∞ and α1, σ12, α2 > 0. We may use it to model correlated polygon sides since the ordering of polygon sides is arbitrary.

The marginal probability densities of x and y are univariate gamma distributions with

\[ \bar{x} = \sqrt{\alpha_1 \sigma_{12}} \tag{9.48} \]
\[ \bar{y} = (\alpha_1 + \alpha_2) \sqrt{\frac{\sigma_{12}}{\alpha_1}}. \tag{9.49} \]

The correlation coefficient between x and y is given by

\[ \rho = \sqrt{\frac{\alpha_1}{\alpha_1 + \alpha_2}}. \tag{9.50} \]

We need to keep the total fibre length constant, so we want x̄ + ȳ = 2. From equation (9.48) we have

\[ \bar{y} = 2 - \sqrt{\alpha_1 \sigma_{12}}. \tag{9.51} \]

Equating equations (9.51) and (9.49) yields, on manipulation,

\[ \alpha_2 = 2\left(\sqrt{\frac{\alpha_1}{\sigma_{12}}} - \alpha_1\right), \tag{9.52} \]

and from equation (9.48) we have

\[ \alpha_1 = \frac{\bar{x}^2}{\sigma_{12}}. \tag{9.53} \]

On simplification, substitution of equations (9.52) and (9.53) in equation (9.50) yields

\[ \rho = \sqrt{\frac{\bar{x}}{2 - \bar{x}}} = \sqrt{\frac{\bar{x}}{\bar{y}}}. \tag{9.54} \]

It follows directly that

\[ \bar{x} = \frac{2\rho^2}{1 + \rho^2} \quad \text{and} \quad \bar{y} = \frac{2}{1 + \rho^2}, \tag{9.55} \]

such that x̄ + ȳ = 2.

The variances of x and y are given by

\[ \sigma^2(x) = \frac{\bar{x}^2}{\alpha_1} \tag{9.56} \]
\[ \sigma^2(y) = \frac{\bar{y}^2}{\alpha_1 + \alpha_2}, \tag{9.57} \]

respectively, and from equation (9.50) we have

\[ \alpha_2 = \alpha_1 \left(\frac{1}{\rho^2} - 1\right). \tag{9.58} \]

Substitution of equations (9.55) and (9.58) in equations (9.56) and (9.57) respectively yields

\[ \sigma^2(x) = \frac{4\rho^4}{(1 + \rho^2)^2\, \alpha_1} \tag{9.59} \]
\[ \sigma^2(y) = \frac{4\rho^2}{(1 + \rho^2)^2\, \alpha_1}, \tag{9.60} \]

such that σ²(x) = ρ² σ²(y).

By specifying the correlation coefficient ρ and the coefficient of variation, §1.2, of x,

\[ cv(x) = \frac{\sigma_x}{\bar{x}} = \frac{1}{\sqrt{\alpha_1}}, \]

we fully define the marginal and joint probability densities of x and y:

\[ \alpha_1 = \frac{1}{cv(x)^2}, \quad \alpha_2 = \alpha_1 \left(\frac{1}{\rho^2} - 1\right), \quad \bar{x} = \frac{2\rho^2}{1 + \rho^2}, \quad \bar{y} = \frac{2}{1 + \rho^2}, \quad \sigma_{12} = \frac{\bar{x}^2}{\alpha_1}. \]

Fig. 9.16. Three outputs of the McKay simulator with differing values of correlation coefficient: ρ = 0.5, 0.7, 0.9. Each family of 5000 pairs (x, y) with x < y has coefficient of variation cv(x) = 0.6.



It is easy to show that

\[ cv(y) = \rho\, cv(x), \quad \text{hence} \quad cv(y) < cv(x) = \frac{1}{\sqrt{\alpha_1}}. \]

The McKay probability density functions do not include the situation when both marginal distributions are exponential; however, the two special cases when one of the marginals is exponential are of interest. These have respectively α1 = 1 and α1 + α2 = 1, and then we can express the parameters in terms of the correlation coefficient ρ:

\[ \alpha_1 = 1 \;\Rightarrow\; cv(x) = 1, \quad cv(y)^2 = \rho^2 = \frac{1}{1 + \alpha_2}, \quad \sigma_{12} = \left(\frac{2\rho^2}{1 + \rho^2}\right)^2 \tag{9.61} \]

\[ \alpha_1 + \alpha_2 = 1 \;\Rightarrow\; cv(x)^2 = \frac{1}{\rho^2}, \quad cv(y)^2 = 1, \quad \sigma_{12} = \frac{4\rho^2}{(1 + \rho^2)^2}, \quad \alpha_1 = \rho^2. \tag{9.62} \]

Furthermore, in order to maintain constant total fibre length in our networks, we have controlled the mean values so that x̄ + ȳ = 2; so if x̄ = 1 − δ then ȳ = 1 + δ. Then from (9.54) above we have

\[ \rho = \sqrt{\frac{1 - \delta}{1 + \delta}}. \tag{9.63} \]

This equation is plotted in Figure 9.17 and we see that, for small δ, ρ decreases roughly linearly like (1 − δ).

9.6.2 McKay Information Geometry

Using the results in the previous section and the information geometry of the McKay distribution, we can illustrate how the variability of polygon side length, cv(x), and the correlation coefficient ρ for adjacent polygon sides influence the structure from an information geometric viewpoint. What we would like to


Fig. 9.17. McKay correlation coefficient ρ = √((1 − δ)/(1 + δ)) from equation (9.63), where x̄ = 1 − δ and ȳ = 1 + δ.

show would be the information distance of a given random line structure from a reference structure; unfortunately, distances are hard to compute in Riemannian manifolds—the McKay manifold M is a 3-dimensional curved space.

Distance between a pair of points in a Riemannian manifold is defined as the infimum of arc lengths over all curves between the points. For sufficiently nearby pairs of points there will be a unique minimizing geodesic curve that realises the infimum arc length. In general, such curves are hard to find between more distant points. However, we can obtain an upper bound on distances between two points T0, T1 ∈ M by taking the sum of arc lengths along coordinate curves that triangulate the pair of points with respect to the coordinate axes. We obtain the following upper bound on the information metric distance in M from T0 with coordinates (α1, σ12, α2) = (A, B, C) to T1 with coordinates (α1, σ12, α2) = (a, b, c):

\[ d_M(T_0, T_1) \le \left|\int_A^a \sqrt{\frac{C - 3x}{4x^2} + \frac{d^2 \log \Gamma(x)}{dx^2}}\; dx\right| + \left|\int_C^c \sqrt{\frac{d^2 \log \Gamma(y)}{dy^2}}\; dy\right| + \left|\int_B^b \sqrt{\frac{A + C}{4z^2}}\; dz\right|. \tag{9.64} \]

There is a further difficulty because of the presence of square roots, arising from the norms of tangent vectors to coordinate curves, so it is difficult to


obtain analytically the information distance, dM. However, by removing the square roots the integrals yield information-energy values EM, which can be evaluated analytically. Then the square root of the net information-energy differences along the coordinate curves gives an analytic ‘energy-distance’ dEM = √EM, which approximates dM. The net information-energy difference along the coordinate curves from T0 with coordinates (α1, σ12, α2) = (A, B, C) to T1 with coordinates (α1, σ12, α2) = (a, b, c) is

\[
E_M(T_0, T_1) \le \left|\int_A^a \left(\frac{C - 3x}{4x^2} + \frac{d^2 \log \Gamma(x)}{dx^2}\right) dx\right| + \left|\int_C^c \frac{d^2 \log \Gamma(y)}{dy^2}\, dy\right| + \left|\int_B^b \frac{A + C}{4z^2}\, dz\right|
\]
\[
\le \left|\int_A^a \frac{C - 3x}{4x^2}\, dx\right| + \left|\int_A^a \frac{d^2 \log \Gamma(x)}{dx^2}\, dx\right| + \left|\int_C^c \frac{d^2 \log \Gamma(y)}{dy^2}\, dy\right| + \frac{A + C}{4} \left|\frac{1}{b} - \frac{1}{B}\right|
\]
\[
= \left|\frac{C}{4a} - \frac{C}{4A} + \frac{3 \log\left(\frac{A}{a}\right)}{4}\right| + |\psi(a) - \psi(A)| + |\psi(c) - \psi(C)| + \frac{A + C}{4} \left|\frac{1}{b} - \frac{1}{B}\right|. \tag{9.65}
\]

\[ d_{EM} = \sqrt{E_M}, \tag{9.66} \]

where ψ = Γ′/Γ is the digamma function. Now we take for T0 the coordinate values corresponding to cv(x) = 1, so α1 = 1, giving the exponential distribution of minor axes of equivalent elliptical voids; doing this for arbitrary ρ gives

\[ T_0 = \left(1,\; \frac{4\rho^4}{(\rho^2 + 1)^2},\; \frac{1}{\rho^2} - 1\right). \]

For T1 we allow cv(x) to range through structures from dispersed (cv(x) < 1) through random (cv(x) = 1) to clustered (cv(x) > 1), for a range of ρ values:

\[ T_1 = \left(\frac{1}{cv(x)^2},\; \frac{4\rho^4\, cv(x)^2}{(\rho^2 + 1)^2},\; \frac{\frac{1}{\rho^2} - 1}{cv(x)^2}\right). \]

Making these substitutions in equation (9.65) we obtain, for arbitrary ρ and cv(x),

\[
E_M(cv(x), \rho)\big|_{[T_0 : \alpha_1 = 1]} = \frac{(\rho^2 + 1)^2}{16 \rho^6} \left|\frac{1}{cv(x)^2} - 1\right| + \frac{1}{4} \left|\left(1 - \frac{1}{cv(x)^2}\right)\left(1 - \frac{1}{\rho^2}\right) + 3 \log\left(cv(x)^2\right)\right|
\]
\[
+ \left|\psi\left(\frac{1}{cv(x)^2}\left(\frac{1}{\rho^2} - 1\right)\right) - \psi\left(\frac{1}{\rho^2} - 1\right)\right| + \left|\psi\left(\frac{1}{cv(x)^2}\right) + \gamma\right| \tag{9.67}
\]


Fig. 9.18. Approximate information distances dEM = √EM (equation (9.67)) in the McKay manifold, measured from distributions T0 with exponential marginal distribution for x, so α1 = 1 and cv(x) = 1.

where γ is the Euler gamma constant, of numerical value about 0.577. Figure 9.18 shows a plot of dEM = √(EM(cv(x), ρ)) from equation (9.67). This is an approximation, but we expect it to represent the main features of the distance of arbitrary random line structures T1 from T0 with α1 = 1 and hence cv(x) = 1. The inherent variability of a Poisson process of lines seems to yield a ground state of positive correlation between adjacent polygon edges, and this plot suggests that the information distance increases more rapidly away from the line cv(x) = 1 when ρ becomes very small or very large.
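The bound (9.65) involves only the digamma function, so dEM is cheap to evaluate. The following Python sketch (illustrative names; T_of encodes the coordinates of T0 and T1 given above) reproduces points of the surface in Figure 9.18:

import numpy as np
from scipy.special import digamma

def energy_bound(T0, T1):
    # the upper bound (9.65) on the information-energy between McKay
    # distributions, with T = (alpha1, sigma12, alpha2)
    A, B, C = T0
    a, b, c = T1
    return (abs(C/(4.0*a) - C/(4.0*A) + 0.75*np.log(A/a))
            + abs(digamma(a) - digamma(A))
            + abs(digamma(c) - digamma(C))
            + (A + C)/4.0*abs(1.0/b - 1.0/B))

def T_of(rho, cvx):
    # coordinates of the structure with correlation rho and cv(x), xbar + ybar = 2
    return (1.0/cvx**2,
            (2.0*rho**2/(1.0 + rho**2))**2*cvx**2,
            (1.0/rho**2 - 1.0)/cvx**2)

rho = 0.6
dEM = np.sqrt(energy_bound(T_of(rho, 1.0), T_of(rho, 1.3)))   # distance from cv(x) = 1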

Repeating the above procedure for the case when T0 has α1 + α2 = 1, we obtain

\[ E_M(\alpha_1, \alpha_2)\big|_{[T_0 : \alpha_1 + \alpha_2 = 1]} = \left|\psi(\alpha_2) - \psi(1 - \alpha_1)\right| + \frac{1}{4} \left|\frac{(2\alpha_1 + \alpha_2)^2}{4\alpha_1} - \frac{(\alpha_1 + 1)^2}{4\alpha_1}\right|. \tag{9.68} \]

This is plotted in Figure 9.19.


Fig. 9.19. Approximate information distances dEM = √EM (equation (9.68)) in the McKay manifold, measured from distributions T0 with exponential marginal distribution for y, so α1 + α2 = 1.

9.6.3 McKay Information Entropy

We recall from Figure 1.4 that among the gamma distributions, the exponential distribution has maximum information entropy. The entropy of the McKay bivariate gamma densities is

\[ S(\alpha_1, \sigma_{12}, \alpha_2) = \alpha_1^{-\frac{\alpha_1}{2}}\, \sigma_{12}^{\frac{\alpha_1 + 1}{2}}\, L \tag{9.69} \]

where

\[ L = \log\left(\frac{\alpha_1}{\sigma_{12}\, \Gamma(\alpha_1)\, \Gamma(\alpha_2)}\right) + \psi(\alpha_1)(\alpha_1 - 1) - \alpha_1 + \psi(\alpha_2)(\alpha_2 - 1) - \alpha_2. \]

In fact, using

\[ \sigma_{12} = \frac{1}{\alpha_1} \left(\frac{2\rho^2}{1 + \rho^2}\right)^2 \]

we can re-express this in terms of α1, α2, ρ in the form

\[ S\left(\alpha_1, \frac{1}{\alpha_1}\left(\frac{2\rho^2}{1 + \rho^2}\right)^2, \alpha_2\right) = 2^{\alpha_1 + 1} \left(\frac{\rho^2}{\rho^2 + 1}\right)^{\alpha_1 + 1} \alpha_1^{-\alpha_1 - \frac{1}{2}}\, K \tag{9.70} \]

where

\[ K = \log\left(\frac{(\rho^2 + 1)^2\, \alpha_1^2}{4\rho^4\, \Gamma(\alpha_1)\, \Gamma(\alpha_2)}\right) + \psi(\alpha_1)(\alpha_1 - 1) - \alpha_1 + \psi(\alpha_2)(\alpha_2 - 1) - \alpha_2. \]

Then, at α1 = 1, we have in terms of α2 and ρ

\[ S\left(1, \left(\frac{2\rho^2}{1 + \rho^2}\right)^2, \alpha_2\right) = \frac{4\rho^4}{(\rho^2 + 1)^2} \left(\log\left(\frac{(\rho^2 + 1)^2}{4\rho^4\, \Gamma(\alpha_2)}\right) + \psi(\alpha_2)(\alpha_2 - 1) - \alpha_2 - 1\right). \tag{9.71} \]

Figure 9.20 shows the information entropy S from equation (9.71) for McKay probability density functions, in terms of α2 and ρ, when α1 = 1 and cv(x) = 1. So here the minor axis has an exponential distribution. Making the substitution

\[ \alpha_2 = \alpha_1 \left(\frac{1}{\rho^2} - 1\right) \]

from the previous section, we obtain the entropy in terms of ρ alone. It turns out that there is a shallow local maximum of entropy near ρ = 0.24 and a pronounced local minimum near ρ = 0.75, Figure 9.21. In Figure 9.17 we can see that ρ = 0.24 corresponds to δ ≈ 0.89. In Figure 9.22 we plot dS/dρ, the total derivative of entropy with respect to ρ, for the interval 0.1 < ρ < 0.76, and we

Fig. 9.20. Information entropy S from equation (9.71) for McKay probability density functions, in terms of α2, ρ when α1 = 1 and cv(x) = 1.



Fig. 9.21. McKay information entropy S in terms of ρ when cv(x) = 1 = α1.


Fig. 9.22. Total derivative of entropy with respect to ρ for the interval 0.1 < ρ < 0.76, showing the locations of critical points by the intersections with the abscissa. From equation (9.71) for McKay probability density functions, in terms of ρ, when α2 = α1(1/ρ² − 1) and cv(x) = 1 = α1.

see the locations of these critical points by the intersections with the abscissa. Figure 4.4 shows geodesics with α1 = 1 passing through (c, α2) = (1, 1) in the McKay submanifold M1, where



Fig. 9.23. McKay information entropy in terms of ρ when (α1 + α2) = 1.

\[ c = \sqrt{\frac{\alpha_1}{\sigma_{12}}} = \sqrt{\frac{1}{\sigma_{12}}} = \frac{1}{\bar{x}}, \]

and so also cv(x) = 1.

Figure 9.23 shows a plot of the information entropy as a function of ρ when (α1 + α2) = 1; there is a maximum near ρ = 0.72, cf. also Figure 9.17, from which we see that this network has δ ≈ 0.32.
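Assuming the reconstruction of (9.71) above, the critical points just described can be located numerically. A Python sketch (illustrative names; gammaln and digamma are log Γ and ψ):

import numpy as np
from scipy.special import gammaln, digamma
from scipy.optimize import brentq

def S(rho):
    # entropy (9.71) along alpha1 = 1 with alpha2 = 1/rho^2 - 1
    a2 = 1.0/rho**2 - 1.0
    pref = 4.0*rho**4/(rho**2 + 1.0)**2
    K = (np.log((rho**2 + 1.0)**2/(4.0*rho**4)) - gammaln(a2)
         + digamma(a2)*(a2 - 1.0) - a2 - 1.0)
    return pref*K

def dS(rho, h=1e-6):
    # central-difference derivative of S
    return (S(rho + h) - S(rho - h))/(2.0*h)

print(brentq(dS, 0.15, 0.40))   # shallow local maximum, near rho = 0.24
print(brentq(dS, 0.50, 0.76))   # pronounced local minimum, near rho = 0.75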

9.6.4 Simulation Results

Our simulator generates pairs (x, y) distributed according to the McKay bivariate gamma distribution as given by equation (9.47), and we use these pairs to compute the structural characteristics of voids as discussed previously for the isotropic case. The inputs to the routine are the target values of the correlation coefficient ρ and the coefficient of variation, §1.2, of x, which controls the overall variability in the network—cv(x) = 1 corresponds to a random network.

To generate the McKay distributed pairs, first a sufficiently large part of the x-y plane is partitioned into square regions labelled by their lower left corners. We compute the total McKay probability for pairs in the (i, j)th region of the plane as

\[ P_{ij} = \int_{x_i}^{x_i + \Delta x} \int_{y_j}^{y_j + \Delta y} m(x, y)\; dy\, dx. \tag{9.72} \]


Fig. 9.24. Influence of correlation ρ on mean pore eccentricity ē, using the McKay bivariate gamma distribution for inter-crossing lengths; curves are shown for cv(x) = 0.5, 0.7, 0.9 and 1/ρ, and pores with ē below about 0.6 may be considered ‘roundish’. The inset figure shows cv(x ∪ y) plotted against correlation. The filled point has cv(x ∪ y) = 0.99 and thus represents a state close to a random network.

This integral is evaluated numerically for each (i, j) and these probabilities, multiplied by the required total number of pairs (x, y), define the simulated number of points in each region. The algorithm then generates this number of pairs of uniformly distributed random numbers in the square (x_i, x_i + ∆x) × (y_j, y_j + ∆y) for each (i, j). In the simulations presented here, using 0 < x < y < 4, intervals of ∆x = ∆y = 0.1 have been sufficient to yield McKay distributed pairs (x, y) with ρ within 3% of the target. Examples of the outputs of the generator are shown in Figure 9.16, where each plot shows 5000 points, each representing an (x, y) pair.
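The grid method just described is straightforward to reproduce. The sketch below (Python, illustrative names; not the authors' routine) approximates each Pij of (9.72) by the midpoint rule rather than full quadrature, and drops the cells on the diagonal x = y, whose probability is negligible for the parameters used here:

import numpy as np
from math import lgamma

def mckay_logpdf(x, y, a1, s12, a2):
    # log of the McKay joint density (9.47) on 0 < x < y, for numerical stability
    c = np.sqrt(a1/s12)
    return ((a1 + a2)*np.log(c) + (a1 - 1.0)*np.log(x)
            + (a2 - 1.0)*np.log(y - x) - c*y - lgamma(a1) - lgamma(a2))

def mckay_sample(a1, s12, a2, n_pairs, xymax=4.0, h=0.1, seed=0):
    # cell probabilities from (9.72) by the midpoint rule, then uniform
    # scatter of the allotted counts within each cell
    rng = np.random.default_rng(seed)
    edges = np.arange(0.0, xymax, h)
    cells, probs = [], []
    for xi in edges:
        for yj in edges:
            if yj > xi:   # cells strictly above the diagonal, so y > x throughout
                cells.append((xi, yj))
                probs.append(np.exp(mckay_logpdf(xi + h/2, yj + h/2, a1, s12, a2))*h*h)
    probs = np.array(probs)/np.sum(probs)
    counts = rng.multinomial(n_pairs, probs)
    xs = np.concatenate([rng.uniform(xi, xi + h, k) for (xi, _), k in zip(cells, counts)])
    ys = np.concatenate([rng.uniform(yj, yj + h, k) for (_, yj), k in zip(cells, counts)])
    return xs, ys

# rho = 0.7, cv(x) = 0.6  ->  alpha1 ~ 2.78, sigma12 ~ 0.156, alpha2 ~ 2.89
x, y = mckay_sample(2.78, 0.156, 2.89, 5000)
print(np.corrcoef(x, y)[0, 1])   # should land within a few percent of 0.7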

The simulator has been used to compute the statistics of elliptical voids for a range of correlation coefficients and a range of cv(x). Each time the simulator was run, 5000 pairs (x, y) were generated and the coefficient of variation of all x and y, cv(x ∪ y), was computed from the resulting data.

The mean eccentricity is plotted against the correlation coefficient in Figure 9.24. In this, and all figures arising from our analysis of the McKay bivariate gamma distribution, we present data where e ≤ 0.95, these being associated with correlation coefficients greater than about 0.5. The data form a curve rather like

\[ e^2 = 1 - \rho^2. \]

This gives us another meta-Theorem:

Modelling pores in a random network of lines by equivalent ellipses, the mean eccentricity essentially measures the variation in major axis y not due to positive linear correlation with minor axis x.


Fig. 9.25. Standard deviation of eccentricity plotted against mean eccentricity, using the McKay bivariate gamma distribution for free-fibre-lengths. The filled point has cv(x ∪ y) = 0.99 and thus represents a case close to a random network.

Fig. 9.26. Influence of correlation on the mean pore radius, using the McKay bivariate gamma distribution for inter-crossing lengths. The filled point has cv(x ∪ y) = 0.99 and thus represents a case close to a random network.

Network uniformity is characterised by the grand union coefficient of variation cv(x ∪ y) taken over all x, y. This latter is a rather weak function of ρ and cv(x), as illustrated in the inset plot, but the ranking of cv(x ∪ y) is the same as that of cv(x), so this may still be used as an approximate measure of uniformity.


The standard deviation of eccentricities is plotted against the mean eccentricity in Figure 9.25. We observe a strong dependence on the uniformity of the network as characterised by cv(x). Note that when the average eccentricity is less than 0.6, and hence pores may be considered typically ‘roundish’, the standard deviation of eccentricities is greater than in the isotropic case—there is a greater likelihood of finding slit-shaped pores when the standard deviation of eccentricities is greater.

The mean pore radius is plotted against the correlation coefficient in Figure 9.26: the mean pore radius increases with increasing correlation and propensity for pores to be round. The effect here is stronger than that observed in the isotropic case since the dependence of the mean eccentricity on correlation is greater. The filled point has cv(x ∪ y) = 0.99 and thus represents a case close to a random network; for this point the data yielded ρ ≈ 0.9, and we recall from Figure 9.17 that then δ ≈ 0.1, so the mean minor axis is about 10% shorter than the mean major axis of the equivalent elliptical pore.


10 Stochastic Porous Media and Hydrology

With J. Scharcanski and S. Felipussi

Stochastic porous media arise naturally in many situations; the common feature is a spatial statistical process of extended objects, such as voids distributed in a solid or a connected matrix of distributed solids in air. We have modelled real examples above of cosmological voids among stochastic galactic clusters, and at the other extreme of scale are the inter-fibre voids in stochastic fibrous networks. The main context in the present chapter is that of voids in agricultural soils.

10.1 Hydrological Modelling

Yue et al. [216] reviewed various bivariate distributions that are constructed from gamma marginals and concluded that such bigamma distribution models will be useful in hydrology. Here we study the application of the McKay bivariate gamma distribution, which has positive covariance, to model the joint probability distribution of adjacent void and capillary sizes in soils. In this context we compare the discriminating power of an information theoretic metric with two classical distance functions in the space of probability distributions. We believe that similar methods may be applicable elsewhere in hydrology, to characterize stochastic structures of porous media and to model correlated flow variables. Phien [166] considered the distribution of the storage capacity of reservoirs with gamma inflows that are either independent or first-order autoregressive, and our methods may have relevance in modelling and quantifying correlated inflow processes. Govindaraju and Kavvas [98] used gamma or Gaussian distributions to model rill depth and width at different spatial locations, and again an information geometric approach using a bivariate gamma or Gaussian model may be useful in further probing the joint behavior of these rill geometry variables.


10.2 Univariate Gamma Distributions and Randomness

The family of gamma distributions, §1.4.1, has event space Ω = R⁺ and probability density functions given by

\[ S = \{ f(x; \gamma, \alpha) \mid \gamma, \alpha \in \mathbb{R}^+ \}, \]

so here M ≡ R⁺ × R⁺ and the random variable is x ∈ Ω = R⁺ with

\[ f(x; \gamma, \alpha) = \left(\frac{\alpha}{\gamma}\right)^{\alpha} \frac{x^{\alpha - 1}}{\Gamma(\alpha)}\, e^{-x\alpha/\gamma}. \]

It is an exponential family, §3.2, with mean γ, and it includes as a special case (α = 1) the exponential distribution itself, which complements the Poisson process on a line, §1.1.3, §1.2.2. It is pertinent to our interests that the property of having sample standard deviation proportional to the mean actually characterizes gamma distributions, as shown recently [106], cf. Theorem 1.1. Of course, the exponential distribution has unit coefficient of variation, §1.2.

The univariate gamma distribution is widely used to model processes involving a continuous positive random variable, for example, in hydrology the inflows of reservoirs [166] and the depth and width of rills [98]. As we have seen above, the information geometry of gamma distributions has been applied recently to represent and metrize departures from Poisson randomness of, for example, the processes that allocate gaps between occurrences of each amino acid along a protein chain within the Saccharomyces cerevisiae genome [34], Chapter 7, clustering of galaxies and communications [63, 61, 64], Chapter 6, and control [78]. We have made precise and proved the statement that around every Poisson random process on the real line there is a neighborhood of processes governed by the gamma distribution, so gamma distributions can approximate any small enough departure from Poisson randomness, Chapter 5 [14]. Such results are, by their topological nature, stable under small perturbations of a process, which is important in real applications. This, and their uniqueness property [106], Theorem 1.1, gives confidence in the use of gamma distributions to model near-Poisson processes. Moreover, the information-theoretic heritage of the metric for the neighborhoods lends significance to the result.

10.3 McKay Bivariate Gamma 3-Manifold

It is logical next to consider bivariate processes, §1.3, which may depart from independence and from Poisson randomness. Natural choices arise for marginal distributions, §1.3: Gaussian, §1.2.3, or log-Normal distributions, §3.6, and gamma distributions, §1.4.1. For example, recently in hydrology, bivariate gamma distributions have been reviewed [216], and from [98] we may expect that rill depth and width admit bivariate gamma or bivariate Gaussian models


with positive covariance. Here we concentrate on the case when the marginals are gamma and the covariance is positive, §1.3, which has application to the modelling of void and capillary size in porous media like soils.

Positive covariance and gamma marginals give rise to one of the earliest forms of the bivariate gamma distribution, due to McKay [146], which we discussed in §4.1, defined by the density function

\[ f_M(x, y) = \frac{c^{(\alpha_1 + \alpha_2)}\, x^{\alpha_1 - 1} (y - x)^{\alpha_2 - 1}\, e^{-cy}}{\Gamma(\alpha_1)\, \Gamma(\alpha_2)}, \quad \text{defined on } y > x > 0,\; \alpha_1, c, \alpha_2 > 0. \tag{10.1} \]

One way to view this is that f_M(x, y) is the probability density for the two random variables X and Y = X + Z, where X and Z both have gamma distributions. The marginal distributions of X and Y are gamma with shape parameters α1 and α1 + α2, respectively. The covariance Cov and correlation coefficient ρ_M of X and Y are given by:

\[ \mathrm{Cov}(X, Y) = \frac{\alpha_1}{c^2} = \sigma_{12} \tag{10.2} \]
\[ \rho_M(X, Y) = \sqrt{\frac{\alpha_1}{\alpha_1 + \alpha_2}}. \tag{10.3} \]

Observe that in this bivariate distribution the covariance, and hence correlation, tends to zero only as α1 tends to zero.

In the coordinates (α1, σ12, α2) the McKay densities (10.1) are given by

\[ f(x, y; \alpha_1, \sigma_{12}, \alpha_2) = \frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1 + \alpha_2}{2}} x^{\alpha_1 - 1} (y - x)^{\alpha_2 - 1}\, e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\, y}}{\Gamma(\alpha_1)\, \Gamma(\alpha_2)}, \tag{10.4} \]

defined on 0 < x < y < ∞ with parameters α1, σ12, α2 > 0, where σ12 is the covariance of X and Y. The correlation coefficient and marginal density functions of X and Y are given by:

\[ \rho_M(X, Y) = \sqrt{\frac{\alpha_1}{\alpha_1 + \alpha_2}} \tag{10.5} \]
\[ f_X(x) = \frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1}{2}} x^{\alpha_1 - 1}\, e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\, x}}{\Gamma(\alpha_1)}, \quad x > 0 \tag{10.6} \]
\[ f_Y(y) = \frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1 + \alpha_2}{2}} y^{(\alpha_1 + \alpha_2) - 1}\, e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\, y}}{\Gamma(\alpha_1 + \alpha_2)}, \quad y > 0 \tag{10.7} \]

We consider the McKay bivariate gamma model as a 3-manifold, §4.1, equipped with the Fisher information as Riemannian metric, §4.1, from equation (4.5):


\[
[g_{ij}] = \begin{pmatrix}
\dfrac{-3\alpha_1 + \alpha_2}{4\alpha_1^2} + \psi'(\alpha_1) & \dfrac{\alpha_1 - \alpha_2}{4\alpha_1 \sigma_{12}} & -\dfrac{1}{2\alpha_1} \\[2ex]
\dfrac{\alpha_1 - \alpha_2}{4\alpha_1 \sigma_{12}} & \dfrac{\alpha_1 + \alpha_2}{4\sigma_{12}^2} & \dfrac{1}{2\sigma_{12}} \\[2ex]
-\dfrac{1}{2\alpha_1} & \dfrac{1}{2\sigma_{12}} & \psi'(\alpha_2)
\end{pmatrix} \tag{10.8}
\]

where ψ(α_i) = Γ′(α_i)/Γ(α_i), (i = 1, 2).

10.4 Distance Approximations in the McKay Manifold

Distance between a pair of points in a Riemannian manifold is defined as the infimum of arc lengths, §2.1, over all curves between the points. For sufficiently nearby pairs of points there will be a unique minimizing geodesic curve that realises the infimum arc length, §2.1. In general, such curves are hard to find between more distant points. However, we can obtain an upper bound on distances between two points T0, T1 ∈ M by taking the sum of arc lengths along coordinate curves that triangulate the pair of points with respect to the coordinate axes. We adopted similar methods in the gamma manifold for univariate processes [34, 76]. Here, following the methodology in §9.6.2, we use the metric (10.8) for the McKay manifold, §4.1, and obtain the following upper bound on the information distance in M from T0 with coordinates (α1, σ12, α2) = (A, B, C) to T1 with coordinates (α1, σ12, α2) = (a, b, c):

\[ d_M(T_0, T_1) \le \left|\int_A^a \sqrt{\frac{C - 3x}{4x^2} + \frac{d^2 \log \Gamma(x)}{dx^2}}\; dx\right| + \left|\int_C^c \sqrt{\frac{d^2 \log \Gamma(y)}{dy^2}}\; dy\right| + \left|\int_B^b \sqrt{\frac{A + C}{4z^2}}\; dz\right|. \tag{10.9} \]

The square roots arise from the norms of tangent vectors, §2.0.5, to coordinate curves, and it is difficult to obtain a closed form solution for the information distance dM. However, by removing the square roots the integrals yield information-energy values EM, which can be evaluated analytically. Then the square root of the net information-energy differences along the coordinate curves gives a closed analytic ‘energy-distance’ dEM = √EM, which we can compare with dM. The net information-energy difference along the coordinate curves from T0 with coordinates (α1, σ12, α2) = (A, B, C) to T1 with coordinates (α1, σ12, α2) = (a, b, c) is

\[
E_M(T_0, T_1) \le \left|\int_A^a \left(\frac{C - 3x}{4x^2} + \frac{d^2 \log \Gamma(x)}{dx^2}\right) dx\right| + \left|\int_C^c \frac{d^2 \log \Gamma(y)}{dy^2}\, dy\right| + \left|\int_B^b \frac{A + C}{4z^2}\, dz\right|
\]
\[
\le \left|\int_A^a \frac{C - 3x}{4x^2}\, dx\right| + \left|\int_A^a \frac{d^2 \log \Gamma(x)}{dx^2}\, dx\right| + \left|\int_C^c \frac{d^2 \log \Gamma(y)}{dy^2}\, dy\right| + \frac{A + C}{4} \left|\frac{1}{b} - \frac{1}{B}\right|
\]
\[
= \left|\frac{C}{4a} - \frac{C}{4A} + \frac{3 \log\left(\frac{A}{a}\right)}{4}\right| + |\psi(a) - \psi(A)| + |\psi(c) - \psi(C)| + \frac{A + C}{4} \left|\frac{1}{b} - \frac{1}{B}\right|. \tag{10.10}
\]

\[ d_{EM} = \sqrt{E_M}, \tag{10.11} \]

where ψ = Γ′/Γ is the digamma function.

Next we compare distances between bivariate gamma distributions obtained using this information metric upper bound (10.9) in the McKay manifold metric (4.5) with the classical Bhattacharyya distance [27] between the distributions. Some further discussion of classical distance measures can be found in Chapter 3 of Fukunaga [90]; the Bhattacharyya distance is actually a special case of the Chernoff distance [90].

The Bhattacharyya distance from T0 to T1, defined on 0 < x < y < ∞, is given by

\[ d_B(T_0, T_1) = -\log \int_{y=0}^{\infty} \int_{x=0}^{y} \sqrt{T_0\, T_1}\; dx\, dy = -\log\left(\frac{W}{\sqrt{\Gamma(A)\, \Gamma(C)\, \Gamma(a)\, \Gamma(c)}}\right) \tag{10.12} \]

where

\[ W = \Gamma\left(\frac{A + a}{2}\right) \Gamma\left(\frac{C + c}{2}\right) \left(\frac{A}{B}\right)^{\frac{A + C}{4}} \left(\frac{a}{b}\right)^{\frac{a + c}{4}} \left(\frac{\sqrt{a}}{2\sqrt{b}} + \frac{\sqrt{A}}{2\sqrt{B}}\right)^{-\frac{(A + C + a + c)}{2}}. \tag{10.13} \]

The Kullback-Leibler ‘distance’ [126], or ‘relative entropy’, from T0 to T1, defined on 0 < x < y < ∞, is given by

\[
KL(T_0, T_1) = \int_{y=0}^{\infty} \int_{x=0}^{y} T_0 \log\frac{T_0}{T_1}\; dx\, dy
\]
\[
= -A + \psi(A)(A - a) - C + \psi(C)(C - c) + \log\left(\frac{\Gamma(a)\, \Gamma(c)}{\Gamma(A)\, \Gamma(C)}\right) + \frac{(a + c)}{2} \log\left(\frac{Ab}{aB}\right) + (A + C)\sqrt{\frac{aB}{Ab}}, \tag{10.14}
\]


and we symmetrize this to give a true distance

\[ d_K(T_0, T_1) = \frac{KL(T_0, T_1) + KL(T_1, T_0)}{2}. \tag{10.15} \]

10.5 Modelling Stochastic Porous Media

Structural characterization and classification of stochastic porous materials has attracted the attention of researchers in different application areas, because of its great economic importance. For example, problems related to mass transfer and retention of solids in multi-phase fluid flow through stochastic porous materials are ubiquitous in different areas of chemical engineering. One application of gamma distributed voids to stochastic porous media has admitted a direct statistical geometric representation of stochastic fibrous networks [72], §9.5, and their fluid transfer properties [74]. Agricultural engineering is one of the sectors that has received attention recently, mostly because of the changing practices in agriculture in developing countries, and in developed countries, with great environmental impact [206, 207].

Phenomenologically, mass transfer in porous media depends strongly on the morphological aspects of the media—such as the shape and size of pores—and depends also on the topological attributes of these media, such as the pore network connectivity [74].

Several approaches have been presented in the literature for structural characterization of porous media, involving morphological and topological aspects. The authors of [12] proposed the analysis of porous media sections for their pore shape and size distributions. In their work, images are obtained using a scanning electron microscope for micro-structural characterization, and an optical microscope for macro-structural characterization. However, the acquisition of samples for the analysis is destructive: it is necessary to slice the porous media so that sections can be obtained, and then to introduce epoxy resin for contrast. These procedures influence the structure of the media solid phase, and consequently the morphology and topology of the porous phase, which implies that the three-dimensional reconstruction is less reliable for soil samples. In order to overcome similar difficulties, a few years earlier, non-destructive testing of soil samples using tomographic images was proposed [28], but their goal was the evaluation of the swelling and shrinkage properties of loamy soil. Also, other researchers have concentrated on the ‘fingering’ phenomenon occurring during fluid flow in soils [159]. More recently, researchers have proposed geometrical and statistical approaches for porous media characterization. A skeletonization method based on the Voronoi diagram [54] was introduced to estimate the distributions of local pore sizes in porous media; see also [116].

The statistical characterization and classification of stochastic porous media is essential for the simulation and/or prediction of the mass transfer properties of a particular stochastic medium. Much work has been done on


the characterization of porous media, but the discrimination among different models from observed data still remains a challenging issue. This is particularly true considering the tomographic images of porous media often used in soil analysis; for two recent studies see [3] and [116].

We apply our distance measures to experimental porous media data obtained from tomographic images of soil, to data from model porous media and to simulation data drawn by computer from bivariate correlated gamma processes. It turns out that tomographic images of soil structure reveal a bivariate stochastic structure of sizes of voids and their interconnecting capillaries. The information geometry of the Riemannian 3-manifold of McKay bivariate gamma distributions, §4.1, provides a useful mechanism for discriminating between treatments of soil. This method is more sensitive than that using the classical Bhattacharyya distance between the bivariate gamma distributions and in most cases better than the Kullback-Leibler measure for distances between such distributions.

10.5.1 Adaptive Tomographic Image Segmentation

Image analysis is an important tool for the structural characterization of porous materials, and its applications can be found in several areas, such as in oil reservoir modeling, and in estimates of soil permeability or bone density [199]. The structural properties of porous media can be represented by statistical and morphological aspects, such as distributions of pore sizes and pore shapes, or by topological properties, such as pore network inter-connectivity.

Here we describe a method for statistical characterization of porous media based on tomographic images, with some examples. Three-dimensional tomographic images obtained from porous media samples are represented by statistical and topological features (i.e. pore and throat size distributions, as well as pore connectivity and flow tortuosity). Based on selected geometrical and statistical features, soil sample classification is performed. Our experimental results enable the analysis of the soil compaction resulting from different soil preparation techniques commonly used nowadays.

Next we outline some morphological concepts; then our adaptive image segmentation approach is described, together with our proposed feature extraction scheme.

Pores, Grains and Throats

In general, porous media are constituted by two types of elements: grains and pore bodies. Let S be a cross-section of a porous medium, given by a 2D binary representation like the one shown in Figure 10.1, where pores (i.e. the void fraction) are represented by white regions, and grains (i.e. the solid fraction) by black regions. The phase function Ω is defined as in [83]:

Ω(Ξ) = { 1, when Ξ belongs to the pore phase; 0, otherwise.   (10.16)


Fig. 10.1. A 2D section representing the pores (clear regions) and grains (dark regions).

where Ξ denotes the vector specifying a position in S. In fact, Ξ denotes the set of integer pairs (χ, υ), being multiples of the measuring unit. In our case, S is represented by a 2D image I of size Nh × Nv, and (χ, υ) labels the pixel coordinates within this image. Each pixel represents the local density of the porous medium. Within an image, we may identify sets of connected pixels (i.e. pixels belonging to the same region). A path from pixel U to pixel V is a set of pixels forming the sequence A1, A2, ..., An, where A1 = U, An = V, and Aw+1 is a neighbor of Aw, for w = 1, ..., n − 1. A region is defined as a set of pixels in which there is at least one path between each pixel pair of the region [192]; so it is a pixel-path connected set.
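For illustration, pixel-path connected regions can be extracted with standard labelling tools; the sketch below assumes 8-connectivity (the text does not fix the neighbourhood) and takes as input the phase function image Ω of (10.16).

  import numpy as np
  from scipy import ndimage

  def pore_regions(omega):
      # omega: 2D array with 1 on the pore phase and 0 on grains, as in (10.16).
      # An 8-connected neighbourhood makes each label a pixel-path connected set.
      structure = np.ones((3, 3), dtype=int)
      labels, n_pores = ndimage.label(omega, structure=structure)
      return labels, n_pores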

The images obtained by computerized tomography are slices (cross-sections) of the 3D objects, and this imaging technique has been applied recently to the analysis of porous media [199]. In a slice z, the pore bodies (i.e. pores) are disjoint regions Pi, with i = 1, ..., N, where N is the number of pores in z:

Pi(z) = { I(χ, υ) | (χ, υ) connected }.   (10.17)

Let Pi(z) be a pore at slice z, and Pj(z + 1) a pore in an adjacent slice z + 1, with j = 1, ..., M (where M is the number of pore bodies at slice z + 1). If there is at least one pixel in common between Pi(z) and Pj(z + 1), then there exists a throat connecting these pores, defined as follows:

T(Pi(z), Pj(z + 1)) = Pi(z) ∩ Pj(z + 1).   (10.18)

Figure 10.2 shows a pore Pi(z) in slice z, and a pore Pj(z + 1) in slice z + 1 (darker regions). The regions in common are depicted as clearer areas, representing the throats, for better visualization.
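A direct, unoptimized reading of (10.18) is sketched below: two pores are linked by a throat exactly when their labelled regions in adjacent slices overlap in at least one pixel. The helper names are ours.

  import numpy as np

  def throats(labels_z, labels_z1):
      # labels_z, labels_z1: labelled pore images of slices z and z+1,
      # e.g. from pore_regions above; returns the throat T(Pi(z), Pj(z+1))
      # of equation (10.18) as a binary image for each intersecting pair (i, j)
      result = {}
      for i in range(1, labels_z.max() + 1):
          for j in range(1, labels_z1.max() + 1):
              overlap = (labels_z == i) & (labels_z1 == j)
              if overlap.any():            # at least one pixel in common
                  result[(i, j)] = overlap
      return result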


Fig. 10.2. An illustration representing the intersection between two regions of two slices and the resultant throat.

Tortuosity in Porous Media

In hydrological surveys, tortuosity is one of the most informative parameters. However, it has different definitions in the literature. Prasher et al. [169] introduced the concept of tortuosity as the square of the ratio of the effective average path in the porous medium (Lc) to the shortest distance measured along the direction of the pore (L). Here, as the analysis is based on sequences of images of thin slices, the coefficient of tortuosity is evaluated from three consecutive slices. Considering three interconnecting pores in slices n, n + 1 and n + 2, the local tortuosity coefficient is estimated as the Euclidean distance between the centroids of their throats T(Pi(n), Pj(n + 1)) and T(Pi(n + 1), Pj(n + 2)):

Tort = √( (χ1 − χ2)² + (υ1 − υ2)² ),   (10.19)

where the centroids (χ1, υ1) and (χ2, υ2) correspond to the throats between pores at slices (n), (n + 1) and (n + 1), (n + 2), respectively.
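Equation (10.19) then reduces to a centroid computation over the two throat images; a sketch:

  import numpy as np

  def centroid(mask):
      # centroid (chi, upsilon) of a binary throat image
      chi, ups = np.nonzero(mask)
      return chi.mean(), ups.mean()

  def local_tortuosity(throat_a, throat_b):
      # equation (10.19): Euclidean distance between the centroids of the
      # throats T(Pi(n), Pj(n+1)) and T(Pi(n+1), Pj(n+2))
      (c1, u1), (c2, u2) = centroid(throat_a), centroid(throat_b)
      return np.hypot(c1 - c2, u1 - u2)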

10.5.2 Mathematical Morphology Concepts

We shall use morphological operators for extracting the image components needed in the pore and throat size representation. The morphological approach


to image processing consists in probing the image structures with a pattern of known shape, like squares, disks and line segments, called a structuring element B.

Planar Structuring Element, Erosion and Dilation

The two principal morphological operators are dilation and erosion. The erosion of a set Ξ by a structuring element B is denoted by Ξ ⊖ B and is defined as the locus of points χ such that B is included in Ξ when its origin is placed at χ [191]:

Ξ ⊖ B = { χ | Bχ ⊆ Ξ }.   (10.20)

The dilation is the dual operation of erosion, and can be thought of as a fill or grow function. It can be used to fill holes or narrow gulfs between objects [29]. The dilation of a set Ξ by a structuring element B, denoted by Ξ ⊕ B, is defined as the locus of points χ such that B hits Ξ when its origin coincides with χ [191]:

Ξ ⊕ B = { χ | Bχ ∩ Ξ ≠ ∅ }.   (10.21)
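Both operators are available in standard image-processing libraries; a sketch of (10.20) and (10.21) using scipy, with B given as a binary structuring-element array:

  from scipy import ndimage

  def erosion(xi, B):
      # (10.20): locus of points chi such that B, placed at chi, fits inside Xi
      return ndimage.binary_erosion(xi, structure=B)

  def dilation(xi, B):
      # (10.21): locus of points chi such that B, placed at chi, hits Xi
      return ndimage.binary_dilation(xi, structure=B)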

Geodesic Dilation and Reconstruction

The so-called ‘geodesic methods’ are morphological transformations that can operate only on some parts of an image [192]. To perform geodesic operations, we only need the definition of a geodesic distance. The simplest geodesic distance is the one which is built from a set Ξ. The distance between two points p and q belonging to Ξ is the length of the shortest path (if any) included in Ξ and joining p and q (see Figure 10.3) [205].

Fig. 10.3. Geodesic distance dΞ(p, q) within a set Ξ.


The geodesic transformations are employed in the reconstruction of a set Ξ marked by another, non-empty, reference set Υ, denominated the marker. In geodesic dilation two images are considered: a marker image and a mask image. These images must have the same definition domain, and the marker image must be lower than or equal to the mask image. The geodesic dilation uses a structuring element, and the resulting image is forced to remain restricted to the mask image. Formally, the marker image Υ is any set included in the mask image Ξ. We can compute the set of all points of Ξ that are at a finite geodesic distance n ≥ 0 from Υ:

δ^(n)_Ξ(Υ) = { p ∈ Ξ | dΞ(p, Υ) ≤ n }.   (10.22)

The elementary geodesic dilation of size 1, δ^(1)_Ξ, of a set Υ inside Ξ is obtained as the intersection of the unit-size dilation of Υ (with respect to the structuring element B) with the set Ξ [192]:

δ^(1)_Ξ(Υ) = (Υ ⊕ B) ∩ Ξ.   (10.23)

The geodesic dilations of a given size n can be obtained by iterating n elementary geodesic dilations [205, 192]:

δ^(i+1)_Ξ(Υ) = δ^(1)_Ξ( δ^(i)_Ξ(Υ) ), i = 1, ..., n − 1,   (10.24)

where n ≥ 2, and the limit of the δ^(n)_Ξ(Υ) is the set of Ξ reconstructed by the marker set Υ. It is constituted of all the connected components of Ξ that are marked by Υ. This transformation can be achieved by iterating elementary geodesic dilations until idempotence (see Figure 10.4), when δ^(n)_Ξ(Υ) = δ^(n+1)_Ξ(Υ) (i.e. until no further modification occurs). This operation is called reconstruction and is denoted by ρΞ(Υ). Formally [192],

ρΞ(Υ) = lim_{n→∞} δ^(n)_Ξ(Υ).   (10.25)
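A sketch of binary reconstruction by iterated elementary geodesic dilations, as in (10.23)-(10.25), stopping at idempotence; a 3 × 3 structuring element is assumed where none is supplied.

  import numpy as np
  from scipy import ndimage

  def geodesic_dilation(marker, mask, B):
      # elementary geodesic dilation (10.23): dilate the marker, clip to the mask
      return ndimage.binary_dilation(marker, structure=B) & mask

  def reconstruction(marker, mask, B=np.ones((3, 3), dtype=bool)):
      # rho_Xi(Upsilon) of (10.25): iterate (10.24) until no further change
      current = marker & mask
      while True:
          grown = geodesic_dilation(current, mask, B)
          if (grown == current).all():
              return current
          current = grown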

The extension of geodesic transformations to greyscale images is based on the fact that, at least in the discrete case, any increasing transformation defined for binary images can be extended to greyscale images [205, 192]. By increasing, we mean a transformation Ψ such that:

Υ ⊆ Ξ ⇒ Ψ(Υ) ⊆ Ψ(Ξ), ∀ Ξ, Υ ⊂ Z².   (10.26)

The transformation Ψ can be generalized by viewing a greyscale image I as a stack of binary images obtained by successive thresholding. Let DI be the domain of the image I, and let the grey values of I lie in {0, 1, ..., N − 1}. The thresholded images Tk(I) are [205, 192]:

Tk(I) = { p ∈ DI | I(p) ≥ k }.   (10.27)


Fig. 10.4. Reconstruction of Ξ (shown in light grey) from markers Υ (black). The reconstructed result is shown below.

Fig. 10.5. Threshold decomposition of a greylevel image.

The idea of threshold decomposition is illustrated in Figure 10.5. The threshold-decomposed images Tk(I) satisfy the inclusion relationship:

Tk(I) ⊆ Tk−1(I), ∀ k ∈ [1, N − 1].   (10.28)

Consider the increasing transformation Ψ applied in each threshold-decomposed image; then their inclusion relationships are preserved. Thus the transformation Ψ can be extended to greyscale images using the following threshold decomposition principle [192]:

∀ p ∈ D, Ψ(I)(p) = max { k ∈ [0, N − 1] | p ∈ Ψ(Tk(I)) }.   (10.29)


Fig. 10.6. Greyscale reconstruction of mask I from markers J.

Since the binary geodesic reconstruction is an increasing transformation, it satisfies:

Υ1 ⊆ Υ2, Ξ1 ⊆ Ξ2, Υ1 ⊆ Ξ1, Υ2 ⊆ Ξ2 ⇒ ρΞ1(Υ1) ⊆ ρΞ2(Υ2).   (10.30)

Therefore, using the threshold decomposition principle (see Equation 10.29), we can generalize the binary reconstruction to greyscale reconstruction [205, 192]. Let J and I be two greyscale images defined on the same domain D, with grey-level values from the discrete set {0, 1, ..., N − 1}. If for each pixel p ∈ D, J(p) ≤ I(p), the greyscale reconstruction ρI(J) of the mask image I from the marker image J is given by:

∀ p ∈ D, ρI(J)(p) = max { k ∈ [0, N − 1] | p ∈ ρ_{Tk(I)}(Tk(J)) }.   (10.31)

This transformation is illustrated in Figure 10.6, where the greyscale reconstruction extracts the peaks of the mask-image I, which is marked by the marker-image J.
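In practice one need not threshold-decompose explicitly: for integer images the reconstruction (10.31) can be obtained by iterating a unit greyscale dilation of the marker, clipped pointwise below the mask. A sketch, assuming a flat 3 × 3 structuring element:

  import numpy as np
  from scipy import ndimage

  def grey_reconstruction(J, I):
      # greyscale reconstruction rho_I(J) of the mask I from a marker J <= I;
      # for integer images this iteration agrees with definition (10.31)
      assert (J <= I).all()
      current = J.copy()
      while True:
          grown = np.minimum(ndimage.grey_dilation(current, size=(3, 3)), I)
          if (grown == current).all():
              return current
          current = grown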

Regional Maxima and Minima

Reconstruction turns out to provide a very efficient method to extract regional maxima and minima from greyscale images [205]. It is important not to confuse these concepts with local maxima and local minima. Being a local maximum is not a regional property, but a property of a pixel. A pixel is a local maximum if its value is greater than or equal to any pixel in its neighborhood. By definition, a regional maximum M (minimum) of a greyscale image I is a connected set of pixels with an associated value h (plateau at height h), such that every neighboring pixel of M has strictly lower (higher) value than h [191].

Regional maxima and minima often mark relevant image objects, and image peaks or valleys can be used to create marker images for the morphological reconstruction. In order to obtain the set of regional maxima, an image I can be reconstructed from I − 1, and the result subtracted from I. Therefore, the set of regional maxima of a greyscale image I, denoted by RMAX(I), is defined by [205] (see Figure 10.7):

RMAX(I) = { p ∈ D | (I − ρI(I − 1))(p) > 0 }.   (10.32)

By duality, the set of regional minima can be computed by replacing I by its complement I^C:

RMIN(I) = RMAX(I^C).   (10.33)
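Using grey_reconstruction from the sketch above, (10.32) and (10.33) become, for a signed integer image with grey levels 0, ..., N − 1:

  import numpy as np

  def rmax(I):
      # (10.32): pixels where I exceeds the reconstruction of I from I - 1
      return (I - grey_reconstruction(I - 1, I)) > 0

  def rmin(I):
      # (10.33): by duality, regional minima are regional maxima of the complement
      N = int(I.max()) + 1
      return rmax((N - 1) - I)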


Fig. 10.7. Maximum detection by reconstruction, subtracting ρI(I − 1) from I. At the left, a 1D profile of I and the marker image I − 1 are shown. The figure at the right shows the reconstruction of mask image I from marker image I − 1; the difference I − ρI(I − 1) keeps only the “domes” (set of regional maxima) and removes the background.

Fig. 10.8. Left: 1D profile of I and I − h. Center: the result of the h-maxima transform. Right: the result of the extended maxima transform (dashed).

h-Extrema and Extended Extrema Transformations

The h-maxima transform allows the reconstruction of I from I − h, where h represents an arbitrary greylevel constant. In this way, the h-maxima transform provides a tool to filter the image using a contrast criterion.

By definition, the h-maxima transform suppresses all maxima whose height is smaller than a given threshold level h. Therefore, the h-maxima transform detects peaks using a contrast parameter, without involving any size or shape criterion, as is imposed by other techniques like openings and closings (see Figure 10.8). Formally, we have:

HMAXh(I) = ρI(I − h).   (10.34)

Analogously, the h-minima transform reconstructs I from I + h, where h represents an arbitrary grey-level. By definition, the h-minima transform suppresses all minima whose depth is less than a given threshold level h [205]:

HMINh(I) = ρI(I + h).   (10.35)

The extended-maxima EMAX (extended-minima EMIN) are the regional maxima (minima) of the corresponding h-extrema transformation [191]:

EMAXh(I) = RMAX(HMAXh(I)),   (10.36)

and

EMINh(I) = RMIN(HMINh(I)).   (10.37)
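These transforms again reduce to reconstructions; a sketch in terms of the helpers above, with the h-minima implemented on the complemented image so that the marker stays below the mask (grey levels 0, ..., N − 1 assumed, default 8-bit):

  import numpy as np

  def hmax(I, h):
      return grey_reconstruction(I - h, I)              # (10.34)

  def emax(I, h):
      return rmax(hmax(I, h))                           # (10.36)

  def hmin(I, h, N=256):
      c = (N - 1) - I                                   # complement of I
      return (N - 1) - grey_reconstruction(c - h, c)    # (10.35), by duality

  def emin(I, h, N=256):
      return rmin(hmin(I, h, N))                        # (10.37)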


10.5.3 Adaptive Image Segmentation and Representation

Image segmentation is a very important step in porous media modelling, but it is one of the most difficult tasks in image analysis. Our goal is to obtain an image where the pore and solid phases of the original image are discriminated. In the literature, several authors have proposed segmentation based on a global threshold as an early step in porous media characterization [204, 96, 103]. Basically, the central question in these methods is the selection of a threshold to discriminate between pores and the solid phase based on the image histogram. This is typically done interactively, by visual inspection of the results. Global thresholding generally results in serious segmentation flaws if the object of interest and background are not uniform (as in our case).

Next, we discuss a locally adaptive image segmentation procedure to discriminate between pores and the solid phase, using extended minima or maxima. The choice of the parameter h is now the central issue, because the h-minima/maxima transform suppresses all minima/maxima regions whose depth/height is less than h (i.e. considering a greylevel image relief). Comparing the extended minima in Figure 10.9 we can verify that when h is increased, the area of some objects increases, and other objects disappear. In our case, the images are noisy and present low contrast, making the choice of the parameter h critical. Therefore, let us consider the union of the pixel sets of the regions obtained by the extended minima/maxima when h varies between 1 and k (see Figure 10.10). Formally, we have:

T^k_MIN(I) = ∪_{h=1}^{k} EMINh(I),   (10.38)

and

T^k_MAX(I) = ∪_{h=1}^{k} EMAXh(I).   (10.39)

If the parameter k is small, we obtain small regions centered on regional minima/maxima of the image. If k increases, the regions grow and can be merged. Some important issues arising in the analysis of soil tomographies

Fig. 10.9. Left: the extended minima transform with h = 10 applied to segment the dark regions. Right: the segmented image with h = 50; notice that some regions disappeared and others grew with increased h.


Fig. 10.10. The result of increasing h in the interval [1, ..., 50]; the regions with low contrast are preserved.

Fig. 10.11. Plot of the correlation with varying k.

are that such images tend to be noisy and present low contrast, affecting the spatial continuity in three dimensions (since these structures are three-dimensional). We evaluate the spatial continuity between each pair of slices in the image stack, using correlation analysis given an appropriate k value. Figure 10.11 illustrates the plot of correlation between adjacent slices as k varies. If k is small, the correlation is low because the pixels of the adjacent images do not present spatial continuity (see Figure 10.12, top), and if k is large, the correlation is higher (see Figure 10.12, bottom).

Figure 10.13 shows the plot of the void area sum for the slice stack, given T^k_MIN(I) (i.e. 1 − area), which provides an idea of the effect of the parameter k in terms of void segmentation.

In order to select the best value of k for void segmentation, we use a criterion expressed by the following function:


Fig. 10.12. Top left: segmentation of first slice with k = 1. Top right: second slice of the stack also with k = 1. Bottom left and right: first and second slice with k = 100, respectively. Notice that the correlation is high for k large and low if k is small.

Fig. 10.13. Plot of (1-area) for the slice stack.

f(k) = a(1 − area) + b(correlation), ∀a, b ∈ [0, 1], (10.40)

which provides a relationship (1 − area) × (correlation), and is shown in Figure 10.14.

To find proper values for the parameters k, a and b given the image data we use an optimization algorithm based on the Minimax approach [138]. In this case, the Minimax approach is used to estimate the values of k, a and b as those leading to the smallest risk/error in void segmentation, considering


Fig. 10.14. Plot of f(k) with a = b = 1.

all operators D(k) obtainable (notice that for each value of k one operator is defined).

Let θ be the set of all slice image segmentations. We estimate the best void segmentation f ∈ θ from the original noisy images using an operator D. The segmentation error of this estimation F = DΞ is:

r(D, f) = E{ ‖DΞ − f‖² }.   (10.41)

Unfortunately, the expected error cannot be computed precisely, because we do not know in advance the probability distribution of segmentation images in θ. Therefore, the Minimax approach aims to minimize the maximum error:

r(D, θ) = sup_{f∈θ} E{ ‖DΞ − f‖² },   (10.42)

and the Minimax error is the lower bound computed over all operators D:

rn(θ) = inf_{D∈On} r(D, θ), with On the set of all operators.   (10.43)

In practice, we must find a decision operator D that is simple to implement (small k), such that r(D, θ) is close to the Minimax error rn(θ), and we have:

rn(θ) = inf_{D∈On} sup_{f∈θ} E{ ‖DΞ − f‖² }.   (10.44)

In our case, we must approximate the error function based on problem constraints, i.e. we wish to have slice continuity along the slice stack, and at the


Fig. 10.15. Left: function f(k) that prioritizes feature continuity (correlation) with a = 0.9 and b = 0.1. Right: the complement of the areas is prioritized using a = 0.1 and b = 0.9. In both cases rn(θ) is small, but the resulting operator D is not adequate for segmentation.

same time we want to segment the largest (maximum) number of voids. These are obviously conflicting constraints, because by nature such tomographic images have low contrast and are noisy, and at the same time the porous media structure may be nearly Poisson random. Therefore, we approximate the error function by the measurable data described by equation (10.40), with the additional constraint |a − b| ≤ δ, with δ ∈ [0, 1]. If this constraint is not considered, the function f prioritizes just one of the constraints, and returns an inadequate result (see Figure 10.15).

An overview of the segmentation algorithm is outlined below:

1. Compute the image ρI( ∪_{h=1}^{k} EMINh(I) ), performing the greyscale reconstruction with the union of the sets of the extended minima, for all k ∈ [1, ..., N];
2. Compute, for each k, the correlation between each pair of slices of the stack of images;
3. Compute T^k_MIN(I) for all slices of the image stack;
4. Compute the function in equation (10.40) for all k values in the interval;
5. Using the Minimax principle, find a decision operator D(k) leading to the smallest maximum error (i.e. the deepest valley in the plot).

The image ρI( ∪_{h=1}^{k} EMINh(I) ) is our void image segmentation. The void data used in our experiments was obtained using the segmentation algorithm just described. Section 10.5.4 provides a detailed discussion based on experimental data obtained with the application of the segmentation algorithm to 465 images of soil samples (see some examples in Figure 10.17), and 58 images of glass spheres immersed in water (see some examples in Figure 10.16).
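A compact sketch of the selection loop follows, reusing emin from the sketches above; the weights a, b and the sign conventions are one reading of (10.40), and in the experiments they would be tuned by the Minimax search subject to |a − b| ≤ δ.

  import numpy as np

  def tk_min(I, k):
      # T^k_MIN(I) of (10.38): union of the extended minima for h = 1, ..., k
      out = np.zeros(I.shape, dtype=bool)
      for h in range(1, k + 1):
          out |= emin(I, h)          # 8-bit grey levels assumed in emin
      return out

  def f_curve(stack, k_max=50, a=0.5, b=0.5):
      # evaluate f(k) of (10.40) for k = 1, ..., k_max over a stack of slices;
      # the operator D(k) is then chosen at the extremum of this curve
      values = []
      for k in range(1, k_max + 1):
          voids = [tk_min(slice_, k) for slice_ in stack]
          area = np.mean([v.mean() for v in voids])      # mean void area fraction
          corr = np.mean([np.corrcoef(u.ravel().astype(float),
                                      v.ravel().astype(float))[0, 1]
                          for u, v in zip(voids, voids[1:])])
          values.append(a * (1 - area) + b * corr)
      return np.array(values)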


Fig. 10.16. Top row (left): original soil sample. Middle row: boundaries of the segmented regions. Third row: segmented image.

10.5.4 Soil Tomographic Data

Three-dimensional tomographic images were obtained from thin slices of soil and model samples, and the new algorithms described in §10.5.3 were used to reveal features of the pore size distribution and pore connectivity. These features are relevant for the quantitative analysis of samples in several applications of economic importance, such as the effect of conventional and direct planting methods on soil properties, as compared to untreated soil (i.e. natural forest soil). The three-dimensional profiles for these soils are shown in Figure 10.18.

The methodology is applicable generally to stochastic porous media, but here we focus on the analysis of soil samples, in terms of the soil compaction resulting from different soil preparation techniques. The interconnectivity of the pore network is analyzed through a fast algorithm that simulates flow. The image analysis methods employed to extract features from these images are beyond our present scope and will be discussed elsewhere.

The two variables 0 < x < y < ∞ correspond as follows: y represents the cross-sectional area of a pore in the soil and x represents the corresponding cross-sectional area of the throats or capillaries that connect it to neighbouring voids. It turns out that these two variables have a positive covariance and can be fitted quite well by the McKay bivariate gamma distribution (10.4), §4.1.


Fig. 10.17. Top row (left): original soil sample. Middle row: boundaries of the segmented regions. Third row: segmented image.

Fig. 10.18. Profile three-dimensional reconstructions of soil samples. Left: Forest; center: Conventional Planting; right: Direct Planting.


Table 10.1. Maximum likelihood parameters of McKay bivariate gamma fitted to hydrological survey data extracted from tomographic images of five soil samples, each with three treatments. For each soil, the natural state is untreated forest; two treatments are compared: conventional and direct. The distance functions dM, dEM, dB, dK are used to measure effects of treatments—values given are distances from the forest untreated case, except that values in brackets give the distances between the conventional and direct treatments. In most cases, dM is most discriminating and dEM is second best. Except for Sample A, all distances agree on the ranking of effects of treatments compared with untreated soils.

Sample    α1      α2      σ12      ρM      ρData   dM        dEM       dB        dK
A forest  7.6249  3.581   19199.4  0.8249  0.8555  (4.1904)  (1.4368)  (0.1292)  (0.5775)
A conv    4.7931  7.4816  22631.2  0.6249  0.7725  2.8424    1.2643    0.7920    3.2275
A direct  1.8692  3.4911  33442.6  0.5905  0.5791  3.181     1.5336    0.5855    2.7886
B forest  1.2396  2.2965  41402.8  0.5920  0.5245  (4.1611)  (1.7835)  (0.3948)  (1.9502)
B conv    5.6754  4.8053  34612.2  0.7359  0.5500  3.7034    1.8784    0.3214    1.6816
B direct  1.3622  2.0074  30283.5  0.6358  0.5215  0.6322    0.5820    0.0346    0.1390
C forest  1.6920  2.6801  37538.3  0.6221  0.5582  (2.7858)  (1.8896)  (0.2140)  (1.0462)
C conv    0.7736  1.2488  30697.5  0.6185  0.5466  2.3931    1.5372    0.1727    0.7403
C direct  2.8476  2.6413  25840.8  0.7203  0.7975  1.1146    0.9518    0.0910    0.3741
D forest  1.8439  1.7499  15818.7  0.7163  0.6237  (6.1671)  (2.2155)  (1.6124)  (8.6557)
D conv    0.963   1.2533  26929.7  0.6592  0.5324  1.7529    1.3028    0.0526    0.2221
D direct  3.4262  9.9762  39626.4  0.5056  0.3777  5.5669    1.7660    2.3136    10.3993
E forest  2.7587  1.4647  26501.5  0.8082  0.7952  (1.4423)  (1.1962)  (0.0830)  (0.3519)
E conv    3.0761  2.4388  38516.1  0.7468  0.6799  1.3283    0.9314    0.1388    0.5609
E direct  1.4987  2.107   47630.9  0.6447  0.6400  1.8977    1.2728    0.2401    0.9932

The maximum likelihood parameters (α1, σ12, α2) for the data are shown in Table 10.1, together with the McKay correlation coefficient ρM and the measured correlation ρData.

In these experiments, we used tomographic images of soil samples, and packings of spheres of different sizes as phantoms. The soil samples were selected from untreated (i.e. forest soil type) and treated (i.e. conventional mechanized cultivated soil, and direct plantation cultivated soil) cases. The image analysis methods employed to measure the pore and throat size distributions in these images are beyond our present scope and will be discussed elsewhere. Typical scatterplots of the throat area and pore area data are shown in Figure 10.19 for the untreated soil forest A, which shows strong correlation, and in Figure 10.20 for the model structure 1 made from beds of spheres, which shows weak correlation.

We see from Table 10.1 that the theoretical McKay correlation agrees well with that found experimentally. The four distance functions dM, dEM, dB, dK are given for the soil treatments. The information metric dM is the most discriminating and the Bhattacharyya distance dB is the least discriminating. Over all treatments the grand means for distances from untreated (forest) soils


Fig. 10.19. Scatterplot of the data for untreated forest A soil sample, Table 10.1: pore areas (horizontal axis) against throat areas (vertical axis).

Fig. 10.20. Scatterplot of the data for model sphere bed structure Sample 1 of Table 10.2: pore areas (horizontal axis) against throat areas (vertical axis).


Table 10.2. Maximum likelihood parameters of McKay bivariate gamma fitted to data extracted from tomographic images of model beds of spheres and a simulation. There is no reference structure from which to base distances, so here the distances shown are measured from the mean values (α1, σ12, α2) = (1.6151, 19.9672, 1.1661) of the three parameters taken over the spheres. We note that the distances obtained for the sphere models are in every case ordered dB < dEM < dK < dM, and they agree on the ranking of effects of conditions. The simulation data, having α1 << 1, seems very different from the model sphere beds.

Sample             α1      α2      σ12      ρM      ρData   dM      dEM     dB      dK
1 (2.4 to 3.3 mm)  1.0249  0.1469  54.8050  0.9341  0.3033  3.6369  1.9071  0.4477  2.9319
2 (1.4 to 2.0 mm)  1.6789  0.5117  4.3863   0.8755  0.1714  2.3057  1.5185  0.4195  1.7888
3 (1.0 to 1.7 mm)  2.1416  2.8396  0.7103   0.6557  0.1275  4.5013  2.1216  0.6751  4.1202
Simulation         0.1514  0.3185  4137     0.5676  0.1118  8.5318  2.9209  0.8539  11.4460

are respectively 2.423, 1.302, 0.474, 2.113, for dM, dEM, dB, dK. The distance measures are found also for the model structures of spheres and the simulation, Table 10.2, but here the experimental correlation is much less than that expected for the McKay distribution.

The soil results from Table 10.1 are shown in Figure 10.21. The first two plots use the information distance dM and energy-distance dEM bounds (10.9, 10.11 respectively) for the McKay manifold metric (10.8); the other two plots use the Bhattacharyya distance dB (10.12) and the Kullback-Leibler distance dK (10.15) between the corresponding bivariate gamma distributions. The base plane d = 0 represents the natural or forest soil; data is in pairs, and two points of the same size correspond to the same soils with two treatments. Importantly, the information metric dM is mainly the most discriminating between the treated and untreated soils—the points being generally highest in the graphic for dM, though Soil D direct treatment has a particularly high dK value. Except for Sample A, all distances agree on the ranking of effects of treatments. The sphere packing model results and the simulation results from Table 10.2 are shown in Figure 10.22.

Note that the McKay bivariate gamma distribution does not contain the case of both marginal distributions being exponential nor the case of zero covariance—both of these are limiting cases and so cannot serve as base points for distances. Thus, there is no natural reference structure from which to base distances in these model results, so here the distances shown are measured from the mean values of the three parameters taken over the spheres. We note that the distances obtained are in each case ordered dB < dEM < dK < dM; they all agree on ranking of the effects of conditions.

Computer Simulated Data

Four sets of 5000 pairs (xi, yi) with 0 < xi < yi were drawn by computer from gamma distributions, with different parameters and with varying positive


Fig. 10.21. Distances of 5 pairs of treated soil samples from the untreated forest soil, in terms of their porous structure from tomographic images, using the data from Table 10.1. Clockwise from the top left the plots use: the information theoretic bounds (10.9) dM and root energy dEM for the McKay manifold metric (4.5), the Bhattacharyya distance (10.12) dB and the Kullback-Leibler distance (10.15) dK between the corresponding bivariate gamma distributions. The plane d = 0 represents the natural or forest soil; data is in pairs, and two points of the same size correspond to the same soils with two treatments. The information metric, top left, is most discriminating.

covariance between the two variables, Figure 10.23. This data was analyzed and from it maximum likelihood fits were made of McKay bivariate gamma distributions.

Table 10.3 summarizes the parameters, and the distances measured between the corresponding points are shown in Tables 10.4, 10.5, 10.6 and 10.7. These experiments confirm that data sets 1 and 2 are more similar to each other than to the data sets 3 and 4, which is verified by visual inspection of the scatterplots shown in Figure 10.23.

If we consider that data sets 1 and 2 form one cluster, and that data sets 3 and 4 form another cluster, it is important to verify how the distance measures we are comparing perform in terms of data discrimination. Table 10.8 shows the ratios between the mean inter and intra cluster distances, indicating that


Fig. 10.22. Distances of model sphere samples and a simulation, measured from the average parameter values for the three sphere samples, using the data from Table 10.2. For the model sphere structures, the distances are in each case ordered dB < dEM < dK < dM, with the exception of the simulation. The increasing point sizes refer to the model sphere beds 1, 2, 3, respectively.

Fig. 10.23. Scatterplots of the data for computer simulations of the four positively correlated gamma processes with x < y: top row # 1, 2, second row # 3, 4. Maximum likelihood McKay parameters are given in Table 10.3.


Table 10.3. Maximum likelihood McKay parameters for the four simulated bivariate gamma data sets 1, 2, 3, 4.

#  x       y       α1      α2      σ12      ρM      ρdata
1  0.9931  1.5288  1.0383  0.9174  0.9607   0.7286  0.9017
2  1.1924  1.7275  1.0176  0.7934  1.4117   0.7443  0.9304
3  2.9793  3.5151  1.0383  0.3598  8.5813   0.8618  0.9873
4  3.2072  3.7429  1.0165  0.3432  10.1006  0.8646  0.9892

Table 10.4. Pairwise McKay information energy distances dEM^sym for the four simulated bivariate gamma data sets 1, 2, 3, 4.

dEM^sym       sample set 2  sample set 3  sample set 4
sample set 1  1.2505        2.5489        2.2506
sample set 2  0.0000        2.1854        2.2988
sample set 3  0.0000        0.0000        0.2067

Table 10.5. Pairwise McKay information distances dM^sym for the four simulated bivariate gamma data sets 1, 2, 3, 4.

dM^sym        sample set 2  sample set 3  sample set 4
sample set 1  1.2625        2.4836        2.6511
sample set 2  0.0000        2.0532        2.1788
sample set 3  0.0000        0.0000        0.1653

Table 10.6. Pairwise Kullback-Leibler distances dK for the four simulated bivariate gamma data sets 1, 2, 3, 4.

dK            sample set 2  sample set 3  sample set 4
sample set 1  0.6804        1.0368        1.1793
sample set 2  0.0000        0.7013        0.1828
sample set 3  0.0000        0.0000        0.0040

Table 10.7. Pairwise Bhattacharyya distances dB for the four simulated bivariate gamma data sets 1, 2, 3, 4.

dB            sample set 2  sample set 3  sample set 4
sample set 1  0.1661        0.2254        0.2506
sample set 2  0.0000        0.1596        0.1815
sample set 3  0.0000        0.0000        0.0010

Table 10.8. Expected ratio between inter/intra cluster distances for the four simulated bivariate gamma data sets 1, 2, 3, 4.

dEM^sym  dM^sym  dK      dB
1.6734   1.6401  1.2225  1.1323


the best data separability is obtained by dEM^sym and dM^sym, using the symmetrization

d^sym(A, B) = ( d(A, B) + d(B, A) ) / 2.

In most cases, the information geometry, which uses a maximum likelihood metric, is more discriminating than the classical Bhattacharyya distance, or the Kullback-Leibler divergence, between pairs of bivariate gamma distributions. We have available also the information geometry of bivariate Gaussian and bivariate exponential distributions, and we expect that our methodology may have other applications in the modelling of bivariate statistical processes in hydrology.


11

Quantum Chaology

This chapter, based on Dodson [66], is somewhat speculative in that it is clear that gamma distributions do not precisely model the analytic systems discussed here, but some features may be useful in studies of qualitative generic properties in applications to data from real systems which manifestly seem to exhibit behaviour reminiscent of near-random processes. Quantum counterparts of certain simple classical systems can exhibit chaotic behaviour through the statistics of their energy levels, and the irregular spectra of chaotic systems are modelled by eigenvalues of infinite random matrices. We use known bounds on the distribution function for eigenvalue spacings for the Gaussian orthogonal ensemble (GOE) of infinite random real symmetric matrices and show that gamma distributions, which have the important uniqueness property of Theorem 1.1, can yield an approximation to the GOE distribution. This has the advantage that then both chaotic and non-chaotic cases fit in the information geometric framework of the manifold of gamma distributions. Additionally, gamma distributions give approximations to eigenvalue spacings for the Gaussian unitary ensemble (GUE) of infinite random hermitian matrices and for the Gaussian symplectic ensemble (GSE) of infinite random hermitian matrices with real quaternionic elements. Interestingly, the spacing distribution between zeros of the Riemann zeta function is approximated by the GUE distribution, and we investigate the stationarity of the coefficient of variation of the numerical data with respect to location and sample size. The review by Deift [52] illustrates how random matrix theory has significant links to a wide range of mathematical problems in the theory of functions as well as to mathematical physics.

11.1 Introduction

Berry introduced the term quantum chaology in his 1987 Bakerian Lecture [24] as the study of semiclassical but non-classical behaviour of systems whose classical motion exhibits chaos. He illustrated it with the statistics of energy levels,


following his earlier work with Tabor [25] and with related developments from the study of a range of systems. In the regular spectrum of a bound system with n ≥ 2 degrees of freedom and n constants of motion, the energy levels are labelled by n quantum numbers, but the quantum numbers of nearby energy levels may be very different. In the case of an irregular spectrum, such as for an ergodic system where only energy is conserved, we cannot use quantum number labelling. This prompted the use of energy level spacing distributions to allow comparisons among different spectra [25]. It was known, e.g. from the work of Porter [168], that the spacings between energy levels of complex nuclei and atoms are modelled by the spacings of eigenvalues of random matrices and that the Wigner distribution [214] gives a very good fit. It turns out that the spacing distributions for generic regular systems are negative exponential, that is Poisson random, §1.1.3; but for irregular systems the distributions are skew and unimodal, at the scale of the mean spacing. Mehta [145] provides a detailed discussion of the numerical experiments and functional approximations to the energy level spacing statistics, Alt et al. [5] compare eigenvalues from numerical analysis and from microwave resonator experiments, and e.g. Bohigas et al. [30] and Soshnikov [193] confirm certain universality properties. Miller [151] provides much detail on a range of related number theoretic properties, including random matrix theory links with L-functions. Forrester's online book [88] gives a wealth of analytic detail on the mathematics and physics of eigenvalues of infinite random matrices for the three ensembles of particular interest: Gaussian orthogonal (GOE), unitary (GUE) and symplectic (GSE), being real, complex and quaternionic, respectively.

Definition 11.1. There are three cases of interest [88]

GOE: A random real symmetric n × n matrix belongs to the Gaussian orthogonal ensemble (GOE) if the diagonal and upper triangular elements are independent random variables with Gaussian distributions of zero mean and standard deviation 1 for the diagonal and 1/√2 for the upper triangular elements.

GUE: A random hermitian n × n matrix belongs to the Gaussian unitary ensemble (GUE) if the diagonal elements mjj (which must be real) and the upper triangular elements mjk = ujk + ivjk are independent random variables with Gaussian distributions of zero mean and standard deviation 1/√2 for the mjj and 1/2 for each of the ujk and vjk.

GSE: A random hermitian n × n matrix with real quaternionic elements belongs to the Gaussian symplectic ensemble (GSE) if the diagonal elements zjj (which must be real) are independent with Gaussian distribution of zero mean and standard deviation 1/2, and the upper triangular elements zjk = ujk + ivjk and wjk = u′jk + iv′jk are independent random variables with Gaussian distributions of zero mean and standard deviation 1/(2√2) for each of the ujk, u′jk, vjk and v′jk.


Then the matrices in these ensembles are respectively invariant under the appropriate orthogonal, unitary and symplectic transformation groups, and moreover in each case the joint density function of all independent elements is controlled by the trace of the matrices and is of the form [88]

p(X) = An e^(−Tr X²/2)   (11.1)

where An is a normalizing factor. Barndorff-Nielsen et al. [19] give some background mathematical statistics on the more general problem of quantum information and quantum statistical inference, including reference to random matrices.
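For readers who wish to reproduce such spacing statistics numerically, a sketch follows; the matrix size and trial count are illustrative, and only central eigenvalues are kept with a global rescaling, a crude substitute for proper unfolding by the semicircle density.

  import numpy as np

  def goe_spacings(n=400, trials=200, seed=0):
      # sample GOE matrices per Definition 11.1: (a + a.T)/2 with a standard
      # normal gives diagonal sd 1 and off-diagonal sd 1/sqrt(2)
      rng = np.random.default_rng(seed)
      spacings = []
      for _ in range(trials):
          a = rng.standard_normal((n, n))
          m = (a + a.T) / 2.0
          ev = np.linalg.eigvalsh(m)
          mid = ev[n // 4: 3 * n // 4]      # central part of the spectrum
          spacings.extend(np.diff(mid))
      s = np.asarray(spacings)
      return s / s.mean()                   # normalize to unit mean spacing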

Here we show that gamma distributions, §1.4.1, provide approximations to eigenvalue spacing distributions for the GOE distribution comparable to the Wigner distribution at the scale of the mean, and for the GUE and GSE distributions, except near the origin. That may be useful because the gamma distribution has a well-understood and tractable information geometry [14, 65] as well as the important uniqueness property of Hwang and Hu [106], Theorem 1.1 above, which says that: for independent positive random variables with a common probability density function f, having independence of the sample mean and the sample coefficient of variation is equivalent to f being the gamma distribution.

It is noteworthy also that the non-chaotic case has an exponential distribution of spacings between energy levels, that the sum of n independent identical exponential random variables follows a gamma distribution, and that the sum of n independent identical gamma random variables follows a gamma distribution, §1.4; moreover, products of independent gamma random variables are well-approximated by gamma distributions, cf. e.g. §9.5, §9.5.1.

From a different standpoint, Berry and Robnik [26] gave a statistical model using a mixture of energy level spacing sequences from exponential and Wigner distributions. Monte Carlo methods were used by Caer et al. [33] to investigate such a mixture. Caer et al. established also the best fit of GOE, GUE and GSE unit mean distributions, for spacing s > 0, using the generalized gamma density, which we can put in the form

g(s; β, ω) = a(β, ω) s^β e^(−b(β,ω) s^ω), for β, ω > 0,   (11.2)

where

a(β, ω) = ω [Γ((2 + β)/ω)]^(β+1) / [Γ((1 + β)/ω)]^(β+2)  and  b(β, ω) = [ Γ((2 + β)/ω) / Γ((1 + β)/ω) ]^ω.

Then the best fits of (11.2) had the parameter values [33]

Ensemble     β  ω      Variance
Exponential  0  1      1
GOE          1  1.886  0.2856
GUE          2  1.973  0.1868
GSE          4  2.007  0.1100


and were accurate to within ∼ 0.1% of the true distributions from Forrester [88]. Observe that the exponential distribution is recovered by the choice g(s; 0, 1) = e^(−s). These distributions are shown in Figure 11.3. Götze and Kösters [97] show that the asymptotic results for the second-order correlation function of the characteristic polynomial of a random matrix from the Gaussian Unitary Ensemble essentially continue to hold for a general Hermitian Wigner matrix.
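A sketch of the unit mean density (11.2) with the tabulated best-fit parameters; by construction, a(β, ω) and b(β, ω) enforce unit normalization and unit mean.

  import numpy as np
  from scipy.special import gamma as Gamma

  def generalized_gamma(s, beta, omega):
      # equation (11.2); (beta, omega) = (1, 1.886), (2, 1.973), (4, 2.007) give
      # the GOE, GUE and GSE fits of Caer et al. [33], and (0, 1) recovers e^{-s}
      a = omega * Gamma((2 + beta) / omega) ** (beta + 1) \
          / Gamma((1 + beta) / omega) ** (beta + 2)
      b = (Gamma((2 + beta) / omega) / Gamma((1 + beta) / omega)) ** omega
      return a * s ** beta * np.exp(-b * s ** omega)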

11.2 Eigenvalues of Random Matrices

The two classes of spectra are illustrated in two dimensions by bouncing geodesics in plane billiard tables: e.g. in the de-symmetrized ‘stadium of Bunimovich’ with ergodic chaotic behaviour and irregular spectrum on the one hand, and on the other hand in the symmetric annular region between concentric circles with non-chaotic behaviour, regular spectrum and random energy spacings [25, 30, 145, 24].

It turns out that the mean spacing between eigenvalues of infinite symmetric real random matrices—the Gaussian Orthogonal Ensemble (GOE)—is bounded and therefore it is convenient to normalize the distribution to have unit mean; in fact, the same is true for the GUE and GSE cases. Barnett [21] provides a numerical tabulation of the first 1,276,900 GOE eigenvalues. In fact, Wigner [212, 213, 214] had already surmised that the cumulative probability distribution function, §1.2, for the spacing s > 0 at the scale of the mean spacing should be of the form:

W(s) = 1 − e^(−πs²/4).   (11.3)

This has unit mean and variance (4 − π)/π ≈ 0.273, with probability density function

w(s) = (πs/2) e^(−πs²/4).   (11.4)

Note that the particular case a(1, 2) = π/2, b(1, 2) = π/4 reduces (11.2) to (11.4) [214]. Remarkably, Wigner's surmise gave an extremely good fit with numerical computation of the true GOE distribution, cf. Mehta [145] Appendix A.15, and with a variety of observed data from atomic and nuclear systems [214, 25, 24, 145]. Caer et al. [33] showed that the generalized gamma (11.2) was within ∼ 0.1% of the true GOE distribution from Forrester [88], rather better even than the original Wigner [214] surmise (11.4).

From Mehta [145] p. 171, we have bounds on P, the cumulative probability distribution function, §1.2, for the spacings between eigenvalues of infinite symmetric real random matrices:

L(s) = 1 − e^(−π²s²/16) ≤ P(s) ≤ U(s) = 1 − e^(−π²s²/16) (1 − π²s²/48).   (11.5)


Here the lower bound L has mean 2/√π ≈ 1.13 and variance 4(4 − π)/π² ≈ 0.348, and the upper bound U has mean 5/(3√π) ≈ 0.940 and variance (96 − 25π)/(9π²) ≈ 0.197.

The family of probability density functions for gamma distributions, §1.4.1, with dispersion parameter κ > 0 and mean κ/ν > 0 for positive random variable s is given by

f(s; ν, κ) = ν^κ s^(κ−1) e^(−sν) / Γ(κ), for ν, κ > 0,   (11.6)

with variance κ/ν². Then the subset having unit mean is given by

f(s; κ, κ) = κ^κ s^(κ−1) e^(−sκ) / Γ(κ), for κ > 0,   (11.7)

with variance 1/κ. These parameters ν, κ are called natural coordinates, §3.3, because they admit presentation of the family (11.6) as an exponential family [11], §3.2, and thereby provide an associated natural affine immersion in R³ [68], §3.4:

h : R⁺ × R⁺ → R³ : (ν, κ) ↦ ( ν, κ, log Γ(κ) − κ log ν ).   (11.8)

The generalized gamma distributions (11.2) do not constitute an exponential family, except in the case ω = 1, so they do not admit an affine immersion. The gamma family affine immersion (11.8) was used [14] to present tubular neighbourhoods of the 1-dimensional subspace consisting of exponential distributions (κ = 1), so giving neighbourhoods of random processes, §5.1. The maximum entropy case has κ = 1, the exponential distribution, which corresponds to an underlying Poisson random event process, §1.4.1, and so models spacings in the spectra for non-chaotic systems; for κ > 1 the distributions are skew unimodular. The gamma unit mean distribution fit to the true GOE distribution from Mehta [145] has variance ≈ 0.379 and hence κ ≈ 2.42.

In fact, κ is a geodesic coordinate, §2.1, in the Riemannian 2-manifold of gamma distributions with Fisher information metric, §3.5; arc length, §2.0.5, along this coordinate from κ = a to κ = b is given by

∫_a^b √( d² log Γ(κ)/dκ² − 1/κ ) dκ.   (11.9)
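Since d² log Γ(κ)/dκ² is the trigamma function, (11.9) can be evaluated by quadrature; a sketch:

  import numpy as np
  from scipy.special import polygamma
  from scipy.integrate import quad

  def kappa_arclength(a, b):
      # information arc length (11.9) along the kappa coordinate of the gamma manifold
      integrand = lambda k: np.sqrt(float(polygamma(1, k)) - 1.0 / k)
      value, _ = quad(integrand, a, b)
      return value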

Plotted in Figure 11.1 are the cumulative distributions for the bounds (11.5) (dashed), the gamma distribution (thick solid) fit to the true GOE distribution with unit mean, §1.4.1, and the Wigner surmised distribution (11.3) (thin solid).

The corresponding probability density functions are in Figure 11.2: the gamma distribution fit (thick solid) to the true GOE distribution from Mehta [145]


Fig. 11.1. The bounds on the normalized cumulative distribution function of eigenvalue spacings for the GOE of random matrices (11.5) (dashed), the Wigner surmise (11.3) (thin solid) and the unit mean gamma distribution fit to the true GOE distribution from Mehta [145] Appendix A.15 (thick solid).

Fig. 11.2. Probability density function for the unit mean gamma distribution fit (thick solid) to the true GOE distribution from Mehta [145] Appendix A.15 (points), the Wigner surmised density (11.4) (thin solid) and the probability densities for the bounds (11.10) (dashed) for the distribution of normalized spacings between eigenvalues for infinite symmetric real random matrices.


Appendix A.15 (points), the Wigner surmised density (11.4) (thin solid) and the probability densities for the bounds (11.10) (dashed), respectively,

l(s) = (π²s/8) e^(−π²s²/16),   u(s) = ( π²s(64 − π²s²)/384 ) e^(−π²s²/16).   (11.10)

11.3 Deviations

Berry has pointed out [23] that the behaviour of the eigenvalue spacing probability density near the origin is an important feature of the ensemble statistics of these matrices; in particular it is linear for the GOE case and for the Wigner approximation. Moreover, for the unitary ensemble (GUE) of complex hermitian matrices the behaviour near the origin is ∼ s², and for the symplectic ensemble (GSE, representing half-integer spin particles with time-reversal symmetric interactions) it is ∼ s⁴.

From (11.7) we see that at unit mean the gamma density behaves like s^(κ−1) near the origin, so linear behaviour would require κ = 2, which gives a variance of 1/κ = 1/2, whereas the GOE fitted gamma distribution has κ ≈ 2.42 and hence variance ≈ 0.379. This may be compared with the variances for the lower bound l, 4(4 − π)/π² ≈ 0.348, the upper bound u, (96 − 25π)/(9π²) ≈ 0.197, and the Wigner distribution w, (4 − π)/π ≈ 0.273. The gamma distributions fitted to the lower and upper bounding distributions have, respectively, κL = π/(4 − π) ≈ 3.660 and κU = 5π²/(96 − 25π) ≈ 2.826. Figure 11.3 shows the probability density functions for the

Fig. 11.3. Probability density functions for the unit mean gamma distributions (dashed) and generalized gamma distribution (solid) fits to the true variances for, left to right, the GOE, GUE and GSE cases. The two types coincide in the exponential case, e^(−s), shown dotted.


Fig. 11.4. The unit mean gamma distributions corresponding to the random (non-chaotic) case, κ = ν = 1, and those with exponent κ = ν = 2.420, 4.247, 9.606 for the best fits to the true variances of the spacing distributions for the GOE, GUE and GSE cases, as points on the affine immersion in R³ of the 2-manifold of gamma distributions.

unit mean gamma distributions (dashed) and generalized gamma distribution (solid) fits to the true variances for, left to right, the GOE, GUE and GSE cases; the two types coincide in the exponential case, e^(−s), shown dotted. The major differences are in the behaviour near the origin. Figure 11.4 shows unit mean gamma distributions with κ = ν = 2.420, 4.247, 9.606 for the best fits to the true variances of the spacing distributions for the GOE, GUE and GSE cases, as points on the affine immersion in R³ of the 2-manifold of gamma distributions, §3.4, cf. [68]. The information metric, §3.5, provides information distances on the gamma manifold and so could be used for comparison of real data on eigenvalue spacings if fitted to gamma distributions; that may allow identification of qualitative properties and represent trajectories during structural changes of systems.

The authors are indebted to Rudnick [178] for pointing out that the GUE eigenvalue spacing distribution is rather closely followed by the distribution


Fig. 11.5. Probability plot with unit mean for the spacings between the first 2,001,052 zeros of the Riemann zeta function from the tabulation of Odlyzko [157] (large points), that for the true GUE distribution from the tabulation of Mehta [145] Appendix A.15 (medium points) and the gamma fit to the true GUE (small points).

of zeros for the Riemann zeta function; actually, Hilbert had conjectured this, as mentioned along with a variety of other probabilistic aspects of number theory by Schroeder [184]. This can be seen in Figure 11.5, which shows with unit mean the probability distribution for spacings among the first 2,001,052 zeros from the tabulation of Odlyzko [157] (large points), that for the true GUE distribution from the tabulation of Mehta [145] Appendix A.15 (medium points) and the gamma fit to the true GUE (small points), which has κ ≈ 4.247. The grand mean spacing between zeros from the data was ≈ 0.566, the coefficient of variation ≈ 0.422 and the variance ≈ 0.178.

Table 11.1 shows the effect of location on the statistical data for spacings in the first ten consecutive blocks of 200,000 zeros of the Riemann zeta function normalized with unit grand mean; Table 11.2 shows the effect of sample size. For gamma distributions we expect the coefficient of variation to be independent of both sample size and location, by Theorem 1.1.
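The blockwise statistics in these tables follow from the identity CV = 1/√κ for a gamma distribution of any mean, so κ = 1/CV²; a sketch of the computation applied to a block of spacings:

  import numpy as np

  def cv_and_kappa(spacings):
      # sample coefficient of variation and the gamma parameter it implies;
      # e.g. CV = 0.426697 for block 1 of Table 11.1 gives kappa = 5.49239
      s = np.asarray(spacings, dtype=float)
      cv = s.std() / s.mean()
      return cv, 1.0 / cv ** 2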

Remark 11.2. The gamma distribution provides approximations to the true distributions for the spacings between eigenvalues of infinite random matrices for the GOE, GUE and GSE cases. However, it is clear that gamma distributions do not precisely model the analytic systems discussed here, and do not give correct asymptotic behaviour at the origin, as is evident from the results of Caer et al. [33], who obtained excellent approximations for the GOE, GUE and GSE distributions using the generalized gamma distribution (11.2), §11.1. The differences may be seen in Figure 11.3, which shows the unit mean distributions for gamma (dashed) and generalized gamma [33] (solid) fits to the true variances for the Poisson, GOE, GUE and GSE ensembles.


Table 11.1. Effect of location: statistical data for spacings in the first ten consecutive blocks of 200,000 zeros of the Riemann zeta function normalized with unit grand mean from the tabulation of Odlyzko [157].

Block  Mean      Variance  CV        κ
1      1.232360  0.276512  0.426697  5.49239
2      1.072330  0.189859  0.406338  6.05654
3      1.025210  0.174313  0.407240  6.02974
4      0.996739  0.165026  0.407563  6.02019
5      0.976537  0.158777  0.408042  6.00607
6      0.960995  0.154008  0.408367  5.99651
7      0.948424  0.150136  0.408544  5.99131
8      0.937914  0.147043  0.408845  5.98250
9      0.928896  0.144285  0.408926  5.98014
10     0.921034  0.142097  0.409276  5.96991

Table 11.2. Effect of sample size: statistical data for spacings in ten blocks of increasing size 200,000m, m = 1, 2, ..., 10, for the first 2,000,000 zeros of the Riemann zeta function, normalized with unit grand mean, from the tabulation of Odlyzko [157].

m   Mean     Variance  CV        κ
1   1.23236  0.276511  0.426696  5.49242
2   1.15234  0.239586  0.424765  5.54246
3   1.10997  0.221420  0.423934  5.56421
4   1.08166  0.209725  0.423384  5.57869
5   1.06064  0.201303  0.423018  5.58833
6   1.04403  0.194799  0.422748  5.59548
7   1.03037  0.189538  0.422527  5.60133
8   1.01881  0.185161  0.422357  5.60584
9   1.00882  0.181418  0.422207  5.60983
10  1.00004  0.178180  0.422094  5.61282

Unfortunately, from our present perspective, the generalized gamma distributions do not have a tractable information geometry, and so some features of the gamma distribution approximations may be useful in studies of qualitative generic properties in applications to data from real systems. It would be interesting to investigate the extent to which data from real atomic and nuclear systems has generally the qualitative property that the sample coefficient of variation is independent of the mean. That, by Theorem 1.1, is an information-theoretic distinguishing property of the gamma distribution.

It would be interesting to know if there is a number-theoretic property that corresponds to the apparently similar qualitative behaviour of the spacings of zeros of the Riemann zeta function, Tables 11.1, 11.2. Since the non-chaotic case has an exponential distribution of spacings between energy levels, and the sum of n independent identical exponential random variables follows a gamma distribution, and moreover the sum of n independent identical gamma random variables follows a gamma distribution, a further analytic development would be to calculate the eigenvalue distributions for gamma or loggamma-distributed matrix ensembles. Information geometrically, the Riemannian manifolds of gamma and loggamma families are isometric, §3.6, but the loggamma random variables have bounded domain and their distributions contain the uniform distribution, §3.6 and §5.2, which may be important in modelling some real physical processes.
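The closure property invoked above is a one-line computation with moment generating functions, recorded here for convenience. If X is gamma-distributed with shape κ and scale θ then M_X(t) = (1 − θt)^(−κ), so for independent identically distributed X₁, ..., Xₙ

  M_(X₁+···+Xₙ)(t) = M_X(t)ⁿ = (1 − θt)^(−nκ),

which is again a gamma moment generating function, now with shape nκ and the same scale θ; the exponential case is κ = 1.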


References

1. I. Akyildiz. Mobility management in current and future communication networks. IEEE Network Mag. 12, 4 (1998) 39-49.

2. I. Akyildiz. Performance modeling of next generation wireless systems. Keynote Address, Conference on Simulation Methods and Applications, 1-3 November 1998, Orlando, Florida.

3. R.I. Al-Raoush and C.S. Willson. Extraction of physically realistic pore network properties from three-dimensional synchrotron X-ray microtomography images of unconsolidated porous media systems. Journal of Hydrology 300, 1-4 (2005) 44-64.

4. P.L. Alger, Editor. Life and times of Gabriel Kron. Mohawk, New York, 1969. Cf. C.T.J. Dodson, Diakoptics Past and Future, pp 288-9 ibid.

5. H. Alt, C. Dembrowski, H.D. Graf, R. Hofferbert, H. Rehfield, A. Richter and C. Schmit. Experimental versus numerical eigenvalues of a Bunimovich stadium billiard: A comparison. Phys. Rev. E 60, 3 (1999) 2851-2857.

6. S-I. Amari. Diakoptics of Information Spaces. Doctoral Thesis, University of Tokyo, 1963.

7. S-I. Amari. Theory of Information Spaces—A Geometrical Foundation of the Analysis of Communication Systems. Research Association of Applied Geometry Memoirs 4 (1968) 171-216.

8. S-I. Amari. Differential Geometrical Methods in Statistics. Springer Lecture Notes in Statistics 28, Springer-Verlag, Berlin 1985.

9. S-I. Amari, O.E. Barndorff-Nielsen, R.E. Kass, S.L. Lauritzen and C.R. Rao. Differential Geometry in Statistical Inference. Lecture Notes Monograph Series, Institute of Mathematical Statistics, Volume 10, Hayward, California, 1987.

10. S-I. Amari. Dual Connections on the Hilbert Bundles of Statistical Models. In Proc. Workshop on Geometrization of Statistical Theory 28-31 October 1987. Ed. C.T.J. Dodson, ULDM Publications, University of Lancaster, 1987, pp 123-151.

11. S-I. Amari and H. Nagaoka. Methods of Information Geometry. American Mathematical Society, Oxford University Press, 2000.

12. Flavio S. Anselmetti, Stefan Luthi and Gregor P. Eberli. Quantitative Characterization of Carbonate Pore Systems by Digital Image Analysis. AAPG Bulletin 82, 10 (1998) 1815-1836.


13. Khadiga Arwini. Differential geometry in neighbourhoods of randomness and independence. PhD thesis, Department of Mathematics, University of Manchester Institute of Science and Technology (2004).

14. Khadiga Arwini and C.T.J. Dodson. Information geometric neighbourhoods of randomness and geometry of the McKay bivariate gamma 3-manifold. Sankhya: Indian Journal of Statistics 66, 2 (2004) 211-231.

15. Khadiga Arwini and C.T.J. Dodson. Neighbourhoods of independence and associated geometry in manifolds of bivariate Gaussians and Freund distributions. Central European J. Mathematics 5, 1 (2007) 50-83.

16. Khadiga Arwini, L. Del Riego and C.T.J. Dodson. Universal connection and curvature for statistical manifold geometry. Houston Journal of Mathematics 33, 1 (2007) 145-161.

17. Khadiga Arwini and C.T.J. Dodson. Alpha-geometry of the Weibull manifold. Second Basic Science Conference, 4-8 November 2007, Al-Fatah University, Tripoli, Libya.

18. C. Baccigalupi, L. Amendola and F. Occhionero. Imprints of primordial voids on the cosmic microwave background. Mon. Not. R. Astr. Soc. 288, 2 (1997) 387-96.

19. O.E. Barndorff-Nielsen, R.D. Gill and P.E. Jupp. On quantum statistical inference. J. Roy. Statist. Soc. B 65 (2003) 775-816.

20. O.E. Barndorff-Nielsen and D.R. Cox. Inference and Asymptotics. Monographs on Statistics and Applied Probability, 52. Chapman & Hall, London, 1994.

21. A.H. Barnett. http://math.dartmouth.edu/~ahb/pubs.html

22. A.J. Benson, F. Hoyle, F. Torres and M.J. Vogeley. Galaxy voids in cold dark matter universes. Mon. Not. R. Astr. Soc. 340 (2003) 160-174.

23. M.V. Berry. Private communication. 2008.

24. M.V. Berry. Quantum Chaology. Proc. Roy. Soc. London A 413 (1987) 183-198.

25. M.V. Berry and M. Tabor. Level clustering in the regular spectrum. Proc. Roy. Soc. London A 356 (1977) 373-394.

26. M.V. Berry and M. Robnik. Semiclassical level spacings when regular and chaotic orbits coexist. J. Phys. A Math. General 17 (1984) 2413-2421.

27. A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their distributions. Bull. Calcutta Math. Soc. 35 (1943) 99-110.

28. Marcelo Biassusi. Estudo da Deformação de um Vertissolo Através da Tomografia Computadorizada de Dupla Energia Simultânea. PhD Thesis, UFRGS - Federal University of Rio Grande do Sul, Porto Alegre, Brazil, February 1996.

29. D. Bloomberg. Basic Definitions in Mathematical Morphology. www.leptonica.com/papers, April 2003.

30. O. Bohigas, M.J. Giannoni and C. Schmit. Characterization of Chaotic Quantum Spectra and Universality of Level Fluctuation Laws. Phys. Rev. Lett. 52, 1 (1984) 1-4.

31. K. Borovkov. Elements of Stochastic Modelling. World Scientific and Imperial College Press, Singapore and London, 2003.

32. U. Boudriot, R. Dersch, A. Greiner and J.H. Wendorf. Electrospinning approaches toward scaffold engineering–A brief overview. Artificial Organs 10 (2006) 785-792.


33. G. Le Caer, C. Male and R. Delannay. Nearest-neighbour spacing distributions of the β-Hermite ensemble of random matrices. Physica A (2007) 190-208. Cf. also their Erratum: Physica A 387 (2008) 1713.

34. Y. Cai, C.T.J. Dodson, O. Wolkenhauer and A.J. Doig. Gamma Distribution Analysis of Protein Sequences shows that Amino Acids Self Cluster. J. Theoretical Biology 218, 4 (2002) 409-418.

35. M. Calvo and J.M. Oller. An explicit solution of information geodesic equations for the multivariate normal model. Statistics & Decisions 9 (1990) 119-138.

36. D. Canarutto and C.T.J. Dodson. On the bundle of principal connections and the stability of b-incompleteness of manifolds. Math. Proc. Camb. Phil. Soc. 98 (1985) 51-59.

37. B. Canvel. Timing Tags for Exponentiations for RSA. MSc Thesis, Department of Mathematics, University of Manchester Institute of Science and Technology, 1999.

38. B. Canvel and C.T.J. Dodson. Public Key Cryptosystem Timing Analysis. Rump Session, CRYPTO 2000, Santa Barbara, 20-24 August 2000. http://www.maths.manchester.ac.uk/~kd/PREPRINTS/rsatim.ps

39. A. Cappi, S. Maurogordato and M. Lachieze-Rey. A scaling law in the distribution of galaxy clusters. Astron. Astrophys. 243, 1 (1991) 28-32.

40. J. Castro and M. Ostoja-Starzewski. Particle sieving in a random fiber network. Appl. Math. Modelling 24, 8-9 (2000) 523-534.

41. S. Chari, C.S. Jutla, J.R. Rao and P. Rohatgi. Towards sound approaches to counteract power-analysis attacks. In Advances in Cryptology-CRYPTO '99, Ed. M. Wiener, Lecture Notes in Computer Science 1666, Springer, Berlin 1999 pp 398-412.

42. P. Coles. Understanding recent observations of the large-scale structure of the universe. Nature 346 (1990) 446.

43. L.A. Cordero, C.T.J. Dodson and M. de Leon. Differential Geometry of Frame Bundles. Kluwer, Dordrecht, 1989.

44. L.A. Cordero, C.T.J. Dodson and P.E. Parker. Connections on principal S1-bundles over compacta. Rev. Real Acad. Galega de Ciencias XIII (1994) 141-149.

45. H. Corte. Statistical geometry of random fibre networks. In Structure, Solid Mechanics and Engineering Design (M. Te'eni, ed.), Proc. Southampton Civil Engineering Materials Conference, 1969, pp. 341-355. Wiley Interscience, London, 1971.

46. H. Corte. Statistical geometry of random fibre networks. In Structure, Solid Mechanics and Engineering Design, Proc. Southampton 1969 Civil Engineering Materials Conference, vol. 1 (ed. M. Te'eni), pp 341-355. Wiley-Interscience, London, 1971.

47. H. Corte and C.T.J. Dodson. Über die Verteilung der Massendichte in Papier. Erster Teil: Theoretische Grundlagen. Das Papier 23, 7 (1969) 381-393.

48. H. Corte and E.H. Lloyd. Fluid flow through paper and sheet structure. In Consolidation of the Paper Web, Trans. IIIrd Fund. Res. Symp. 1965 (F. Bolam, ed.), pp 981-1009, BPBMA, London 1966.

49. D.J. Croton et al. (The 2dFGRS Team). The 2dF Galaxy Redshift Survey: Higher order galaxy correlation functions. Preprint, arXiv:astro-ph/0401434 v2, 23 Aug 2004.


50. D.J. Croton et al. (The 2dFGRS Team). The 2dF Galaxy Redshift Survey: Voids and hierarchical scaling models. Preprint, arXiv:astro-ph/0401406 v2, 23 Aug 2004.

51. R. Dawkins. The Selfish Gene. Oxford University Press, Oxford 1976; cf. also the enlarged 1989 edition.

52. P. Deift. Some open problems in random matrix theory and the theory of integrable systems. Preprint, arXiv:0712.0849v1, 6 December 2007.

53. L. Del Riego and C.T.J. Dodson. Sprays, universality and stability. Math. Proc. Camb. Phil. Soc. 103 (1988) 515-534.

54. J.F. Delrue, E. Perrier, Z.Y. Yu and B. Velde. New Algorithms in 3D Image Analysis and Their Application to the Measurement of a Spatialized Pore Size Distribution in Soils. Phys. Chem. Earth 24, 7 (1999) 639-644.

55. M. Deng. Differential Geometry in Statistical Inference. PhD thesis, Department of Statistics, Pennsylvania State University, 1990.

56. M. Deng and C.T.J. Dodson. Paper: An Engineered Stochastic Structure. Tappi Press, Atlanta (1994).

57. G. Di Crescenzo and R. Ostrovsky. On concurrent zero-knowledge with pre-processing. In Advances in Cryptology-CRYPTO '99, Ed. M. Wiener, Lecture Notes in Computer Science 1666, Springer, Berlin 1999 pp 485-502.

58. C.T.J. Dodson. Spatial variability and the theory of sampling in random fibrous networks. J. Royal Statist. Soc. 33, 1 (1971) 88-94.

59. C.T.J. Dodson. Systems of connections for parametric models. In Proc. Workshop on Geometrization of Statistical Theory 28-31 October 1987. Ed. C.T.J. Dodson, ULDM Publications, University of Lancaster, 1987, pp 153-170.

60. C.T.J. Dodson. Gamma manifolds and stochastic geometry. In: Proceedings of the Workshop on Recent Topics in Differential Geometry, Santiago de Compostela 16-19 July 1997. Public. Depto. Geometría y Topología 89 (1998) 85-92.

61. C.T.J. Dodson. Information geodesics for communication clustering. J. Statistical Computation and Simulation 65 (2000) 133-146.

62. C.T.J. Dodson. Evolution of the void probability function. Presented at Workshop on Statistics of Cosmological Data Sets, 8-13 August 1999, Isaac Newton Institute, Cambridge. http://www.maths.manchester.ac.uk/kd/PREPRINTS/vpf.ps . Cf. also [65].

63. C.T.J. Dodson. Spatial statistics and information geometry for parametric statistical models of galaxy clustering. Int. J. Theor. Phys. 38, 10 (1999) 2585-2597.

64. C.T.J. Dodson. Geometry for stochastically inhomogeneous spacetimes. Nonlinear Analysis 47 (2001) 2951-2958.

65. C.T.J. Dodson. Quantifying galactic clustering and departures from randomness of the inter-galactic void probability function using information geometry. http://arxiv.org/abs/astro-ph/0608511 (2006).

66. C.T.J. Dodson. A note on quantum chaology and gamma manifold approximations to eigenvalue spacings for infinite random matrices. Proceedings CHAOS 2008, Chania, Crete 3-6 June 2008. http://arxiv.org/abs/math-ph/0802.2251

67. C.T.J. Dodson, A.G. Handley, Y. Oba and W.W. Sampson. The pore radius distribution in paper. Part I: The effect of formation and grammage. Appita Journal 56, 4 (2003) 275-280.


68. C.T.J. Dodson and Hiroshi Matsuzoe. An affine embedding of the gamma manifold. InterStat, January 2002, 2 (2002) 1-6.

69. C.T.J. Dodson and M. Modugno. Connections over connections and universal calculus. In Proc. VI Convegno Nazionale di Relatività Generale e Fisica della Gravitazione, Florence, 10-13 October 1984, Eds. R. Fabbri and M. Modugno, pp. 89-97, Pitagora Editrice, Bologna, 1986.

70. C.T.J. Dodson and T. Poston. Tensor Geometry. Graduate Texts in Mathematics 130, Second edition, Springer-Verlag, New York, 1991.

71. C.T.J. Dodson and W.W. Sampson. The effect of paper formation and grammage on its pore size distribution. J. Pulp Pap. Sci. 22(5) (1996) J165-J169.

72. C.T.J. Dodson and W.W. Sampson. Modeling a class of stochastic porous media. Appl. Math. Lett. 10, 2 (1997) 87-89.

73. C.T.J. Dodson and W.W. Sampson. Spatial statistics of stochastic fibre networks. J. Statist. Phys. 96, 1/2 (1999) 447-458.

74. C.T.J. Dodson and W.W. Sampson. Flow simulation in stochastic porous media. Simulation 74, 6 (2000) 351-358.

75. C.T.J. Dodson and W.W. Sampson. Planar line processes for void and density statistics in thin stochastic fibre networks. J. Statist. Phys. 129 (2007) 311-322.

76. C.T.J. Dodson and J. Scharcanski. Information Geometric Similarity Measurement for Near-Random Stochastic Processes. IEEE Transactions on Systems, Man and Cybernetics - Part A 33, 4 (2003) 435-440.

77. C.T.J. Dodson and S.M. Thompson. A metric space of test distributions for DPA and SZK proofs. Poster Session, Eurocrypt 2000, Bruges, 14-19 May 2000. http://www.maths.manchester.ac.uk/kd/PREPRINTS/mstd.pdf

78. C.T.J. Dodson and H. Wang. Iterative approximation of statistical distributions and relation to information geometry. J. Statistical Inference for Stochastic Processes 147 (2001) 307-318.

79. A.G. Doroshkevich, D.L. Tucker, A. Oemler, R.P. Kirshner, H. Lin, S.A. Shectman, S.D. Landy and R. Fong. Large- and Superlarge-scale Structure in the Las Campanas Redshift Survey. Mon. Not. R. Astr. Soc. 283, 4 (1996) 1281-1310.

80. F. Downton. Bivariate exponential distributions in reliability theory. J. Royal Statist. Soc. Series B 32 (1970) 408-417.

81. G. Efstathiou. Counts-in-cells comparisons of redshift surveys. Mon. Not. R. Astr. Soc. 276, 4 (1995) 1425-1434.

82. A.P. Fairall. Large-scale structure in the universe. Wiley-Praxis, Chichester 1998.

83. C.P. Fernandes and F.S. Magnani. Multiscale Geometrical Reconstruction of Porous Structures. Physical Review E 54 (1996) 1734-1741.

84. W. Feller. An Introduction to Probability Theory and its Applications. Volume 1, John Wiley, Chichester 1968.

85. W. Feller. An Introduction to Probability Theory and its Applications. Volume 2, John Wiley, Chichester 1971.

86. R.A. Fisher. Theory of statistical estimation. Proc. Camb. Phil. Soc. 22 (1925) 700-725.

87. M. Fisz. Probability Theory and Mathematical Statistics. 3rd edition, John Wiley, Chichester 1963.

88. P.J. Forrester. Log-Gases and Random Matrices, Chapter 1 Gaussian matrix ensembles. Online book manuscript, http://www.ms.unimelb.edu.au/~matpjf/matpjf.html, 2007.


89. R.J. Freund. A bivariate extension of the exponential distribution. Journal of the American Statistical Association 56 (1961) 971-977.

90. K. Fukunaga. Introduction to Statistical Pattern Recognition. 2nd Edition, Academic Press, Boston 1991.

91. B. Ghosh. Random distances within a rectangle and between two rectangles. Calcutta Math. Soc. 43, 1 (1951) 17-24.

92. S. Ghigna, S. Borgani, M. Tucci, S.A. Bonometto, A. Klypin and J.R. Primack. Statistical tests for CHDM and Lambda CDM cosmologies. Astrophys. J. 479, 2, 1 (1997) 580-91.

93. J. Gleick. CHAOS: Making a New Science. Heinemann, London 1988.

94. O. Goldreich, A. Sahai and S. Vadhan. Can Statistical Zero-Knowledge be made non-interactive? Or, on the relationship of SZK and NISZK. In Advances in Cryptology-CRYPTO '99, Ed. M. Wiener, Lecture Notes in Computer Science 1666, Springer, Berlin 1999 pp 467-484.

95. A. Goffeau, B.G. Barrell, H. Bussey, R.W. Davis, B. Dujon, H. Feldmann, F. Galibert, J.D. Hoheisel, C. Jacq, M. Johnston, E.J. Louis, H.W. Mewes, Y. Murakami, P. Philippsen, H. Tettelin and S.G. Oliver. Life with 6000 genes. Science 274, 546 (1996) 563-567.

96. R. Gosine, X. Zhao and S. Davis. Automated Image Analysis for Applications in Reservoir Characterization. In International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, September 2000, Brighton, UK.

97. F. Götze and H. Kösters. On the Second-Order Correlation Function of the Characteristic Polynomial of a Hermitian Wigner Matrix. http://arxiv.org/abs/math-ph/0803.0926 (2008).

98. Rao S. Govindaraju and M. Levent Kavvas. Characterization of the rill geometry over straight hillslopes through spatial scales. Journal of Hydrology 130, 1 (1992) 339-365.

99. A. Gray. Modern Differential Geometry of Curves and Surfaces. 2nd Edition, CRC Press, Boca Raton 1998.

100. R.C. Griffiths. The canonical correlation coefficients of bivariate gamma distributions. Annals Math. Statist. 40, 4 (1969) 1401-1408.

101. P. Grzegorzewski and R. Wieczorkowski. Entropy-based goodness-of-fit test for exponentiality. Commun. Statist. Theory Meth. 28, 5 (1999) 1183-1202.

102. F.A. Haight. Handbook of the Poisson Distribution. J. Wiley, New York, 1967.

103. A.W.J. Heijs, J. Lange, J.F. Schoute and J. Bouma. Computed Tomography as a Tool for Non-destructive Analysis of Flow Patterns in Macroporous Clay Soils. Geoderma 64 (1995) 183-196.

104. F. Hoyle and M.S. Vogeley. Voids in the 2dF Galaxy Redshift Survey. Astrophys. J. 607 (2004) 751-764.

105. T.P. Hutchinson and C.D. Lai. Continuous Multivariate Distributions, Emphasising Applications. Rumsby Scientific Publishing, Adelaide 1990.

106. T-Y. Hwang and C-Y. Hu. On a characterization of the gamma distribution: The independence of the sample mean and the sample coefficient of variation. Annals Inst. Statist. Math. 51, 4 (1999) 749-753.

107. E.T. Jaynes. Information theory and statistical mechanics. The Physical Review 106 (1957) 620-630 and 108 (1957) 171-190. Cf. also the collection E.T. Jaynes, Papers on probability, statistics and statistical physics, Ed. R.D. Rosenkrantz, Synthese Library, 158. D. Reidel Publishing Co., Dordrecht, 1983.


108. P.R. Johnston. The most probable pore size distribution in fluid filter media. J. Testing and Evaluation 11, 2 (1983) 117-121.

109. P.R. Johnston. Revisiting the most probable pore size distribution in filter media. The gamma distribution. Filtration and Separation 35, 3 (1998) 287-292.

110. A.M. Kagan, Y.V. Linnik and C.R. Rao. Characterization Problems in Mathematical Statistics. John Wiley, New York, 1973.

111. O. Kallmes and H. Corte. The structure of paper, I. The statistical geometry of an ideal two dimensional fiber network. Tappi J. 43, 9 (1960) 737-752. Cf. also: Errata 44, 6 (1961) 448.

112. O. Kallmes, H. Corte and G. Bernier. The structure of paper, V. The bonding states of fibres in randomly formed papers. Tappi Journal 46, 8 (1963) 493-502.

113. R.E. Kass and P.W. Vos. Geometrical Foundations of Asymptotic Inference. Wiley Series in Probability and Statistics: Probability and Statistics. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York, 1997.

114. G. Kauffmann and A.P. Fairall. Voids in the distribution of galaxies: an assessment of their significance and derivation of a void spectrum. Mon. Not. R. Astr. Soc. 248 (1990) 313-324.

115. M.D. Kaytor and S.T. Warren. Aberrant protein deposition and neurological disease. J. Biological Chemistry 53 (1999) 37507-37510.

116. R.A. Ketcham and Gerardo J. Iturrino. Nondestructive high-resolution visualization and measurement of anisotropic effective porosity in complex lithologies using high-resolution X-ray computed tomography. Journal of Hydrology 302, 1-4 (2005) 92-106.

117. M. Kendall and A. Stuart. The Advanced Theory of Statistics, Volume 2: Inference and Relationship. 4th Edition. Charles Griffin, London, 1979.

118. W.F. Kibble. A two variate gamma-type distribution. Sankhya 5 (1941) 137-150.

119. P. Kocher, J. Jaffe and B. Jun. Differential Power Analysis. In Advances in Cryptology-CRYPTO '99, Ed. M. Wiener, Lecture Notes in Computer Science 1666, Springer, Berlin 1999 pp 388-397.

120. S. Kokoska and C. Nevison. Statistical Tables and Formulae. Springer Texts in Statistics, Springer-Verlag, New York 1989.

121. K. Kondo, Editor. Research Association of Applied Geometry Memoirs Volume IV, Tokyo 1968.

122. S. Kotz, N. Balakrishnan and N. Johnson. Continuous Multivariate Distributions. 2nd Edition, Volume 1 (2000).

123. I. Kovalenko. A simplified proof of a conjecture of D.G. Kendall concerning shapes of random polygons. J. Appl. Math. Stochastic Anal. 12, 4 (1999) 301-310.

124. G. Kron. Diakoptics—The Science of Tearing, Tensors and Topological Models. RAAG Memoirs Volume II (1958) 343-368.

125. G. Kron. Diakoptics—The Piecewise Solution of Large-Scale Systems. MacDonald, London 1963.

126. S. Kullback. Information and Statistics. J. Wiley, New York, 1959.

127. T. Kurose. On the divergences of 1-conformally flat statistical manifolds. Tohoku Math. J. 46 (1994) 427-433.

128. F. Sylos Labini, A. Gabrielli, M. Montuori and L. Pietronero. Finite size effects on the galaxy number counts: Evidence for fractal behavior up to the deepest scale. Physica A 226, 3-4 (1996) 195-242.


129. F. Sylos Labini, M. Montuori and L. Pietronero. Scale Invariance of galaxy clustering. Physics Reports 293 (1998) 61-226.

130. M. Lachieze-Rey, L.N. Da-Costa and S. Maurogordato. Void probability function in the Southern Sky Redshift Survey. Astrophys. J. 399 (1992) 10-15.

131. C.D. Lai. Constructions of bivariate distributions by a generalized trivariate reduction technique. Statistics and Probability Letters 25, 3 (1995) 265-270.

132. W.H. Landschulz, P.F. Johnson and S.L. McKnight. The Leucine Zipper - A hypothetical structure common to a new class of DNA-binding proteins. Science 240 (1988) 1759-1764.

133. S.D. Landy, S.A. Shectman, H. Lin, R.P. Kirshner, A.A. Oemler and D. Tucker. Two-dimensional power spectrum of the Las Campanas redshift survey: Detection of excess power on 100 h⁻¹ Mpc scales. Astroph. J. 456, 1, 2 (1996) L1-7.

134. S.L. Lauritzen. Statistical Manifolds. In Differential Geometry in Statistical Inference, Institute of Mathematical Statistics Lecture Notes, Volume 10, Berkeley 1987, pp 163-218.

135. S. Leurgans, T.W-Y. Tsai and J. Crowley. Freund's bivariate exponential distribution and censoring. In Survival Analysis (R.A. Johnson, ed.), IMS Lecture Notes, Hayward, California: Institute of Mathematical Statistics, 1982.

136. H. Lin, R.P. Kirshner, S.A. Shectman, S.D. Landy, A. Oemler, D.L. Tucker and P.L. Schechter. The power spectrum of galaxy clustering in the Las Campanas Redshift Survey. Astroph. J. 471, 2, 1 (1996) 617-635.

137. A. Lupas. Coiled coils: New structures and new functions. Trends Biochem. Sci. 21, 10 (1996) 375-382.

138. S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, San Diego, 1998.

139. R.E. Mark. Structure and structural anisotropy. Ch. 24 in Handbook of Physical and Mechanical Testing of Paper and Paperboard (R.E. Mark, ed.). Marcel Dekker, New York, 1984.

140. L. Mangiarotti and M. Modugno. Fibred spaces, jet spaces and connections for field theories. In Proc. International Meeting on Geometry and Physics, Florence, 12-15 October 1982, ed. M. Modugno, Pitagora Editrice, Bologna, 1983 pp 135-165.

141. K.V. Mardia. Families of Bivariate Distributions. Griffin, London 1970.

142. A.W. Marshall and I. Olkin. A generalized bivariate exponential distribution. J. Appl. Prob. 4 (1967) 291-302.

143. H. Matsuzoe. On realization of conformally-projectively flat statistical manifolds and the divergences. Hokkaido Math. J. 27 (1998) 409-421.

144. H. Matsuzoe. Geometry of contrast functions and conformal geometry. Hiroshima Math. J. 29 (1999) 175-191.

145. Madan Lal Mehta. Random Matrices. 3rd Edition, Academic Press, London 2004.

146. A.T. McKay. Sampling from batches. J. Royal Statist. Soc. 2 (1934) 207-216.

147. R.E. Miles. Random polygons determined by random lines in a plane. Proc. Nat. Acad. Sci. USA 52 (1964) 901-907, 1157-1160.

148. R.E. Miles. The various aggregates of random polygons determined by random lines in a plane. Advances in Math. 10 (1973) 256-290.

149. R.E. Miles. A heuristic proof of a long-standing conjecture of D.G. Kendall concerning the shapes of certain large random polygons. Adv. in Appl. Probab. 27, 2 (1995) 397-417.


150. G.K. Miller and U.N. Bhat. Estimation for renewal processes with unobservable gamma or Erlang interarrival times. J. Statistical Planning and Inference 61, 2 (1997) 355-372.

151. Steven J. Miller and Ramin Takloo-Bighash. An Invitation to Modern Number Theory. Princeton University Press, Princeton 2006. Cf. also the seminar notes:
• Steven J. Miller. Random Matrix Theory, Random Graphs, and L-Functions: How the Manhattan Project helped us understand primes. Ohio State University Colloquium 2003. http://www.math.brown.edu/~sjmiller/math/talks/colloquium7.pdf
• Steven J. Miller. Random Matrix Theory Models for zeros of L-functions near the central point (and applications to elliptic curves). Brown University Algebra Seminar 2004. http://www.math.brown.edu/~sjmiller/math/talks/RMTandNTportrait.pdf

152. M. Modugno. Systems of vector valued forms on a fibred manifold and applications to gauge theories. In Proc. Conference Differential Geometric Methods in Mathematical Physics, Salamanca 1985, Lecture Notes in Mathematics 1251, Springer-Verlag, Berlin 1987, pp. 238-264.

153. M.K. Murray and J.W. Rice. Differential Geometry and Statistics. Monographs on Statistics and Applied Probability, 48. Chapman & Hall, London, 1993.

154. K. Nomizu and T. Sasaki. Affine differential geometry: Geometry of Affine Immersions. Cambridge University Press, Cambridge, 1994.

155. B. Norman. Overview of the physics of forming. In Fundamentals of Papermaking, Trans. IXth Fund. Res. Symp. (C.F. Baker, ed.), Vol III, pp. 73-149, Mechanical Engineering Publications, London, 1989.

156. Y. Oba. Z-directional structural development and density variation in paper. Ph.D. Thesis, Department of Paper Science, University of Manchester Institute of Science and Technology, 1999.

157. A. Odlyzko. Tables of zeros of the Riemann zeta function. http://www.dtc.umn.edu:80/~odlyzko/zeta_tables/index.html

158. S.H. Ong. Computation of bivariate-gamma and inverted-beta distribution functions. J. Statistical Computation and Simulation 51, 2-4 (1995) 153-163.

159. R.N. Onody, A.N.D. Posadas and S. Crestana. Experimental Studies of the Fingering Phenomena in Two Dimensions and Simulation Using a Modified Invasion Percolation Model. Journal of Applied Physics 78, 5 (1995) 2970-2976.

160. E.K. O'Shea, R. Rutkowski and P.S. Kim. Evidence that the leucine zipper is a coiled coil. Science 243 (1989) 538-542.

161. A. Papoulis. Probability, Random Variables and Stochastic Processes. 3rd edition, McGraw-Hill, New York 1991.

162. P.J.E. Peebles. Large Scale Structure of the Universe. Princeton University Press, Princeton 1980.

163. S. Penel, R.G. Morrison, R.J. Mortishire-Smith and A.J. Doig. Periodicity in α-helix lengths and C-capping preferences. J. Mol. Biol. 293 (1999) 1211-1219.

164. R. Penrose. The Emperor's New Mind. Oxford University Press, Oxford 1989.

165. Q.P. Pham, U. Sharma and A.G. Mikos. Characterization of scaffolds and measurement of cellular infiltration. Biomacromolecules 7, 10 (2006) 2796-2805.

166. Huynh Ngoc Phien. Reservoir storage capacity with gamma inflows. Journal of Hydrology 146, 1 (1993) 383-389.


167. T. Piran, M. Lecar, D.S. Goldwirth, L. Nicolaci da Costa and G.R. Blumenthal. Limits on the primordial fluctuation spectrum: void sizes and anisotropy of the cosmic microwave background radiation. Mon. Not. R. Astr. Soc. 265, 3 (1993) 681-8.

168. C.F. Porter. Statistical Theory of Spectra: Fluctuations. Academic Press, London 1965.

169. S.O. Prasher, J. Perret, A. Kantzas and C. Langford. Three-Dimensional Quantification of Macropore Networks in Undisturbed Soil Cores. Soil Sci. Soc. Am. Journal 63 (1999) 1530-1543.

170. B. Radvan, C.T.J. Dodson and C.G. Skold. Detection and cause of the layered structure of paper. In Consolidation of the Paper Web, Trans. IIIrd Fund. Res. Symp. 1965 (F. Bolam, ed.), pp 189-214, BPBMA, London 1966.

171. C.R. Rao. Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37 (1945) 81-91.

172. S.A. Riboldi, M. Sampaolesi, P. Neuenschwander, G. Cossu and S. Mantero. Electrospun degradable polyesterurethane membranes: potential scaffolds for skeletal muscle tissue engineering. Biomaterials 26, 22 (2005) 4606-4615.

173. B.D. Ripley. Statistical Inference for Spatial Processes. Cambridge University Press, Cambridge 1988.

174. R.L. Rivest, A. Shamir and L.M. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21 (1978) 120-126.

175. S. Roman. Coding and Information Theory. Graduate Texts in Mathematics 134, Springer-Verlag, New York, 1992.

176. S. Roman. Introduction to Coding and Information Theory. Undergraduate Texts in Mathematics. Springer-Verlag, New York, 1997.

177. Colin Rose and Murray D. Smith. Mathematical Statistics with Mathematica. Springer Texts in Statistics, Springer-Verlag, Berlin 2002.

178. Z. Rudnick. Private communication. 2008. Cf. also Z. Rudnick. What is Quantum Chaos? Notices A.M.S. 55, 1 (2008) 33-35.

179. A. Rukhin, J. Soto et al. A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. National Institute of Standards & Technology, Gaithersburg, MD USA, 2001.

180. B.Ya. Ryabko and V.A. Monarev. Using information theory approach to randomness testing. Preprint: arXiv:CS.IT/0504006 v1, 3 April 2005.

181. W.W. Sampson. Comments on the pore radius distribution in near-planar stochastic fibre networks. J. Mater. Sci. 36, 21 (2001) 5131-5135.

182. W.W. Sampson. The structure and structural characterisation of fibre networks in papermaking processes.

183. Y. Sato, K. Sugawa and M. Kawaguchi. The geometrical structure of the parameter space of the two-dimensional normal distribution. Division of Information Engineering, Hokkaido University, Sapporo, Japan (1977).

184. M.R. Schroeder. Number Theory in Science and Communication. With Applications in Cryptography, Physics, Digital Information, Computing, and Self-Similarity. Springer Series in Information Science, 3rd edition, Springer, Berlin 1999.

185. K. Schulgasser. Fiber orientation in machine made paper. J. Mater. Sci. 20, 3 (1985) 859-866.

186. C.E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J. 27 (1948) 379-423 and 623-656.


187. D. Shi and C.D. Lai. Fisher information for Downton's bivariate exponential distribution. J. Statistical Computation and Simulation 60, 2 (1998) 123-127.

188. S.D. Silvey. Statistical Inference. Chapman and Hall, Cambridge 1975.

189. L.T. Skovgaard. A Riemannian geometry of the multivariate normal model. Scand. J. Statist. 11 (1984) 211-223.

190. D. Slepian, ed. Key papers in the development of information theory. IEEE Press, New York, 1974.

191. P. Soille. Morphological Image Analysis: Principles and Applications. Springer-Verlag, Heidelberg 1999.

192. M. Sonka and H. Hlavac. Image Processing, Analysis, and Machine Vision. 2nd Ed., PWS Publishing Co., 1999.

193. A. Soshnikov. Universality at the edge of the spectrum in Wigner random matrices. Commun. Math. Phys. 207 (1999) 697-733.

194. M. Spivak. Calculus on Manifolds. W.A. Benjamin, New York 1965.

195. M. Spivak. A Comprehensive Introduction to Differential Geometry, Vols. 1-5, 2nd edn. Publish or Perish, Wilmington 1979.

196. D. Stoyan, W.S. Kendall and J. Mecke. Stochastic Geometry and its Applications. 2nd Edition, John Wiley, Chichester, 1995.

197. I. Szapudi, A. Meiksin and R.C. Nichol. Higher order statistics from the Edinburgh Durham Southern Galaxy Catalogue Survey. 1. Counts in cells. Astroph. J. 473, 1, 1 (1996) 15-21.

198. J.C. Tanner. The proportion of quadrilaterals formed by random lines in a plane. J. Appl. Probab. 20, 2 (1983) 400-404.

199. H. Taud, T. Martinez-Angeles et al. Porosity Estimation Method by X-ray Computed Tomography. Journal of Petroleum Science and Engineering 47 (2005) 209-217.

200. M. Tribus. Thermostatics and Thermodynamics. D. Van Nostrand and Co., Princeton N.J., 1961.

201. M. Tribus, R. Evans and G. Crellin. The use of entropy in hypothesis testing. In Proc. Tenth National Symposium on Reliability and Quality Control, 7-9 January 1964.

202. R. van der Weygaert. Quasi-periodicity in deep redshift surveys. Mon. Not. R. Astr. Soc. 249 (1991) 159.

203. R. van der Weygaert and V. Icke. Fragmenting the universe II. Voronoi vertices as Abell clusters. Astron. Astrophys. 213 (1989) 1-9.

204. B. Velde, E. Moreau and F. Terribile. Pore Networks in an Italian Vertisol: Quantitative Characterization by Two Dimensional Image Analysis. Geoderma 72 (1996) 271-285.

205. L. Vincent. Morphological Grayscale Reconstruction in Image Analysis: Applications and Efficient Algorithms. IEEE Transactions on Image Processing 2 (1993) 176-201.

206. H.J. Vogel and A. Kretzschmar. Topological Characterization of Pore Space in Soil-Sample Preparation and Digital Image-Processing. Geoderma 73 (1996) 23-38.

207. H.J. Vogel and K. Roth. Moving through scales of flow and transport in soil. Journal of Hydrology 272, 1-4 (2003) 95-106.

208. M.S. Vogeley, M.J. Geller, C. Park and J.P. Huchra. Voids and constraints on nonlinear clustering of galaxies. Astron. J. 108, 3 (1994) 745-58.

209. H. Weyl. Space Time Matter. Dover, New York 1950.


210. S.D.M. White. The hierarchy of correlation functions and its relation to other measures of galaxy clustering. Mon. Not. R. Astr. Soc. 186 (1979) 145-154.

211. H. Whitney. Differentiable manifolds. Annals of Math. 41 (1940) 645-680.

212. E.P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics 62, 3 (1955) 548-564.

213. E.P. Wigner. On the distribution of the roots of certain symmetric matrices. Annals of Mathematics 67, 2 (1958) 325-327.

214. E.P. Wigner. Random matrices in physics. SIAM Review 9, 1 (1967) 1-23.

215. S. Wolfram. The Mathematica Book. 3rd edition, Cambridge University Press, Cambridge, 1996.

216. S. Yue, T.B.M.J. Ouarda and B. Bobee. A review of bivariate gamma distributions for hydrological application. Journal of Hydrology 246, 1-4 (2001) 1-18.

Page 264: Information Geometry: Near Randomness and Near Independence

Lecture Notes in MathematicsFor information about earlier volumesplease contact your bookseller or SpringerLNM Online archive: springerlink.com

Vol. 1774: V. Runde, Lectures on Amenability (2002)Vol. 1775: W. H. Meeks, A. Ros, H. Rosenberg, TheGlobal Theory of Minimal Surfaces in Flat Spaces.Martina Franca 1999. Editor: G. P. Pirola (2002)Vol. 1776: K. Behrend, C. Gomez, V. Tarasov, G. Tian,Quantum Comohology. Cetraro 1997. Editors: P. de Bar-tolomeis, B. Dubrovin, C. Reina (2002)Vol. 1777: E. García-Río, D. N. Kupeli, R. Vázquez-Lorenzo, Osserman Manifolds in Semi-RiemannianGeometry (2002)Vol. 1778: H. Kiechle, Theory of K-Loops (2002)Vol. 1779: I. Chueshov, Monotone Random Systems(2002)Vol. 1780: J. H. Bruinier, Borcherds Products on O(2,1)and Chern Classes of Heegner Divisors (2002)Vol. 1781: E. Bolthausen, E. Perkins, A. van der Vaart,Lectures on Probability Theory and Statistics. Ecole d’Eté de Probabilités de Saint-Flour XXIX-1999. Editor:P. Bernard (2002)Vol. 1782: C.-H. Chu, A. T.-M. Lau, Harmonic Functionson Groups and Fourier Algebras (2002)Vol. 1783: L. Grüne, Asymptotic Behavior of Dynamicaland Control Systems under Perturbation and Discretiza-tion (2002)Vol. 1784: L. H. Eliasson, S. B. Kuksin, S. Marmi, J.-C.Yoccoz, Dynamical Systems and Small Divisors. Cetraro,Italy 1998. Editors: S. Marmi, J.-C. Yoccoz (2002)Vol. 1785: J. Arias de Reyna, Pointwise Convergence ofFourier Series (2002)Vol. 1786: S. D. Cutkosky, Monomialization of Mor-phisms from 3-Folds to Surfaces (2002)Vol. 1787: S. Caenepeel, G. Militaru, S. Zhu, Frobeniusand Separable Functors for Generalized Module Cate-gories and Nonlinear Equations (2002)Vol. 1788: A. Vasil’ev, Moduli of Families of Curves forConformal and Quasiconformal Mappings (2002)Vol. 1789: Y. Sommerhäuser, Yetter-Drinfel’d Hopf alge-bras over groups of prime order (2002)Vol. 1790: X. Zhan, Matrix Inequalities (2002)Vol. 1791: M. Knebusch, D. Zhang, Manis Valuationsand Prüfer Extensions I: A new Chapter in CommutativeAlgebra (2002)Vol. 1792: D. D. Ang, R. Gorenflo, V. K. Le, D. D. Trong,Moment Theory and Some Inverse Problems in PotentialTheory and Heat Conduction (2002)Vol. 1793: J. Cortés Monforte, Geometric, Control andNumerical Aspects of Nonholonomic Systems (2002)Vol. 1794: N. Pytheas Fogg, Substitution in Dynamics,Arithmetics and Combinatorics. Editors: V. Berthé, S.Ferenczi, C. Mauduit, A. Siegel (2002)Vol. 1795: H. Li, Filtered-Graded Transfer in Using Non-commutative Gröbner Bases (2002)Vol. 1796: J.M. Melenk, hp-Finite Element Methods forSingular Perturbations (2002)

Vol. 1797: B. Schmidt, Characters and Cyclotomic Fieldsin Finite Geometry (2002)Vol. 1798: W.M. Oliva, Geometric Mechanics (2002)Vol. 1799: H. Pajot, Analytic Capacity, Rectifiability,Menger Curvature and the Cauchy Integral (2002)Vol. 1800: O. Gabber, L. Ramero, Almost Ring Theory(2003)Vol. 1801: J. Azéma, M. Émery, M. Ledoux, M. Yor(Eds.), Séminaire de Probabilités XXXVI (2003)Vol. 1802: V. Capasso, E. Merzbach, B. G. Ivanoff,M. Dozzi, R. Dalang, T. Mountford, Topics in SpatialStochastic Processes. Martina Franca, Italy 2001. Editor:E. Merzbach (2003)Vol. 1803: G. Dolzmann, Variational Methods for Crys-talline Microstructure – Analysis and Computation(2003)Vol. 1804: I. Cherednik, Ya. Markov, R. Howe, G.Lusztig, Iwahori-Hecke Algebras and their Representa-tion Theory. Martina Franca, Italy 1999. Editors: V. Bal-doni, D. Barbasch (2003)Vol. 1805: F. Cao, Geometric Curve Evolution and ImageProcessing (2003)Vol. 1806: H. Broer, I. Hoveijn. G. Lunther, G. Vegter,Bifurcations in Hamiltonian Systems. Computing Singu-larities by Gröbner Bases (2003)Vol. 1807: V. D. Milman, G. Schechtman (Eds.), Geomet-ric Aspects of Functional Analysis. Israel Seminar 2000-2002 (2003)Vol. 1808: W. Schindler, Measures with Symmetry Prop-erties (2003)Vol. 1809: O. Steinbach, Stability Estimates for HybridCoupled Domain Decomposition Methods (2003)Vol. 1810: J. Wengenroth, Derived Functors in FunctionalAnalysis (2003)Vol. 1811: J. Stevens, Deformations of Singularities(2003)Vol. 1812: L. Ambrosio, K. Deckelnick, G. Dziuk,M. Mimura, V. A. Solonnikov, H. M. Soner, Mathemat-ical Aspects of Evolving Interfaces. Madeira, Funchal,Portugal 2000. Editors: P. Colli, J. F. Rodrigues (2003)Vol. 1813: L. Ambrosio, L. A. Caffarelli, Y. Brenier,G. Buttazzo, C. Villani, Optimal Transportation and itsApplications. Martina Franca, Italy 2001. Editors: L. A.Caffarelli, S. Salsa (2003)Vol. 1814: P. Bank, F. Baudoin, H. Föllmer, L.C.G.Rogers, M. Soner, N. Touzi, Paris-Princeton Lectures onMathematical Finance 2002 (2003)Vol. 1815: A. M. Vershik (Ed.), Asymptotic Combi-natorics with Applications to Mathematical Physics.St. Petersburg, Russia 2001 (2003)Vol. 1816: S. Albeverio, W. Schachermayer, M. Tala-grand, Lectures on Probability Theory and Statistics.Ecole d’Eté de Probabilités de Saint-Flour XXX-2000.Editor: P. Bernard (2003)

Page 265: Information Geometry: Near Randomness and Near Independence

Vol. 1817: E. Koelink, W. Van Assche (Eds.), OrthogonalPolynomials and Special Functions. Leuven 2002 (2003)Vol. 1818: M. Bildhauer, Convex Variational Problemswith Linear, nearly Linear and/or Anisotropic GrowthConditions (2003)Vol. 1819: D. Masser, Yu. V. Nesterenko, H. P. Schlick-ewei, W. M. Schmidt, M. Waldschmidt, DiophantineApproximation. Cetraro, Italy 2000. Editors: F. Amoroso,U. Zannier (2003)Vol. 1820: F. Hiai, H. Kosaki, Means of Hilbert SpaceOperators (2003)Vol. 1821: S. Teufel, Adiabatic Perturbation Theory inQuantum Dynamics (2003)Vol. 1822: S.-N. Chow, R. Conti, R. Johnson, J. Mallet-Paret, R. Nussbaum, Dynamical Systems. Cetraro, Italy2000. Editors: J. W. Macki, P. Zecca (2003)Vol. 1823: A. M. Anile, W. Allegretto, C. Ring-hofer, Mathematical Problems in Semiconductor Physics.Cetraro, Italy 1998. Editor: A. M. Anile (2003)Vol. 1824: J. A. Navarro González, J. B. Sancho de Salas,C ∞ – Differentiable Spaces (2003)Vol. 1825: J. H. Bramble, A. Cohen, W. Dahmen, Mul-tiscale Problems and Methods in Numerical Simulations,Martina Franca, Italy 2001. Editor: C. Canuto (2003)Vol. 1826: K. Dohmen, Improved Bonferroni Inequal-ities via Abstract Tubes. Inequalities and Identities ofInclusion-Exclusion Type. VIII, 113 p, 2003.Vol. 1827: K. M. Pilgrim, Combinations of ComplexDynamical Systems. IX, 118 p, 2003.Vol. 1828: D. J. Green, Gröbner Bases and the Computa-tion of Group Cohomology. XII, 138 p, 2003.Vol. 1829: E. Altman, B. Gaujal, A. Hordijk, Discrete-Event Control of Stochastic Networks: Multimodularityand Regularity. XIV, 313 p, 2003.Vol. 1830: M. I. Gil’, Operator Functions and Localiza-tion of Spectra. XIV, 256 p, 2003.Vol. 1831: A. Connes, J. Cuntz, E. Guentner, N. Hig-son, J. E. Kaminker, Noncommutative Geometry, Mar-tina Franca, Italy 2002. Editors: S. Doplicher, L. Longo(2004)Vol. 1832: J. Azéma, M. Émery, M. Ledoux, M. Yor(Eds.), Séminaire de Probabilités XXXVII (2003)Vol. 1833: D.-Q. Jiang, M. Qian, M.-P. Qian, Mathemati-cal Theory of Nonequilibrium Steady States. On the Fron-tier of Probability and Dynamical Systems. IX, 280 p,2004.Vol. 1834: Yo. Yomdin, G. Comte, Tame Geometry withApplication in Smooth Analysis. VIII, 186 p, 2004.Vol. 1835: O.T. Izhboldin, B. Kahn, N.A. Karpenko,A. Vishik, Geometric Methods in the Algebraic Theoryof Quadratic Forms. Summer School, Lens, 2000. Editor:J.-P. Tignol (2004)Vol. 1836: C. Nastasescu, F. Van Oystaeyen, Methods ofGraded Rings. XIII, 304 p, 2004.Vol. 1837: S. Tavaré, O. Zeitouni, Lectures on Probabil-ity Theory and Statistics. Ecole d’Eté de Probabilités deSaint-Flour XXXI-2001. Editor: J. Picard (2004)Vol. 1838: A.J. Ganesh, N.W. O’Connell, D.J. Wischik,Big Queues. XII, 254 p, 2004.Vol. 1839: R. Gohm, Noncommutative StationaryProcesses. VIII, 170 p, 2004.Vol. 1840: B. Tsirelson, W. Werner, Lectures on Probabil-ity Theory and Statistics. Ecole d’Eté de Probabilités deSaint-Flour XXXII-2002. Editor: J. Picard (2004)Vol. 1841: W. Reichel, Uniqueness Theorems for Vari-ational Problems by the Method of TransformationGroups (2004)

Vol. 1842: T. Johnsen, A.L. Knutsen, K3 Projective Models in Scrolls (2004)
Vol. 1843: B. Jefferies, Spectral Properties of Noncommuting Operators (2004)
Vol. 1844: K.F. Siburg, The Principle of Least Action in Geometry and Dynamics (2004)
Vol. 1845: Min Ho Lee, Mixed Automorphic Forms, Torus Bundles, and Jacobi Forms (2004)
Vol. 1846: H. Ammari, H. Kang, Reconstruction of Small Inhomogeneities from Boundary Measurements (2004)
Vol. 1847: T.R. Bielecki, T. Björk, M. Jeanblanc, M. Rutkowski, J.A. Scheinkman, W. Xiong, Paris-Princeton Lectures on Mathematical Finance 2003 (2004)
Vol. 1848: M. Abate, J.E. Fornaess, X. Huang, J.P. Rosay, A. Tumanov, Real Methods in Complex and CR Geometry, Martina Franca, Italy 2002. Editors: D. Zaitsev, G. Zampieri (2004)
Vol. 1849: Martin L. Brown, Heegner Modules and Elliptic Curves (2004)
Vol. 1850: V.D. Milman, G. Schechtman (Eds.), Geometric Aspects of Functional Analysis. Israel Seminar 2002-2003 (2004)
Vol. 1851: O. Catoni, Statistical Learning Theory and Stochastic Optimization (2004)
Vol. 1852: A.S. Kechris, B.D. Miller, Topics in Orbit Equivalence (2004)
Vol. 1853: Ch. Favre, M. Jonsson, The Valuative Tree (2004)
Vol. 1854: O. Saeki, Topology of Singular Fibers of Differentiable Maps (2004)
Vol. 1855: G. Da Prato, P.C. Kunstmann, I. Lasiecka, A. Lunardi, R. Schnaubelt, L. Weis, Functional Analytic Methods for Evolution Equations. Editors: M. Iannelli, R. Nagel, S. Piazzera (2004)
Vol. 1856: K. Back, T.R. Bielecki, C. Hipp, S. Peng, W. Schachermayer, Stochastic Methods in Finance, Bressanone/Brixen, Italy, 2003. Editors: M. Fritelli, W. Runggaldier (2004)
Vol. 1857: M. Émery, M. Ledoux, M. Yor (Eds.), Séminaire de Probabilités XXXVIII (2005)
Vol. 1858: A.S. Cherny, H.-J. Engelbert, Singular Stochastic Differential Equations (2005)
Vol. 1859: E. Letellier, Fourier Transforms of Invariant Functions on Finite Reductive Lie Algebras (2005)
Vol. 1860: A. Borisyuk, G.B. Ermentrout, A. Friedman, D. Terman, Tutorials in Mathematical Biosciences I. Mathematical Neurosciences (2005)
Vol. 1861: G. Benettin, J. Henrard, S. Kuksin, Hamiltonian Dynamics – Theory and Applications, Cetraro, Italy, 1999. Editor: A. Giorgilli (2005)
Vol. 1862: B. Helffer, F. Nier, Hypoelliptic Estimates and Spectral Theory for Fokker-Planck Operators and Witten Laplacians (2005)
Vol. 1863: H. Führ, Abstract Harmonic Analysis of Continuous Wavelet Transforms (2005)
Vol. 1864: K. Efstathiou, Metamorphoses of Hamiltonian Systems with Symmetries (2005)
Vol. 1865: D. Applebaum, B.V.R. Bhat, J. Kustermans, J.M. Lindsay, Quantum Independent Increment Processes I. From Classical Probability to Quantum Stochastic Calculus. Editors: M. Schürmann, U. Franz (2005)
Vol. 1866: O.E. Barndorff-Nielsen, U. Franz, R. Gohm, B. Kümmerer, S. Thorbjørnsen, Quantum Independent Increment Processes II. Structure of Quantum Lévy Processes, Classical Probability, and Physics. Editors: M. Schürmann, U. Franz (2005)

Page 266: Information Geometry: Near Randomness and Near Independence

Vol. 1867: J. Sneyd (Ed.), Tutorials in Mathematical Biosciences II. Mathematical Modeling of Calcium Dynamics and Signal Transduction (2005)
Vol. 1868: J. Jorgenson, S. Lang, Posn(R) and Eisenstein Series (2005)
Vol. 1869: A. Dembo, T. Funaki, Lectures on Probability Theory and Statistics. Ecole d’Eté de Probabilités de Saint-Flour XXXIII-2003. Editor: J. Picard (2005)
Vol. 1870: V.I. Gurariy, W. Lusky, Geometry of Müntz Spaces and Related Questions (2005)
Vol. 1871: P. Constantin, G. Gallavotti, A.V. Kazhikhov, Y. Meyer, S. Ukai, Mathematical Foundation of Turbulent Viscous Flows, Martina Franca, Italy, 2003. Editors: M. Cannone, T. Miyakawa (2006)
Vol. 1872: A. Friedman (Ed.), Tutorials in Mathematical Biosciences III. Cell Cycle, Proliferation, and Cancer (2006)
Vol. 1873: R. Mansuy, M. Yor, Random Times and Enlargements of Filtrations in a Brownian Setting (2006)
Vol. 1874: M. Yor, M. Émery (Eds.), In Memoriam Paul-André Meyer - Séminaire de Probabilités XXXIX (2006)
Vol. 1875: J. Pitman, Combinatorial Stochastic Processes. Ecole d’Eté de Probabilités de Saint-Flour XXXII-2002. Editor: J. Picard (2006)
Vol. 1876: H. Herrlich, Axiom of Choice (2006)
Vol. 1877: J. Steuding, Value Distributions of L-Functions (2007)
Vol. 1878: R. Cerf, The Wulff Crystal in Ising and Percolation Models, Ecole d’Eté de Probabilités de Saint-Flour XXXIV-2004. Editor: Jean Picard (2006)
Vol. 1879: G. Slade, The Lace Expansion and its Applications, Ecole d’Eté de Probabilités de Saint-Flour XXXIV-2004. Editor: Jean Picard (2006)
Vol. 1880: S. Attal, A. Joye, C.-A. Pillet, Open Quantum Systems I, The Hamiltonian Approach (2006)
Vol. 1881: S. Attal, A. Joye, C.-A. Pillet, Open Quantum Systems II, The Markovian Approach (2006)
Vol. 1882: S. Attal, A. Joye, C.-A. Pillet, Open Quantum Systems III, Recent Developments (2006)
Vol. 1883: W. Van Assche, F. Marcellán (Eds.), Orthogonal Polynomials and Special Functions, Computation and Application (2006)
Vol. 1884: N. Hayashi, E.I. Kaikina, P.I. Naumkin, I.A. Shishmarev, Asymptotics for Dissipative Nonlinear Equations (2006)
Vol. 1885: A. Telcs, The Art of Random Walks (2006)
Vol. 1886: S. Takamura, Splitting Deformations of Degenerations of Complex Curves (2006)
Vol. 1887: K. Habermann, L. Habermann, Introduction to Symplectic Dirac Operators (2006)
Vol. 1888: J. van der Hoeven, Transseries and Real Differential Algebra (2006)
Vol. 1889: G. Osipenko, Dynamical Systems, Graphs, and Algorithms (2006)
Vol. 1890: M. Bunge, J. Funk, Singular Coverings of Toposes (2006)
Vol. 1891: J.B. Friedlander, D.R. Heath-Brown, H. Iwaniec, J. Kaczorowski, Analytic Number Theory, Cetraro, Italy, 2002. Editors: A. Perelli, C. Viola (2006)
Vol. 1892: A. Baddeley, I. Bárány, R. Schneider, W. Weil, Stochastic Geometry, Martina Franca, Italy, 2004. Editor: W. Weil (2007)
Vol. 1893: H. Hanßmann, Local and Semi-Local Bifurcations in Hamiltonian Dynamical Systems, Results and Examples (2007)
Vol. 1894: C.W. Groetsch, Stable Approximate Evaluation of Unbounded Operators (2007)

Vol. 1895: L. Molnár, Selected Preserver Problems on Algebraic Structures of Linear Operators and on Function Spaces (2007)
Vol. 1896: P. Massart, Concentration Inequalities and Model Selection, Ecole d’Été de Probabilités de Saint-Flour XXXIII-2003. Editor: J. Picard (2007)
Vol. 1897: R. Doney, Fluctuation Theory for Lévy Processes, Ecole d’Été de Probabilités de Saint-Flour XXXV-2005. Editor: J. Picard (2007)
Vol. 1898: H.R. Beyer, Beyond Partial Differential Equations, On Linear and Quasi-Linear Abstract Hyperbolic Evolution Equations (2007)
Vol. 1899: Séminaire de Probabilités XL. Editors: C. Donati-Martin, M. Émery, A. Rouault, C. Stricker (2007)
Vol. 1900: E. Bolthausen, A. Bovier (Eds.), Spin Glasses (2007)
Vol. 1901: O. Wittenberg, Intersections de deux quadriques et pinceaux de courbes de genre 1, Intersections of Two Quadrics and Pencils of Curves of Genus 1 (2007)
Vol. 1902: A. Isaev, Lectures on the Automorphism Groups of Kobayashi-Hyperbolic Manifolds (2007)
Vol. 1903: G. Kresin, V. Maz’ya, Sharp Real-Part Theorems (2007)
Vol. 1904: P. Giesl, Construction of Global Lyapunov Functions Using Radial Basis Functions (2007)
Vol. 1905: C. Prévôt, M. Röckner, A Concise Course on Stochastic Partial Differential Equations (2007)
Vol. 1906: T. Schuster, The Method of Approximate Inverse: Theory and Applications (2007)
Vol. 1907: M. Rasmussen, Attractivity and Bifurcation for Nonautonomous Dynamical Systems (2007)
Vol. 1908: T.J. Lyons, M. Caruana, T. Lévy, Differential Equations Driven by Rough Paths, Ecole d’Été de Probabilités de Saint-Flour XXXIV-2004 (2007)
Vol. 1909: H. Akiyoshi, M. Sakuma, M. Wada, Y. Yamashita, Punctured Torus Groups and 2-Bridge Knot Groups (I) (2007)
Vol. 1910: V.D. Milman, G. Schechtman (Eds.), Geometric Aspects of Functional Analysis. Israel Seminar 2004-2005 (2007)
Vol. 1911: A. Bressan, D. Serre, M. Williams, K. Zumbrun, Hyperbolic Systems of Balance Laws. Cetraro, Italy 2003. Editor: P. Marcati (2007)
Vol. 1912: V. Berinde, Iterative Approximation of Fixed Points (2007)
Vol. 1913: J.E. Marsden, G. Misiołek, J.-P. Ortega, M. Perlmutter, T.S. Ratiu, Hamiltonian Reduction by Stages (2007)
Vol. 1914: G. Kutyniok, Affine Density in Wavelet Analysis (2007)
Vol. 1915: T. Bıyıkoğlu, J. Leydold, P.F. Stadler, Laplacian Eigenvectors of Graphs. Perron-Frobenius and Faber-Krahn Type Theorems (2007)
Vol. 1916: C. Villani, F. Rezakhanlou, Entropy Methods for the Boltzmann Equation. Editors: F. Golse, S. Olla (2008)
Vol. 1917: I. Veselić, Existence and Regularity Properties of the Integrated Density of States of Random Schrödinger Operators (2008)
Vol. 1918: B. Roberts, R. Schmidt, Local Newforms for GSp(4) (2007)
Vol. 1919: R.A. Carmona, I. Ekeland, A. Kohatsu-Higa, J.-M. Lasry, P.-L. Lions, H. Pham, E. Taflin, Paris-Princeton Lectures on Mathematical Finance 2004.

Page 267: Information Geometry: Near Randomness and Near Independence

Editors: R.A. Carmona, E. Çinlar, I. Ekeland, E. Jouini, J.A. Scheinkman, N. Touzi (2007)
Vol. 1920: S.N. Evans, Probability and Real Trees. Ecole d’Été de Probabilités de Saint-Flour XXXV-2005 (2008)
Vol. 1921: J.P. Tian, Evolution Algebras and their Applications (2008)
Vol. 1922: A. Friedman (Ed.), Tutorials in Mathematical BioSciences IV. Evolution and Ecology (2008)
Vol. 1923: J.P.N. Bishwal, Parameter Estimation in Stochastic Differential Equations (2008)
Vol. 1924: M. Wilson, Littlewood-Paley Theory and Exponential-Square Integrability (2008)
Vol. 1925: M. du Sautoy, L. Woodward, Zeta Functions of Groups and Rings (2008)
Vol. 1926: L. Barreira, C. Valls, Stability of Nonautonomous Differential Equations (2008)
Vol. 1927: L. Ambrosio, L. Caffarelli, M.G. Crandall, L.C. Evans, N. Fusco, Calculus of Variations and Non-Linear Partial Differential Equations. Cetraro, Italy 2005. Editors: B. Dacorogna, P. Marcellini (2008)
Vol. 1928: J. Jonsson, Simplicial Complexes of Graphs (2008)
Vol. 1929: Y. Mishura, Stochastic Calculus for Fractional Brownian Motion and Related Processes (2008)
Vol. 1930: J.M. Urbano, The Method of Intrinsic Scaling. A Systematic Approach to Regularity for Degenerate and Singular PDEs (2008)
Vol. 1931: M. Cowling, E. Frenkel, M. Kashiwara, A. Valette, D.A. Vogan, Jr., N.R. Wallach, Representation Theory and Complex Analysis. Venice, Italy 2004. Editors: E.C. Tarabusi, A. D’Agnolo, M. Picardello (2008)
Vol. 1932: A.A. Agrachev, A.S. Morse, E.D. Sontag, H.J. Sussmann, V.I. Utkin, Nonlinear and Optimal Control Theory. Cetraro, Italy 2004. Editors: P. Nistri, G. Stefani (2008)
Vol. 1933: M. Petković, Point Estimation of Root Finding Methods (2008)
Vol. 1934: C. Donati-Martin, M. Émery, A. Rouault, C. Stricker (Eds.), Séminaire de Probabilités XLI (2008)
Vol. 1935: A. Unterberger, Alternative Pseudodifferential Analysis (2008)
Vol. 1936: P. Magal, S. Ruan (Eds.), Structured Population Models in Biology and Epidemiology (2008)
Vol. 1937: G. Capriz, P. Giovine, P.M. Mariano (Eds.), Mathematical Models of Granular Matter (2008)
Vol. 1938: D. Auroux, F. Catanese, M. Manetti, P. Seidel, B. Siebert, I. Smith, G. Tian, Symplectic 4-Manifolds and Algebraic Surfaces. Cetraro, Italy 2003. Editors: F. Catanese, G. Tian (2008)
Vol. 1939: D. Boffi, F. Brezzi, L. Demkowicz, R.G. Durán, R.S. Falk, M. Fortin, Mixed Finite Elements, Compatibility Conditions, and Applications. Cetraro, Italy 2006. Editors: D. Boffi, L. Gastaldi (2008)
Vol. 1940: J. Banasiak, V. Capasso, M.A.J. Chaplain, M. Lachowicz, J. Miekisz, Multiscale Problems in the Life Sciences. From Microscopic to Macroscopic. Bedlewo, Poland 2006. Editors: V. Capasso, M. Lachowicz (2008)
Vol. 1941: S.M.J. Haran, Arithmetical Investigations. Representation Theory, Orthogonal Polynomials, and Quantum Interpolations (2008)
Vol. 1942: S. Albeverio, F. Flandoli, Y.G. Sinai, SPDE in Hydrodynamic. Recent Progress and Prospects. Cetraro, Italy 2005. Editors: G. Da Prato, M. Röckner (2008)
Vol. 1943: L.L. Bonilla (Ed.), Inverse Problems and Imaging. Martina Franca, Italy 2002 (2008)

Vol. 1944: A. Di Bartolo, G. Falcone, P. Plaumann, K. Strambach, Algebraic Groups and Lie Groups with Few Factors (2008)
Vol. 1945: F. Brauer, P. van den Driessche, J. Wu (Eds.), Mathematical Epidemiology (2008)
Vol. 1946: G. Allaire, A. Arnold, P. Degond, T.Y. Hou, Quantum Transport. Modelling, Analysis and Asymptotics. Cetraro, Italy 2006. Editors: N.B. Abdallah, G. Frosali (2008)
Vol. 1947: D. Abramovich, M. Mariño, M. Thaddeus, R. Vakil, Enumerative Invariants in Algebraic Geometry and String Theory. Cetraro, Italy 2005. Editors: K. Behrend, M. Manetti (2008)
Vol. 1948: F. Cao, J.-L. Lisani, J.-M. Morel, P. Musé, F. Sur, A Theory of Shape Identification (2008)
Vol. 1949: H.G. Feichtinger, B. Helffer, M.P. Lamoureux, N. Lerner, J. Toft, Pseudo-Differential Operators. Quantization and Signals. Cetraro, Italy 2006. Editors: L. Rodino, M.W. Wong (2008)
Vol. 1950: M. Bramson, Stability of Queueing Networks, Ecole d’Eté de Probabilités de Saint-Flour XXXVI-2006 (2008)
Vol. 1951: A. Moltó, J. Orihuela, S. Troyanski, M. Valdivia, A Non Linear Transfer Technique for Renorming (2008)
Vol. 1952: R. Mikhailov, I.B.S. Passi, Lower Central and Dimension Series of Groups (2008)
Vol. 1953: K. Arwini, C.T.J. Dodson, Information Geometry (2008)
Vol. 1954: P. Biane, L. Bouten, F. Cipriani, N. Konno, N. Privault, Q. Xu, Quantum Potential Theory. Editors: U. Franz, M. Schürmann (2008)
Vol. 1955: M. Bernot, V. Caselles, J.-M. Morel, Optimal Transportation Networks (2008)
Vol. 1956: C.H. Chu, Matrix Convolution Operators on Groups (2008)
Vol. 1957: A. Guionnet, On Random Matrices: Macroscopic Asymptotics, Ecole d’Eté de Probabilités de Saint-Flour XXXVI-2006 (2008)
Vol. 1958: M.C. Olsson, Compactifying Moduli Spaces for Abelian Varieties (2008)

Recent Reprints and New Editions

Vol. 1702: J. Ma, J. Yong, Forward-Backward Stochastic Differential Equations and their Applications. 1999 – Corr. 3rd printing (2007)
Vol. 830: J.A. Green, Polynomial Representations of GLn, with an Appendix on Schensted Correspondence and Littelmann Paths by K. Erdmann, J.A. Green and M. Schocker. 1980 – 2nd corr. and augmented edition (2007)
Vol. 1693: S. Simons, From Hahn-Banach to Monotonicity (Minimax and Monotonicity 1998) – 2nd exp. edition (2008)
Vol. 470: R.E. Bowen, Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms. With a preface by D. Ruelle. Edited by J.-R. Chazottes. 1975 – 2nd rev. edition (2008)
Vol. 523: S.A. Albeverio, R.J. Høegh-Krohn, S. Mazzucchi, Mathematical Theory of Feynman Path Integrals. 1976 – 2nd corr. and enlarged edition (2008)
Vol. 1764: A. Cannas da Silva, Lectures on Symplectic Geometry. 2001 – Corr. 2nd printing (2008)

Page 268: Information Geometry: Near Randomness and Near Independence

LECTURE NOTES IN MATHEMATICS
Edited by J.-M. Morel, F. Takens, B. Teissier, P.K. Maini

Editorial Policy (for the publication of monographs)

1. Lecture Notes aim to report new developments in all areas of mathematics and their applications - quickly, informally and at a high level. Mathematical texts analysing new developments in modelling and numerical simulation are welcome.

Monograph manuscripts should be reasonably self-contained and rounded off. Thus they may, and often will, present not only results of the author but also related work by other people. They may be based on specialised lecture courses. Furthermore, the manuscripts should provide sufficient motivation, examples and applications. This clearly distinguishes Lecture Notes from journal articles or technical reports, which normally are very concise. Articles intended for a journal but too long to be accepted by most journals usually do not have this “lecture notes” character. For similar reasons it is unusual for doctoral theses to be accepted for the Lecture Notes series, though habilitation theses may be appropriate.

2. Manuscripts should be submitted either to Springer’s mathematics editorial in Heidelberg, or to one of the series editors. In general, manuscripts will be sent out to 2 external referees for evaluation. If a decision cannot yet be reached on the basis of the first 2 reports, further referees may be contacted: the author will be informed of this. A final decision to publish can be made only on the basis of the complete manuscript; however, a refereeing process leading to a preliminary decision can be based on a pre-final or incomplete manuscript. The strict minimum amount of material that will be considered should include a detailed outline describing the planned contents of each chapter, a bibliography and several sample chapters.

Authors should be aware that manuscripts which are incomplete or insufficiently close to final form almost always result in longer refereeing times and nevertheless unclear referees’ recommendations, making further refereeing of a final draft necessary.

Authors should also be aware that parallel submission of their manuscript to another publisher while under consideration for LNM will in general lead to immediate rejection.

3. Manuscripts should in general be submitted in English. Final manuscripts should contain at least 100 pages of mathematical text and should always include

– a table of contents;
– an informative introduction, with adequate motivation and perhaps some historical remarks: it should be accessible to a reader not intimately familiar with the topic treated;
– a subject index: as a rule this is genuinely helpful for the reader.

For evaluation purposes, manuscripts may be submitted in print or electronic form, in the latter case preferably as pdf- or zipped ps-files. Lecture Notes volumes are, as a rule, printed digitally from the authors’ files. To ensure best results, authors are asked to use the LaTeX2e style files available from Springer’s web-server at:

ftp://ftp.springer.de/pub/tex/latex/svmonot1/ (for monographs).
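For orientation only, the following sketch shows how a final manuscript meeting the checklist in item 3 might be set up. It assumes that the class file distributed in the directory above is installed under the customary name svmono.cls; that name is an assumption of this illustration, not something stated in the policy itself.

% Sketch of a minimal LNM monograph source file (illustrative only).
% Assumes svmono.cls (from the svmonot1 directory above) is on the TeX path.
\documentclass{svmono}
\usepackage{makeidx}   % needed for the required subject index
\makeindex

\begin{document}

\frontmatter
\tableofcontents       % required: a table of contents

\mainmatter
\chapter{Introduction}
% Required: an informative introduction with motivation and perhaps
% historical remarks, accessible to non-specialists.
Here the topic\index{topic} is motivated for the general reader.

\backmatter
\printindex            % required: a subject index

\end{document}

Running latex together with makeindex over such a source produces the table of contents and subject index automatically; the series layout itself comes from the style files on the server above.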

Page 269: Information Geometry: Near Randomness and Near Independence

Additional technical instructions, if necessary, are available on request from: [email protected].

4. Careful preparation of the manuscripts will help keep production time short besides ensuring satisfactory appearance of the finished book in print and online. After acceptance of the manuscript authors will be asked to prepare the final LaTeX source files (and also the corresponding dvi-, pdf- or zipped ps-file) together with the final printout made from these files. The LaTeX source files are essential for producing the full-text online version of the book (see www.springerlink.com/content/110312 for the existing online volumes of LNM).

The actual production of a Lecture Notes volume takes approximately 12 weeks.

5. Authors receive a total of 50 free copies of their volume, but no royalties. They are entitled to a discount of 33.3% on the price of Springer books purchased for their personal use, if ordering directly from Springer.

6. Commitment to publish is made by letter of intent rather than by signing a formal contract. Springer-Verlag secures the copyright for each volume. Authors are free to reuse material contained in their LNM volumes in later publications: a brief written (or e-mail) request for formal permission is sufficient.

Addresses:

Professor J.-M. Morel, CMLA,
Ecole Normale Superieure de Cachan,
61 Avenue du President Wilson,
94235 Cachan Cedex, France
E-mail: [email protected]

Professor F. Takens, Mathematisch Instituut,
Rijksuniversiteit Groningen, Postbus 800,
9700 AV Groningen, The Netherlands
E-mail: [email protected]

Professor B. Teissier, Institut Mathematique de Jussieu,
UMR 7586 du CNRS, Equipe “Geometrie et Dynamique”,
175 rue du Chevaleret, 75013 Paris, France
E-mail: [email protected]

For the “Mathematical Biosciences Subseries” of LNM:

Professor P.K. Maini, Center for Mathematical Biology,
Mathematical Institute, 24-29 St Giles,
Oxford OX1 3LP, UK
E-mail: [email protected]

Springer, Mathematics Editorial I,
Tiergartenstr. 17, 69121 Heidelberg, Germany
Tel.: +49 (6221) 487-8259
Fax: +49 (6221) 4876-8259
E-mail: [email protected]