Information-Theoretic Methods in Data Science
Transcript of Information-Theoretic Methods in Data Science
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Information-Theoretic Methods in Data Science
Learn about the state-of-the-art at the interface between information theory and data
science with this first unified treatment of the subject. Written by leading experts
in a clear, tutorial style, and using consistent notation and definitions throughout, it
shows how information-theoretic methods are being used in data acquisition, data
representation, data analysis, and statistics and machine learning.
Coverage is broad, with chapters on signal data acquisition, data compression,
compressive sensing, data communication, representation learning, emerging topics in
statistics, and much more. Each chapter includes a topic overview, definition of the key
problems, emerging and open problems, and an extensive reference list, allowing readers
to develop in-depth knowledge and understanding.
Providing a thorough survey of the current research area and cutting-edge trends, this
is essential reading for graduate students and researchers working in information theory,
signal processing, machine learning, and statistics.
Miguel R. D. Rodrigues is Professor of Information Theory and Processing in the Depart-
ment of Electronic and Electrical Engineering, University College London, and a Turing
Fellow at the Alan Turing Institute, London.
Yonina C. Eldar is a Professor in the Faculty of Mathematics and Computer Science at
the Weizmann Institute of Science, a Fellow of the IEEE and Eurasip, and a member of
the Israel Academy of Sciences and Humanities. She is the author of Sampling Theory
(Cambridge, 2015), and co-editor of Compressed Sensing in Radar Signal Processing
(Cambridge, 2019), Compressed Sensing (Cambridge, 2012), and Convex Optimization
in Signal Processing and Communications (Cambridge, 2009).
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
“How should we model the opportunities created by access to data and to computing,
and how robust and resilient are these models? What information is needed for analysis,
how is it obtained and applied? If you are looking for answers to these questions, then
there is no better place to start than this book.”
Robert Calderbank, Duke University
“This pioneering book elucidates the emerging cross-disciplinary area of information-
theoretic data science. The book’s contributors are leading information theorists who
compellingly explain the close connections between the fields of information theory
and data science. Readers will learn about theoretical foundations for designing data
collection and analysis systems, from data acquisition to data compression, computation,
and learning.”
Al Hero, University of Michigan
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Information-Theoretic Methodsin Data Science
Edited by
MIGUEL R. D. RODRIGUESUniversity College London
YONINA C. ELDARWeizmann Institute of Science
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06-04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781108427135
DOI: 10.1017/9781108616799
© Cambridge University Press 2021
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2021
Printed in the United Kingdom by TJ International Ltd, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-42713-5 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
To my husband Shalomi and children Yonatan, Moriah, Tal, Noa, and Roei for their
boundless love and for filling my life with endless happiness
YE
To my wife Eduarda and children Isabel and Diana for their unconditional love,
encouragement, and support
MR
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Contents
Preface page xiii
Notation xvii
List of Contributors xix
1 Introduction to Information Theory and Data Science 1
Miguel R. D. Rodrigues, Stark C. Draper, Waheed U. Bajwa, and Yonina C. Eldar
1.1 Classical Information Theory: A Primer 2
1.2 Source Coding: Near-Lossless Compression of Binary Sources 4
1.3 Channel Coding: Transmission over the Binary Symmetric Channel 8
1.4 Linear Channel Coding 12
1.5 Connecting Information Theory to Data Science 15
1.6 Information Theory and Data Acquisition 18
1.7 Information Theory and Data Representation 20
1.8 Information Theory and Data Analysis and Processing 26
1.9 Discussion and Conclusion 35
References 36
2 An Information-Theoretic Approach to Analog-to-Digital Compression 44
Alon Kipnis, Yonina C. Eldar, and Andrea J. Goldsmith
2.1 Introduction 44
2.2 Lossy Compression of Finite-Dimensional Signals 48
2.3 ADX for Continuous-Time Analog Signals 49
2.4 The Fundamental Distortion Limit 55
2.5 ADX under Uniform Sampling 63
2.6 Conclusion 69
References 70
3 Compressed Sensing via Compression Codes 72
Shirin Jalali and H. Vincent Poor
3.1 Compressed Sensing 72
3.2 Compression-Based Compressed Sensing 74
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
viii Contents
3.3 Definitions 76
3.4 Compressible Signal Pursuit 77
3.5 Compression-Based Gradient Descent (C-GD) 81
3.6 Stylized Applications 85
3.7 Extensions 92
3.8 Conclusions and Discussion 100
References 100
4 Information-Theoretic Bounds on Sketching 104
Mert Pilanci
4.1 Introduction 104
4.2 Types of Randomized Sketches 105
4.3 Background on Convex Analysis and Optimization 108
4.4 Sketching Upper Bounds for Regression Problems 110
4.5 Information-Theoretic Lower Bounds 114
4.6 Non-Parametric Problems 121
4.7 Extensions: Privacy and Communication Complexity 124
4.8 Numerical Experiments 126
4.9 Conclusion 127
A4.1 Proof of Theorem 4.3 128
A4.2 Proof of Lemma 4.3 129
References 130
5 Sample Complexity Bounds for Dictionary Learning from Vector-
and Tensor-Valued Data 134
Zahra Shakeri, Anand D. Sarwate, and Waheed U. Bajwa
5.1 Introduction 134
5.2 Dictionary Learning for Vector-Valued Data 137
5.3 Dictionary Learning for Tensors 149
5.4 Extensions and Open Problems 157
References 160
6 Uncertainty Relations and Sparse Signal Recovery 163
Erwin Riegler and Helmut Bölcskei
6.1 Introduction 163
6.2 Uncertainty Relations in (Cm,‖ · ‖2) 165
6.3 Uncertainty Relations in (Cm,‖ · ‖1) 173
6.4 Sparse Signal Separation 181
6.5 The Set-Theoretic Null-Space Property 184
6.6 A Large Sieve Inequality in (Cm,‖ · ‖2) 187
6.7 Uncertainty Relations in L1 and L2 189
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Contents ix
6.8 Proof of Theorem 6.4 189
6.9 Results for ||| · |||1 and ||| · |||2 192
References 193
7 Understanding Phase Transitions via Mutual Information and MMSE 197
Galen Reeves and Henry D. Pfister
7.1 Introduction 197
7.2 Problem Setup and Characterization 200
7.3 The Role of Mutual Information and MMSE 207
7.4 Proving the Replica-Symmetric Formula 215
7.5 Phase Transitions and Posterior Correlation 219
7.6 Conclusion 223
A7.1 Subset Response for the Standard Linear Model 223
References 225
8 Computing Choice: Learning Distributions over Permutations 229
Devavrat Shah
8.1 Background 229
8.2 Setup 232
8.3 Models 234
8.4 Sparse Model 236
8.5 Random Utility Model (RUM) 247
8.6 Discussion 255
References 258
9 Universal Clustering 263
Ravi Kiran Raman and Lav R. Varshney
9.1 Unsupervised Learning in Machine Learning 263
9.2 Universality in Information Theory 265
9.3 Clustering and Similarity 271
9.4 Distance-Based Clustering 276
9.5 Dependence-Based Clustering 280
9.6 Applications 291
9.7 Conclusion 292
References 293
10 Information-Theoretic Stability and Generalization 302
Maxim Raginsky, Alexander Rakhlin, and Aolin Xu
10.1 Introduction 302
10.2 Preliminaries 305
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
x Contents
10.3 Learning Algorithms and Stability 308
10.4 Information Stability and Generalization 311
10.5 Information-Theoretically Stable Learning Algorithms 317
References 327
11 Information Bottleneck and Representation Learning 330
Pablo Piantanida and Leonardo Rey Vega
11.1 Introduction and Overview 330
11.2 Representation and Statistical Learning 333
11.3 Information-Theoretic Principles and Information Bottleneck 339
11.4 The Interplay between Information and Generalization 347
11.5 Summary and Outlook 356
References 356
12 Fundamental Limits in Model Selection for Modern Data Analysis 359
Jie Ding, Yuhong Yang, and Vahid Tarokh
12.1 Introduction 359
12.2 Fundamental Limits in Model Selection 360
12.3 An Illustration on Fitting and the Optimal Model 363
12.4 The AIC, the BIC, and Other Related Criteria 365
12.5 The Bridge Criterion – Bridging the Conflicts between the AIC
and the BIC 369
12.6 Modeling-Procedure Selection 375
12.7 Conclusion 379
References 380
13 Statistical Problems with Planted Structures: Information-Theoretical
and Computational Limits 383
Yihong Wu and Jiaming Xu
13.1 Introduction 383
13.2 Basic Setup 384
13.3 Information-Theoretic Limits 386
13.4 Computational Limits 401
13.5 Discussion and Open Problems 412
A13.1 Mutual Information Characterization of Correlated Recovery 415
A13.2 Proof of (13.7)⇒ (13.6) and Verification of (13.7)
in the Binary Symmetric SBM 418
References 419
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Contents xi
14 Distributed Statistical Inference with Compressed Data 425
Wenwen Zhao and Lifeng Lai
14.1 Introduction 425
14.2 Basic Model 427
14.3 Interactive Communication 438
14.4 Identity Testing 445
14.5 Conclusion and Extensions 451
References 452
15 Network Functional Compression 455
Soheil Feizi and Muriel Médard
15.1 Introduction 455
15.2 Problem Setup and Prior Work 458
15.3 Network Functional Compression 463
15.4 Discussion and Future Work 473
15.5 Proofs 476
References 484
16 An Introductory Guide to Fano’s Inequality with Applications
in Statistical Estimation 487
Jonathan Scarlett and Volkan Cevher
16.1 Introduction 487
16.2 Fano’s Inequality and Its Variants 492
16.3 Mutual Information Bounds 495
16.4 Applications – Discrete Settings 499
16.5 From Discrete to Continuous 510
16.6 Applications – Continuous Settings 516
16.7 Discussion 523
References 525
Index 529
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Preface
Since its introduction in 1948, the field of information theory has proved instrumen-
tal in the analysis of problems pertaining to compressing, storing, and transmitting
data. For example, information theory has allowed analysis of the fundamental limits
of data communication and compression, and has shed light on practical communica-
tion system design for decades. Recent years have witnessed a renaissance in the use
of information-theoretic methods to address problems beyond data compression, data
communications, and networking, such as compressive sensing, data acquisition, data
analysis, machine learning, graph mining, community detection, privacy, and fairness.
In this book, we explore a broad set of problems on the interface of signal processing,
machine learning, learning theory, and statistics where tools and methodologies orig-
inating from information theory can provide similar benefits. The role of information
theory at this interface has indeed been recognized for decades. A prominent example is
the use of information-theoretic quantities such as mutual information, metric entropy
and capacity in establishing minimax rates of estimation back in the 1980s. Here we
intend to explore modern applications at this interface that are shaping data science in
the twenty-first century.
There are of course some notable differences between standard information-theoretic
tools and signal-processing or data analysis methods. Globally speaking, information
theory tends to focus on asymptotic limits, using large blocklengths, and assumes the
data is represented by a finite number of bits and viewed through a noisy channel.
The standard results are not concerned with complexity but focus more on fundamen-
tal limits characterized via achievability and converse results. On the other hand, some
signal-processing techniques, such as sampling theory, are focused on discrete-time rep-
resentations but do not necessarily assume the data is quantized or that there is noise in
the system. Signal processing is often concerned with concrete methods that are opti-
mal, namely, achieve the developed limits, and have bounded complexity. It is natural
therefore to combine these tools to address a broader set of problems and analysis which
allows for quantization, noise, finite samples, and complexity analysis.
This book is aimed at providing a survey of recent applications of information-
theoretic methods to emerging data-science problems. The potential reader of this book
could be a researcher in the areas of information theory, signal processing, machine
learning, statistics, applied mathematics, computer science or a related research area, or
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
xiv Preface
a graduate student seeking to learn about information theory and data science and to
scope out open problems at this interface. The particular design of this volume ensures
that it can serve as both a state-of-the-art reference for researchers and a textbook for
students.
The book contains 16 diverse chapters written by recognized leading experts world-
wide, covering a large variety of topics that lie on the interface of signal processing, data
science, and information theory. The book begins with an introduction to information
theory which serves as a background for the remaining chapters, and also sets the nota-
tion to be used throughout the book. The following chapters are then organized into four
categories: data acquisition (Chapters 2–4), data representation and analysis (Chapters
5–9), information theory and machine learning (Chapters 10 and 11), and information
theory, statistics, and compression (Chapters 12–15). The last chapter, Chapter 16, con-
nects several of the book’s themes via a survey of Fano’s inequality in a diverse range
of data-science problems. The chapters are self-contained, covering the most recent
research results in the respective topics, and can all be treated independently of each
other. A brief summary of each chapter is given next.
Chapter 1 by Rodrigues, Draper, Bajwa, and Eldar provides an introduction to infor-
mation theory concepts and serves two purposes: It provides background on classical
information theory, and presents a taster of modern information theory applied to
emerging data-science problems.
Chapter 2 by Kipnis, Eldar, and Goldsmith extends the notion of rate-distortion the-
ory to continuous-time inputs deriving bounds that characterize the minimal distortion
that can be achieved in representing a continuous-time signal by a series of bits when
the sampler is constrained to a given sampling rate. For an arbitrary stochastic input and
given a total bitrate budget, the authors consider the lowest sampling rate required to
sample the signal such that reconstruction of the signal from a bit-constrained represen-
tation of its samples results in minimal distortion. It turns out that often the signal can
be sampled at sub-Nyquist rates without increasing the distortion.
Chapter 3 by Jalali and Poor discusses the interplay between compressed sensing and
compression codes. In particular, the authors consider the use of compression codes to
design compressed sensing recovery algorithms. This allows the expansion of the class
of structures used by compressed sensing algorithms to those used by data compression
codes, which is a much richer class of inputs and relies on decades of developments in
the field of compression.
Chapter 4 by Pilanci develops information-theoretical lower bounds on sketching for
solving large statistical estimation and optimization problems. The term sketching is
used for randomized methods that aim to reduce data dimensionality in computationally
intensive tasks for gains in space, time, and communication complexity. These bounds
allow one to obtain interesting trade-offs between computation and accuracy and shed
light on a variety of existing methods.
Chapter 5 by Shakeri, Sarwate, and Bajwa treats the problem of dictionary learning,
which is a powerful signal-processing approach for data-driven extraction of features
from data. The chapter summarizes theoretical aspects of dictionary learning for vector-
and tensor-valued data and explores lower and upper bounds on the sample complexity
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Preface xv
of dictionary learning which are derived using information-theoretic tools. The depen-
dence of sample complexity on various parameters of the dictionary learning problem
is highlighted along with the potential advantages of taking the structure of tensor data
into consideration in representation learning.
Chapter 6 by Riegler and Bölcskei presents an overview of uncertainty relations for
sparse signal recovery starting from the work of Donoho and Stark. These relations are
then extended to richer data structures and bases, which leads to the recently discovered
set-theoretic uncertainty relations in terms of Minkowski dimension. The chapter also
explores the connection between uncertainty relations and the “large sieve,” a family
of inequalities developed in analytic number theory. It is finally shown how uncertainty
relations allow one to establish fundamental limits of practical signal recovery problems
such as inpainting, declipping, super-resolution, and denoising of signals.
Chapter 7 by Reeves and Pfister examines high-dimensional inference problems
through the lens of information theory. The chapter focuses on the standard linear model
for which the performance of optimal inference is studied using the replica method from
statistical physics. The chapter presents a tutorial of these techniques and presents a new
proof demonstrating their optimality in certain settings.
Chapter 8 by Shah discusses the question of learning distributions over permutations
of a given set of choices based on partial observations. This is central to capturing
choice in a variety of contexts such as understanding preferences of consumers over
a collection of products based on purchasing and browsing data in the setting of retail
and e-commerce. The chapter focuses on the learning task from marginal distributions
of two types, namely, first-order marginals and pair-wise comparisons, and provides a
comprehensive review of results in this area.
Chapter 9 by Raman and Varshney studies universal clustering, namely, clustering
without prior access to the statistical properties of the data. The chapter formalizes the
problem in information theory terms, focusing on two main subclasses of clustering that
are based on distance and dependence. A review of well-established clustering algo-
rithms, their statistical consistency, and their computational and sample complexities is
provided using fundamental information-theoretic principles.
Chapter 10 by Raginsky, Rakhlin, and Xu introduces information-theoretic measures
of algorithmic stability and uses them to upper-bound the generalization bias of learn-
ing algorithms. The notion of stability implies that its output does not depend too
much on any individual training example and therefore these results shed light on the
generalization ability of modern learning techniques.
Chapter 11 by Piantanida and Vega introduces the information bottleneck principle
and explores its use in representation learning, namely, in the development of com-
putational algorithms that learn the different explanatory factors of variation behind
high-dimensional data. Using these tools, the authors obtain an upper bound on the
generalization gap corresponding to the cross-entropy risk. This result provides an inter-
esting connection between mutual information and generalization, and helps to explain
why noise injection during training can improve the generalization ability of encoder
models.
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
xvi Preface
Chapter 12 by Ding, Yang, and Tarokh discusses fundamental limits of inference
and prediction based on model selection principles from modern data analysis. Using
information-theoretic tools the authors analyze several state-of-the-art model selec-
tion techniques and introduce two recent advances in model selection approaches, one
concerning a new information criterion and the other concerning modeling-procedure
selection.
Chapter 13 by Wu and Xu provides an exposition on some of the methods for deter-
mining the information-theoretical as well as computational limits for high-dimensional
statistical problems with a planted structure. Planted structures refer to a ground truth
structure (often of a combinatorial nature) which one is trying to discover in the presence
of random noise. In particular, the authors discuss first- and second-moment methods for
analyzing the maximum likelihood estimator, information-theoretic methods for proving
impossibility results using mutual information and rate-distortion theory, and techniques
originating from statistical physics. To investigate computational limits, they describe
randomized polynomial-time reduction schemes that approximately map planted-clique
problems to the problem of interest in total variation distance.
Chapter 14 by Zhao and Lai considers information-theoretic models for distributed
statistical inference problems with compressed data. The authors review several research
directions and challenges related to applying these models to various statistical learn-
ing problems. In these applications, data are distributed in multiple terminals, which
can communicate with each other via limited-capacity channels. Information-theoretic
tools are used to characterize the fundamental limits of the classical statistical inference
problems using compressed data directly.
Chapter 15 by Feizi and Médard treats different aspects of the network functional
compression problem. The goal is to compress a source of random variables for the
purpose of computing a deterministic function at the receiver where the sources and
receivers are nodes in a network. Traditional data compression schemes are special cases
of functional compression, in which the desired function is the identity function. It is
shown that for certain classes of functions considerable compression is possible in this
setting.
Chapter 16 by Scarlett and Cevher provides a survey of Fano’s inequality and its use
in various statistical estimation problems. In particular, the chapter overviews the use of
Fano’s inequality for establishing impossibility results, namely, conditions under which
a certain goal cannot be achieved by any estimation algorithm. The authors present
several general-purpose tools and analysis techniques, and provide representative exam-
ples covering group testing, graphical model selection, sparse linear regression, density
estimation, and convex optimization.
Within the chapters, the authors point to various open research directions at the
interface of information theory, data acquisition, data analysis, machine learning, and
statistics that will certainly see increasing attention in the years to come.
We would like to end by thanking all the authors for their contributions to this book
and for their hard work in presenting the material in a unified and accessible fashion.
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Notation
z scalar (or value of random variable Z)
Z random variable
z vector (or value of random vector Z)
Z matrix (or random vector)
zi ith entry of vector z
Zi, j (i, j)th entry of matrix Z
Zn = (Z1, . . . ,Zn) sequence of n random variables
zn = (z1, . . . ,zn) value of sequence of n random variables Zn
Zj
i=(
Zi, . . . ,Z j
)
sequence of j− i+1 random variables
zj
i=(
zi, . . . ,z j
)
value of sequence of j− i+1 random variables Zj
i
‖ · ‖p p-norm
(·)T transpose operator
(·)∗ conjugate Hermitian operator
(·)† pseudo-inverse of the matrix argument
tr(·) trace of the square matrix argument
det(·) determinant of the square matrix argument
rank(·) rank of the matrix argument
range(·) range span of the column vectors of the matrix argument
λmax(·) maximum eigenvalue of the square matrix argument
λmin(·) minimum eigenvalue of the square matrix argument
λi(·) ith largest eigenvalue of the square matrix argument
I identity matrix (its size is determined from the context)
0 matrix with zero entries (its size is determined from the context)
T standard notation for sets
|T | cardinality of set T
R set of real numbers
C set of complex numbers
Rn set of n-dimensional vectors of real numbers
Cn set of n-dimensional vectors of complex numbers
j imaginary unit
Re(x) real part of the complex number x
Im(x) imaginary part of the complex number x
|x| modulus of the complex number x
arg(x) argument of the complex number x
E[·] statistical expectation
P[·] probability measure
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
xviii Notation
H(·) entropy
H(·|·) conditional entropy
h(·) differential entropy
h(·|·) conditional differential entropy
D(·‖·) relative entropy
I(·; ·) mutual information
I(·; ·|·) conditional mutual information
N(
µ,σ2)
scalar Gaussian distribution with mean µ and variance σ2
N(µ,Σ) multivariate Gaussian distribution with mean µ and covariance
matrix Σ
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
Contributors
Waheed U. Bajwa
Department of Electrical and Computer
Engineering
Rutgers University
Helmut Bölcskei
Department of Information Technology
and Electrical Engineering
Department of Mathematics
ETH Zürich
Volkan Cevher
Laboratory for Information and
Inference Systems
Institute of Electrical Engineering
School of Engineering, EPFL
Jie Ding
School of Statistics
University of Minnesota
Stark C. Draper
Department of Electrical and Computer
Engineering
University of Toronto
Yonina C. Eldar
Faculty of Mathematics and Computer
Science
Weizmann Institute of Science
Soheil Feizi
Department of Computer Science
University of Maryland, College Park
Andrea J. Goldsmith
Department of Electrical Engineering
Stanford University
Shirin Jalali
Nokia Bell Labs
Alon Kipnis
Department of Statistics
Stanford University
Lifeng Lai
Department of Electrical and Computer
Engineering
University of California, Davis
Muriel Médard
Department of Electrical Engineering
and Computer Science
Massachussets Institute of
Technology
Henry D. Pfister
Department of Electrical and Computer
Engineering
Duke University
Pablo Piantanida
Laboratoire des Signaux et Systèmes
Université Paris Saclay
CNRS-CentraleSupélec
Montreal Institute for Learning
Algorithms (Mila), Université
de Montréal
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
xx List of Contributors
Mert Pilanci
Department of Electrical Engineering
Stanford University
H. Vincent Poor
Department of Electrical Engineering
Princeton University
Maxim Raginsky
Department of Electrical and Computer
Engineering
University of Illinois at
Urbana-Champaign
Alexander Rakhlin
Department of Brain & Cognitive
Sciences
Massachusetts Institute of Technology
Ravi Kiran Raman
Department of Electrical and Computer
Engineering
University of Illinois at
Urbana-Champaign
Galen Reeves
Department of Electrical and
Computer Engineering
Department of Statistical ScienceDuke University
Leonardo Rey Vega
Department of Electrical Engineering
School of Engineering
Universidad de Buenos Aires
CSC-CONICET
Erwin Riegler
Department of Information Technology
and Electrical Engineering
ETH Zürich
Miguel R. D. Rodrigues
Department of Electronic and Electrical
Engineering
University College London
Anand D. Sarwate
Department of Electrical and
Computer Engineering
Rutgers University
Jonathan Scarlett
Department of Computer Science and
Department of Mathematics
National University of Singapore
Devavrat Shah
Department of Electrical Engineering and
Computer Science
Massachusetts Institute of Technology
Zahra Shakeri
Electronic Arts
Vahid Tarokh
Department of Electrical and Computer
Engineering
Duke University
Lav R. Varshney
Department of Electrical and Computer
Engineering
University of Illinois at Urbana-Champaign
Yihong Wu
Department of Statistics and Data Science
Yale University
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press
List of Contributors xxi
Aolin Xu
Department of Electrical and Computer
Engineering
University of Illinois at
Urbana-Champaign
Jiaming Xu
Fuqua School of Business
Duke University
Yuhong Yang
School of Statistics
University of Minnesota
Wenwen Zhao
Department of Electrical and
Computer Engineering
University of California, Davis
Cambridge University Press978-1-108-42713-5 — Information-Theoretic Methods in Data ScienceEdited by Miguel R. D. Rodrigues , Yonina C. Eldar FrontmatterMore Information
www.cambridge.org© in this web service Cambridge University Press