
DOCTOR OF PHILOSOPHY

Information theoretic parameters for graphs and operator systems

Boreland, Gareth

Award date: 2020

Awarding institution: Queen's University Belfast


Terms of use

All those accessing thesis content in Queen's University Belfast Research Portal are subject to the following terms and conditions of use:

• Copyright is subject to the Copyright, Designs and Patents Act 1988, or as modified by any successor legislation
• Copyright and moral rights for thesis content are retained by the author and/or other copyright owners
• A copy of a thesis may be downloaded for personal non-commercial research/study without the need for permission or charge
• Distribution or reproduction of thesis content in any format is not permitted without the permission of the copyright holder
• When citing this work, full bibliographic details should be supplied, including the author, title, awarding institution and date of thesis

Take down policy

A thesis can be removed from the Research Portal if there has been a breach of copyright, or a similarly robust reason. If you believe this document breaches copyright, or there is sufficient cause to take down, please contact us, citing details. Email: [email protected]

Supplementary materials

Where possible, we endeavour to provide supplementary materials to theses. This may include video, audio and other types of files. We endeavour to capture all content and upload as part of the Pure record for each thesis. Note, it may not be possible in all instances to convert analogue formats to usable digital formats for some supplementary materials. We exercise best efforts on our behalf and, in such instances, encourage the individual to consult the physical thesis for further information.

Download date: 24. Aug. 2020


Queen’s University Belfast

Doctoral Thesis

Information theoretic parameters for

graphs and operator systems.

Author:

Gareth Boreland

Supervisor:

Prof. Ivan Todorov

A thesis submitted for the degree of

Doctor of Philosophy

in the

Mathematical Sciences Research Centre,

School of Mathematics and Physics.

April 8, 2020


“For from him and through him and to him are all things.”

Romans 11:36.


Acknowledgements

It has been a great privilege for me, someone who is most certainly not an expert, to have had

the opportunity to learn from those who are. First and foremost I want to thank my super-

visor, Professor Ivan Todorov, for introducing me to this beautiful area of mathematics, and

for patiently guiding me through it. This thesis owes much to his expertise and enthusiasm.

I would also like to thank Professor Andreas Winter of Universitat Autonoma de Barcelona

for many helpful observations and numerous useful discussions on many different aspects of

this project. I must also thank Dr Peter Vrana of Budapest University of Technology and

Economics for many helpful comments.

To be part of the Mathematical Sciences Research Centre at Queen’s for the last four years

has been both a pleasure and a privilege, and my thanks go to all the staff, both academic

and non-academic, for making this experience so positive. I cannot mention every individual

here, but in particular I would like to thank Miss Sheila O’Brien for all her administration

work and Dr Ian Stewart for computer support.

Among the highlights of the course have been the fellowship and encouragement I have

enjoyed from my fellow PhD students, and I want to thank them all for their friendship over

these last years and to wish each of them well for the future.

My PhD studies were completed during a career-break from Sullivan Upper School, Holy-

wood. I want to thank all my friends and colleagues there, in particular Mr. Chris Peel, the

Headmaster, and the Board of Governors for facilitating this opportunity, and for allowing

me to return to school in 2020. I hope I can do so with renewed enthusiasm and energy.

I gratefully acknowledge receipt of the research studentship from the Department of Employ-

ment and Learning which has allowed me to complete this course.


Abstract

This thesis explores connections between information theory and graph theory. One such

connection, the notion of graph entropy, was first introduced by Janos Korner in [23]. Here we

prove a number of seemingly new results on graph entropy, including the determination of the

graph entropy of the odd cycles and their complements under certain probability distributions.

We recall how convex corners in Rd provide a useful tool for describing graph theoretic and

information theoretic ideas, and we develop a theory of convex corners in Md appropriate

for applications in quantum information. We recall from [13] how non-commutative graphs

can be regarded as generalisations of graphs. With a given non-commutative graph we

associate a number of convex corners and prove a ‘quantum sandwich theorem’. We define

several new parameters for non-commutative graphs, and show them to be generalisations of

corresponding graph parameters. This includes two quantum versions of the Lovasz number,

one of which is seen to be an upper bound on Shannon capacity. Finally we return to examine

graph entropy in the case of a non-i.i.d. classical source, and generalise the Kolmogorov–Sinai

entropy of a dynamical system to this setting.


Contents

Acknowledgements ii

Abstract iii

Symbols viii

Introduction xii

1 Entropic quantities for the classical i.i.d. source 1

1.1 Shannon entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Entropy over a convex corner in Rd . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2.2 Bounds on the entropy over a convex corner. . . . . . . . . . . . . . . 6

1.2.3 Anti-blockers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3 Graph entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.3.1 Some graph theoretic preliminaries . . . . . . . . . . . . . . . . . . . . 13

1.3.2 Graph entropy and the problem of source coding without complete

distinguishability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.3.3 A lower bound on graph entropy . . . . . . . . . . . . . . . . . . . . . 20

1.4 Convex corners associated with a graph . . . . . . . . . . . . . . . . . . . . . 30


2 Convex corners in Md 33

2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.2 Convex corners and anti-blockers in Md . . . . . . . . . . . . . . . . . . . . . 35

2.2.1 Definitions and basic properties . . . . . . . . . . . . . . . . . . . . . . 35

2.2.2 Examples of convex corners in Md . . . . . . . . . . . . . . . . . . . . 43

2.3 Reflexivity of Md-convex corners . . . . . . . . . . . . . . . . . . . . . . . . . 48

2.3.1 The second anti-blocker theorem . . . . . . . . . . . . . . . . . . . . . 48

2.3.2 Consequences of reflexivity . . . . . . . . . . . . . . . . . . . . . . . . 55

2.4 Entropic quantities in quantum information theory . . . . . . . . . . . . . . . 58

2.4.1 Some background on quantum information . . . . . . . . . . . . . . . 58

2.4.2 Diagonal expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

2.4.3 Entropy over an Md-convex corner . . . . . . . . . . . . . . . . . . . . 69

3 Non-commutative graphs and associated convex corners 81

3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

3.1.1 Classical channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

3.1.2 Quantum measurement . . . . . . . . . . . . . . . . . . . . . . . . . . 86

3.1.3 Quantum channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

3.1.4 Non-commutative graphs . . . . . . . . . . . . . . . . . . . . . . . . . 90

3.2 Convex corners from non-commutative graphs . . . . . . . . . . . . . . . . . . 96

3.2.1 The abelian, clique and full projection convex corners . . . . . . . . . 96

3.2.2 Embedding the classical in the quantum setting . . . . . . . . . . . . . 100

3.2.3 The Lovasz corner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

3.2.4 A quantum sandwich theorem . . . . . . . . . . . . . . . . . . . . . . . 120


4 Parameters for non-commutative graphs 124

4.1 Parameters for non-commutative graphs from convex corners . . . . . . . . . 124

4.1.1 Defining non-commutative graph parameters . . . . . . . . . . . . . . 125

4.1.2 Properties of non-commutative graph parameters . . . . . . . . . . . . 131

4.1.3 Non-commutative graph homomorphisms . . . . . . . . . . . . . . . . 134

4.1.4 Weighted parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

4.2 Operator anti-systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

4.3 Non-commutative graph entropy . . . . . . . . . . . . . . . . . . . . . . . . . 141

4.4 Another quantum generalisation of θ(G) . . . . . . . . . . . . . . . . . . . . . 146

4.5 Capacity bounds, the Witsenhausen rate and other limits . . . . . . . . . . . 151

4.6 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

4.7 Further questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

5 The classical source with memory 174

5.1 Entropy and the source with memory . . . . . . . . . . . . . . . . . . . . . . 174

5.2 Graph theoretic preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

5.3 Graph entropy for the source with memory . . . . . . . . . . . . . . . . . . . 186

5.3.1 The graph G[B] and its graph entropy . . . . . . . . . . . . . . . . . . 187

5.3.2 The quantity h(G[B], T ) and its properties . . . . . . . . . . . . . . . . 193

5.3.3 Generalising the Kolmogorov–Sinai Theorem . . . . . . . . . . . . . . 195

5.4 Further questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

5.4.1 Finite subalgebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

5.4.2 Source coding with partial distinguishability . . . . . . . . . . . . . . . 204

5.4.3 Distinguishability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

5.4.4 Further generalisations . . . . . . . . . . . . . . . . . . . . . . . . . . . 206


A Convexity and semi-continuity 207

B Linear algebra 213

C Source coding for the ergodic source 218


Symbols

For reference, listed here are some of the main symbols used, with the page number of either

their definition or first occurrence. In some cases to avoid a lengthy description, the reader

is referred to the main text for the appropriate definition.

R+ set of non-negative real numbers, p. 1

H(p) Shannon entropy of p, p. 2

[n] {1, . . . , n}, p. 1

X finite alphabet, p. 2

Pd set of probability distributions in Rd, p. 1

D(p‖q) relative entropy, p. 3

1 all ones vector, p. 1

Cd Rd-unit corner, p. 4

Bd Rd-unit cube, p. 4

HA(p) entropy of p ∈ Pd over convex corner A ⊆ Rd, p. 5

γ(A), A ⊆ Rd max{〈u,1〉 : u ∈ A}, p. 6

A[, A ⊆ Rd+ anti-blocker of A, p. 10

V (G) vertex set of G, p. 13

E(G) edge set of G, p. 13

Ḡ graph complement of G, p. 13

α(·) independence number, p. 13

ω(·) clique number, p. 13

χ(·) chromatic number, p. 14

ωf(·) fractional clique number, p. 18

χf(·) fractional chromatic number, p. 15

θ(·) Lovasz number, p. 31

Kn complete graph on n vertices, p. 14


K̄n empty graph on n vertices, p. 14

F ∗G co-normal product, p. 14

x ∼ y x is adjacent to y, p. 13

x ' y x is adjacent to or equal to y, p. 13

Gn nth co-normal power of G, p. 14

VP(G) vertex packing polytope of G, p. 16

FVP(G) fractional vertex packing polytope of G, p. 30

TH(G) Theta corner of G, p. 30

N(A), A ⊆ Rd max{β : β1 ∈ A}, p. 9

M(A), A ⊆ Rd see (1.15), p. 11

H(G,P ) graph entropy of (G, p), p. 16

〈·, ·〉 inner product, p. 6

Cn cycle graph on n vertices, p. 23

C(G) see (1.34), p. 31

Mm,n(S) set of m× n matrices with entries in S, p. 213

Mm,n set of m× n matrices with entries in C, p. 213

Md set of d× d matrices with entries in C, p. 33

Dd set of diagonal d× d matrices with entries in C, p. 33

M^h_d set of Hermitian d × d matrices, p. 33

M^+_d set of positive semi-definite d × d matrices, p. 33

M^{++}_d set of positive definite d × d matrices, p. 33

D^+_d set of diagonal d × d positive semi-definite matrices, p. 33

Id d× d identity matrix, p. 213

Jd d× d all ones matrix, p. 213

γ(A), A ⊆ Md max{Tr A : A ∈ A}, p. 57

TrA trace of A, p. 213

A∗, A ∈Mm,n Hermitian transpose of A, p. 213

dim(V ) dimension of V, p. 213

V ⊥ orthogonal complement of V, p. 214

‖A‖, A ∈Md operator norm of A, p. 214

‖A‖2, A ∈Md Hilbert–Schmidt norm of A, p. 214

ran(M) range of M, p. 214

ker(M) kernel of M, p. 214


⊗ tensor product, p. 216

δij Kronecker delta, p. 216

conv convex hull, p. 207

conv closure of convex hull, p. 42

her hereditary cover, p. 41

C(A) her(conv(A)), p. 43

φ canonical mapping R^d_+ → D^+_d, p. 35

A], A ⊆ M^+_d anti-blocker of A, p. 36

Rd set of states in Md, p. 59

AX, X ∈ M^h_d {M ∈ M^+_d : Tr(MX) ≤ 1}, p. 44

BX, X ∈ M^h_d {M ∈ M^+_d : M ≤ X}, p. 44

N(A), A ⊆ Md max{β : βId ∈ A}, p. 57

M(A), A ⊆ Md see (2.22), p. 57

DV set of matrices diagonal in basis V, p. 62

∆V diagonal expectation with respect to basis V, p. 62

∆ diagonal expectation with respect to canonical basis, p. 63

H(ρ) von Neumann entropy, p. 60

HA(ρ) entropy of ρ over convex corner A, p. 70

G ⊠ H strong product, p. 83

Gn nth strong power of G, p. 83

c(·) Shannon capacity, p. 83

R(·) Witsenhausen rate, p. 85

Φ∗ adjoint channel, p. 108

SG operator system associated to graph G, p. 93

SΦ operator system associated to channel Φ, p. 90

ap(S) abelian projection convex corner, p. 98

fp(S) full projection convex corner, p. 98

cp(S) clique projection convex corner, p. 98

th(S), thk(S) Lovasz corner, kth Lovasz corner, p. 109, p. 110

Pa,Pf ,Pc set of abelian, full, clique projections, p. 127, p. 121, p. 127

vp(G) diagonal convex corner corresponding to VP(G), p. 101

fvp(G) diagonal convex corner corresponding to FVP(G), p. 101


th(G) diagonal convex corner corresponding to TH(G), p. 108

P0(G) φ(C(G)), p. 108

P(G) see (3.45), p. 115

C(S),Ck(S) see (3.30), (3.33), p. 109, p. 110

P(S),Pk(S) see (3.32), (3.35), p. 109, p. 110

ω(S) full number, p. 126

Ωf(S) fractional clique covering number, p. 128

Ωf(S) fractional full covering number, p. 128

θ(S), θk(S) Lovasz number, kth Lovasz number, p. 126

Ω(S) clique covering number, p. 127

Ω(S) full covering number, p. 127

χs(T ) strong chromatic number, p. 138

Sd constant diagonal operator system, p. 165

ϑ(S) sup{‖I + T‖ : T ∈ Md, I + T ≥ 0, T ∈ S⊥}, p. 146

ϑ̃(S) sup_{d∈N} ϑ(Md(S)), p. 146

λmin(A) smallest eigenvalue of A, p. 147

θ(S) second Lovasz number, p. 147

F ·G lexicographic graph product, p. 182

at(B) set of atoms of finite sub-algebra B, p. 176

A0 time-0 sub-algebra, p. 176

A ∨ B σ-algebra generated by A ∪ B, p. 176

H(B) entropy of finite sub-algebra B, p. 177

H(B|C) conditional entropy of B given C, p. 177

h(B, T ) entropy of B relative to T, p. 177

h(T ) Kolmogorov–Sinai entropy of T, p. 177

G[B] distinguishability graph on at(B), p. 187

F the σ-algebra ∨_{n=−∞}^{∞} T^n A0, p. 175

F0 the algebra ∪_{n=0}^{∞} ∨_{i=−n}^{n} T^i A0, p. 177

S set of finite sub-algebras of the form ∨_{k=1}^{n} T^{−i_k} A0, p. 177

G_0^Z infinite co-normal power of G0, p. 189

H(G[C|B], P ) conditional graph entropy of C given B, p. 189

h(G[B], T ) graph entropy of B relative to T, p. 193

h(G,T ) graph entropy of T, p. 195


Introduction

Connections between information theory and graph theory continue to grow, and this thesis

intends to explore just some of them. The relationship between the two fields has been

beneficial to both: from the beginnings of information theory in the work of Claude Shannon

([47], [48]) it was seen that some information theoretic ideas could naturally be expressed

in the language of graph theory, and many subsequent advances in information theory have

motivated new developments in graph theory. As an example of this mutual relationship

we need look no further than the problem of zero-error communication through the classical

channel. This problem was first discussed in [48] and is an important piece of background for

this thesis. Letting N denote a classical channel from finite alphabet X to finite alphabet Y,

we construct graph GN , known as the confusability graph for channel N , with vertex set X ,

and in which edges join elements of X which are confusable in the sense that their outputs

are not guaranteed to be distinct. For a single use of the channel, zero-error communication

is only possible if the sender is restricted to send only elements of X belonging to some

independent set in GN . The independence number of GN , denoted α(GN ), is then known as

the one-shot zero-error capacity of N . If the channel is used many times in succession, the

channel’s zero-error performance is described by the Shannon capacity of GN , given by

c(GN) = lim_{n→∞} (α(GN^n))^{1/n},

where G^n denotes the nth strong power of G. Since independence number is known to be

super-multiplicative over strong products, we have c(G) ≥ α(G); we note that equality does

hold here in some cases, for example for perfect graphs. However, the determination of the

Shannon capacity of a given graph is in general a notoriously difficult problem, and has led

to fruitful work in graph theory, often in finding bounds on c(G). The simplest graph for

which the determination of Shannon capacity proved to be problematic was C5, the cycle

graph on five vertices: in [48] Shannon left the calculation of c(C5) as an open problem. The

resulting story of mathematical discovery has given to C5 an important place in the history


of both graph theory and information theory. Clearly α(C5) = 2, and it is not hard to see that α(C5^2) = 5, so by super-multiplicativity c(C5) ≥ √5. A major breakthrough came in Laszlo Lovasz's paper [28] where he introduced the new graph parameter θ(G), now known as the Lovasz number, and proved that c(G) ≤ θ(G) for any graph G. Lovasz showed that θ(C5) ≤ √5, and thus established that c(C5) = θ(C5) = √5. However, it does not hold for all graphs G that c(G) = θ(G), and though θ(C_{2n+1}) is known for all n ∈ N, the determination even of c(C7) remains an open problem, and attracts active research interest [37].
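For small instances these quantities can be checked directly by computer. The following illustrative Python sketch (assuming the networkx package is available; the helper name independence_number is ours) lists the maximal cliques of the complement graph and confirms α(C5) = 2 and α(C5^2) = 5:

import networkx as nx

# Independent sets of a graph are exactly the cliques of its complement.
def independence_number(G):
    return max(len(c) for c in nx.find_cliques(nx.complement(G)))

C5 = nx.cycle_graph(5)
C5_2 = nx.strong_product(C5, C5)           # second strong power of C5
print(independence_number(C5))             # 2
print(independence_number(C5_2))           # 5, so c(C5) >= 5 ** 0.5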

It has been shown for any graph G that the ‘sandwich theorem’

α(G) ≤ θ(G) ≤ χf(G) (1)

holds, where χf(G) is the fractional chromatic number of G, a parameter which itself will be

seen to have information theoretic significance.

Discussion of the sandwich theorem leads naturally to a consideration of convex corners,

the concept of which will be an important unifying theme through much of this thesis. A

convex corner A ⊆ Rd is a set which is non-empty, closed, convex, non-negative (meaning

that A ⊆ Rd+) and with the hereditarity property that b ≤ a ∈ A ⇒ b ∈ A, where the

ordering is coordinate-wise [22]. Given a convex corner A ⊆ Rd, we define the functional

γ(A) = max{ ∑_{i=1}^d vi : v ∈ A },

and also a set A[, known as the anti-blocker of A and itself a convex corner, given by

A[ = {v ∈ Rd+ : 〈v, u〉 ≤ 1 for all u ∈ A}.

Given a graph G, the convex hull of the characteristic vectors of the independent sets of

G forms a convex corner VP(G), known as the vertex packing polytope of G, and α(G) =

γ(VP(G)). The fractional vertex packing polytope of graph G is the convex corner FVP(G) =

VP(G)[ and satisfies γ(FVP(G)) = χf(G). Significantly, [17] showed how to associate with

graph G a convex corner TH(G) satisfying both γ(TH(G)) = θ(G) and

VP(G) ⊆ TH(G) ⊆ FVP(G). (2)

This result also justifies being called a sandwich theorem, and it is clearly a stronger result

than (1), which is obtained from (2) by applying the functional γ. We note that a number


of important graph parameters with information theoretic significance can be expressed as

parameters of convex corners. One of the aims of this thesis is to explore further some of

these important links between graph theory, information theory and the theory of convex

corners.
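To make these functionals concrete, the following illustrative sketch (assuming scipy, and using the standard covering linear programme for the fractional chromatic number) evaluates γ(FVP(C5)) = χf(C5) = 5/2, so that for C5 the sandwich (1) reads 2 ≤ √5 ≤ 5/2:

from itertools import combinations
import numpy as np
from scipy.optimize import linprog

n = 5
edges = {(i, (i + 1) % n) for i in range(n)}              # the 5-cycle C5
adjacent = lambda a, b: (a, b) in edges or (b, a) in edges

# All (non-empty) independent sets of C5; brute force is fine on 5 vertices.
ind_sets = [S for r in range(1, n + 1) for S in combinations(range(n), r)
            if all(not adjacent(a, b) for a, b in combinations(S, 2))]

# chi_f(G): minimise sum_S w_S subject to sum_{S containing v} w_S >= 1, w >= 0.
A_ub = -np.array([[1.0 if v in S else 0.0 for S in ind_sets] for v in range(n)])
res = linprog(np.ones(len(ind_sets)), A_ub=A_ub, b_ub=-np.ones(n), bounds=(0, None))
print(res.fun)    # 2.5, so alpha(C5) = 2 <= theta(C5) = sqrt(5) <= chi_f(C5) = 5/2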

A central concept in information theory is entropy. For probability distribution p =

(p1, . . . , pn) describing the outcomes of an experiment, Shannon introduced the function

H(p) = −∑_{i=1}^n pi log pi

as a measure of the amount of a priori ‘uncertainty’ in the outcome of the event, or equiv-

alently as a measure of the amount of ‘information’ gained by subsequently learning the

outcome. A now famous story [52] tells us that John von Neumann suggested to Shannon

that he call this quantity ‘entropy’, firstly because the same function had already been used

in statistical mechanics with that name, and secondly because no one fully understood it,

giving Shannon an advantage in any resulting debate! Shannon’s source coding theorem given

in [47] gives further information theoretic significance to H(p). He considered the case that

a source emits a sequence (Xi)∞i=1 where the Xi’s are independent, identically distributed

(i.i.d.) random variables taking values in some finite alphabet, each following probability

distribution p. Shannon shows that H(p) may be regarded as the ‘mean number of encoding

bits required per source letter sent’ when we encode with negligible probability of error.

It is instructive to review a number of different ways in which Shannon’s entropy function

has been generalised.

1. Given a convex corner A ⊆ Rd and probability distribution p = (p1, . . . , pd), [11] defined

the quantity HA(p), the entropy of p over convex corner A, and showed that H(p) can

be written in this form.

2. If a source is not i.i.d., then successively emitted symbols from alphabet X are not

independent random variables, but rather follow some arbitrary joint distribution. In

this case it is required to consider a probability measure P on the sample space Ω = X Z

equipped with a canonically defined σ-algebra F and shift function T : Ω → Ω. An

analysis of such a source can be found for example in [4] and [20], and defines for each

finite sub-algebra A ⊂ F the quantity h(A, T ), known as the entropy of A relative to


T . The Kolmogorov–Sinai entropy of the dynamical system (Ω,F , P, T ) is defined by

h(T ) = sup{h(A, T ) : A is a finite sub-algebra of F}.

The Kolmogorov–Sinai entropy is the generalisation of H(p) to this setting, and can be

shown to be invariant under isomorphisms, that is under certain structure preserving

mappings between dynamical systems. The well-known Kolmogorov–Sinai Theorem

states that if finite sub-algebra B generates F in the sense that F = ∨_{n=−∞}^{∞} T^n B, then

h(T ) = h(B, T ).

3. The discovery of quantum mechanics has led to the development of the field of quantum

information, where data is transmitted not by elements of an alphabet, but by quantum

states. In this context, instead of probability distributions in Rd, we consider positive

semi-definite operators ρ ∈Mn with unit trace, known as states. Here the appropriate

generalisation of H(p) is the von Neumann entropy, given by

H(ρ) = −Tr(ρ log ρ).

4. In [23] Janos Korner considers the source coding problem in the case of an i.i.d. source

with probability distribution p on alphabet X whose symbols are not all distinguishable.

A graph G is constructed on vertex set X in which vertices are adjacent precisely when

they are distinguishable. In this setting Korner shows that the source coding problem

is solved by the quantity H(G, p), which he calls graph entropy. In [11] it is shown that

H(G, p) = H_{VP(G)}(p),

where H_{VP(G)}(p) is the entropy over the vertex packing polytope, giving another ex-

ample of the links between graph theory, information theory and convex corners.
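Of these generalisations, the von Neumann entropy in item 3 is the most direct to evaluate numerically, being the Shannon entropy of the eigenvalue distribution of ρ. An illustrative numpy sketch (the helper name is ours):

import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigs = np.linalg.eigvalsh(rho)        # rho is Hermitian and positive semi-definite
    eigs = eigs[eigs > 1e-12]             # convention 0 log 0 = 0
    return float(-(eigs * np.log2(eigs)).sum())

print(von_neumann_entropy(np.diag([0.5, 0.25, 0.25])))  # 1.5 = Shannon entropy of the diagonal
print(von_neumann_entropy(np.eye(2) / 2))               # 1.0 for the maximally mixed qubit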

Where our exploration leaves the classical case and enters the field of quantum infor-

mation, our starting point is [13], where the notion of the confusability graph of a classical

channel is generalised to the quantum channel. Central to this theory is the idea of an op-

erator system, namely a self-adjoint, unital subspace of Md. It is shown in [13] how, for

a given quantum channel Φ, we can construct an operator system SΦ, and also that every

operator system can be associated in this way with some quantum channel. Furthermore,

[13] shows how to construct from graph G an operator system SG from which G can be recov-

ered, and also how the classical channel can naturally be embedded in the operator system


approach. For these reasons operator systems are known as non-commutative graphs (we use

the terms interchangeably) and are regarded as ‘quantum’ generalisations of graphs; indeed

the same objects are sometimes, but not here, known as quantum graphs [55]. From there

[13] proceeds to define for an operator system S the independence number α(S) and the

Shannon capacity c(S), and in [35] a definition of the chromatic number χ(S) is given. These

parameters are noted to generalise the corresponding graph parameters in the sense that

α(SG) = α(G), c(SG) = c(G) and χ(SG) = χ(G) for all graphs G. For a non-commutative

graph S, [13] also defines the parameter ϑ(S) which is shown to be a generalisation of the

Lovasz number of a graph in that for any graph G

ϑ(SG) = θ(G).

The parameter ϑ(S) is also shown to be an upper bound on the Shannon capacity c(S).

Finding bounds on c(S) continues to be an important area of research [27]. We note that [21]

and [51] have both given quantum versions of (1), but these results have not been related to

convex corners as in (2).

From this brief summary of some existing links between information theory, graph theory

and the theory of convex corners, the motivation for this study is hopefully clear. Some of

the specific questions which we wish to pursue are the following:

• By analogy with the theory of convex corners in Rd, which we now call Rd-convex

corners, can we develop a theory of Md-convex corners, that is convex corners whose

elements lie in Md, and do they have quantum information theoretic significance?

• In particular, is there a quantum information theoretic version of (2)?

• What other parameters for non-commutative graphs can be defined that naturally gen-

eralise corresponding parameters of graphs?

• In the quantum case, can we develop a theory of the entropy of a state over anMd-convex

corner, and can a quantum version of graph entropy be defined for non-commutative

graphs?

• In the classical case, can the notion of graph entropy be extended for the non-i.i.d.

source?

The reader may find the following brief overview of the thesis to be useful.


Chapter 1 concerns the classical setting and begins with some background on entropy,

graph entropy and Rd-convex corners. We recall from [50] for a given graph G that

max_{p∈P} H(G, p) = log χf(G),

where P is the set of probability distributions on V (G). Noting that H(G, p) = H_{VP(G)}(p),

we prove the corresponding result for the entropy over any convex corner A ⊆ Rd, namely

that

max_{p∈Pd} HA(p) = log γ(A[),   (3)

where Pd denotes the set of probability distributions in Rd. On the quantity HA(p) we

establish the straightforward lower bound

HA(p) ≥ H(p)− log γ(A),

and we give a necessary and sufficient condition for equality. An immediate Corollary of this

result is that for a graph G with probability distribution p on its vertex set,

H(G, p) ≥ H(p)− logα(G). (4)

From (4) we are able to calculate the graph entropy of the odd cycles and their complements

under certain classes of probability distributions, along with a number of other new results.

In Chapter 2 we introduce the concept of a convex corner in Md, a subset of Md that

is non-empty, closed, convex, non-negative and hereditary. Motivated by the Rd case, for

Md-convex corner B ⊆Md we define

γ(B) = max{ TrB : B ∈ B },

and the anti-blocker B] by

B] = {M ∈ M^+_d : 〈M,B〉 ≤ 1 for all B ∈ B}.

We introduce HB(ρ), the entropy of state ρ ∈Md over Md-convex corner B, and show that

max{HB(ρ) : ρ ∈ Md, ρ is a state} = log γ(B]).


For a convex corner A ⊆ Rd it is known that

(A[)[ = A,

the so-called second anti-blocker theorem [22]. A significant section of Chapter 2 is devoted

to proving the corresponding result for an Md-convex corner B, namely that

(B])] = B.

This chapter also introduces the diagonal expectation operator ∆ which will be used later to

examine relationships between convex corners in Rd and Md.

After reviewing the theory of the classical channel and its associated confusability graph,

Chapter 3 examines the quantum channel and gives the necessary background from [13] on

non-commutative graphs. The main focus of this chapter is to develop a generalisation of

(2) in the quantum case. For a non-commutative graph S we define the Md-convex corners

ap(S), cp(S), fp(S) and th(S), known respectively as the abelian projection convex corner,

clique projection convex corner, full projection convex corner and Lovasz corner of S. It is

shown that ap(S) is a generalisation of VP(G) in the sense that VP(G) is related in a natural

way to ap(SG). Similarly, cp(S)] and fp(S)] are quantum generalisations of FVP(G), as is

th(S) of TH(G). The chapter concludes by proving that for any non-commutative graph S

the following ‘quantum sandwich theorem’ holds:

ap(S) ⊆ th(S) ⊆ fp(S)]. (5)

Just as many graph parameters can be defined in terms of Rd-convex corners such as

VP(G),TH(G) and FVP(G), Chapter 4 shows how to define parameters for a given non-

commutative graph S ⊆ Md in terms of the Md-convex corners we associated with S in

Chapter 3. For instance, the independence number α(S) as given in [13] can be defined by

α(S) = γ(ap(S)), and we also show how the chromatic number χ(S) as given in [35] can

be written as a functional defined on ap(S). We use functionals on convex corners to define

a number of new parameters for non-commutative graphs. These include the fractional

chromatic number χf(S), the fractional clique number ωf(S), the clique covering number

Ω(S) and the fractional clique covering number Ωf(S). Each is shown to generalise the

corresponding graph parameter in the sense that when evaluated for the operator system

SG, the corresponding graph parameter of graph G is obtained. We know that the result


ωf(G) = χf(G) holds for all graphs, and we show for a non-commutative graph S that

ωf(S) = χf(S). By analogy with graph entropy, we define the non-commutative graph entropy

of operator system S ⊆Md and state ρ ∈Md by

H(S, ρ) = H_{ap(S)}(ρ),

and we obtain the result

max{H(S, ρ) : ρ ∈ Md, ρ is a state} = log χf(S),

showing the significance of the fractional chromatic number. From (5) we immediately have

that γ(ap(S)) ≤ γ(th(S)) ≤ γ(fp(S)]). The quantity γ(fp(S)]) will be known as the fractional

full covering number of S and denoted by Ωf(S). We call γ(th(S)) the Lovasz number of S,

and denote it by θ(S); this definition is justified by the fact that θ(SG) = θ(G) when G is a

graph. This leads to the result

α(S) ≤ θ(S) ≤ Ωf(S),

a quantum generalisation of (1). We note in general that θ(S) ≠ ϑ(S). Furthermore, it

is not known whether θ is sub-multiplicative over tensor products, and consequently we do

not know if θ(S) upper bounds c(S). A number of equivalent expressions for θ(G) are given

in [28], and one of these motivates the definition of θ(S), another non-commutative graph

parameter. It satisfies θ(SG) = θ(G) and crucially is shown to be sub-multiplicative and thus

to be an upper bound on c(S). We also show how the classical concept of the Witsenhausen

rate of a graph can be generalised for non-commutative graphs.

Chapter 5 is a return to the classical setting and attempts to extend the concept of graph

entropy to the case of the ‘non-i.i.d.’ source. We begin with the dynamical system (Ω,F , P, T )

and construct an infinite graph G with vertex set Ω to describe a distinguishability relation

on Ω. For a finite sub-algebra B ⊂ F we form a finite graph G[B], and then define a quantity

h(G[B], T ), the graph entropy of B relative to T . The graph entropy of the resulting system

is then defined by

h(G,T ) = sup h(G[B], T ),

where the supremum is taken over a certain class of finite sub-algebras, and is shown to be

invariant under a notion of isomorphism which we will specify. In this setting the analysis of

the Bernoulli shift reduces, as we would expect, to the problem addressed by Korner in [23],


and we also consider the Markov shift. The problem of generalising the Kolmogorov–Sinai

Theorem to this context is discussed.

Necessary results and concepts from information theory, both classical and quantum, will

be developed from scratch, but a basic knowledge of linear algebra, probability theory and

analysis will be assumed. Some relevant background material is included in appendices A

and B. Much of the work in Chapter 1 on graph entropy has previously been published in

[5]. Chapters 3 to 4 contain joint work which has been published in [6].


Chapter 1

Entropic quantities for the classical

i.i.d. source

In 1948 Claude Shannon laid the foundations of information theory in his landmark paper [47].

A central concept of [47], and also of this chapter, is entropy. Some relevant preliminaries will

be introduced as needed, but the reader can find a review of necessary background material

on convexity and semi-continuity in Appendix A.

Entropic quantities typically involve the logarithm function, and unless stated otherwise

it will be assumed throughout that all logarithms are binary. The following conventions will

be used: log 0 = −∞, 0 log 0 = 0, and

pi log(pi/qi) = 0 if pi = 0, and pi log(pi/qi) = ∞ if pi > 0 and qi = 0.

The logarithm function is strictly concave, a fact upon which many fundamental information

theoretic results rely.

We use the notation [d] := {1, . . . , d}. We will denote the vector v ∈ Rd with ith coordinate vi by (vi)_{i∈[d]}, or just (vi) where context allows, and we write v > 0 when vi > 0 for all i ∈ [d]. Similarly, by v ≥ 0 we mean vi ≥ 0 for all i ∈ [d]. If v − w ≥ 0 for v, w ∈ Rd, we write v ≥ w. The set of all probability distributions in Rd will be denoted by Pd. We write Rd+ = {v ∈ Rd : v ≥ 0} for the set of non-negative elements in Rd. Let 1 ∈ Rd denote the all ones vector.

The next well-known result, known as the ‘log-sum’ lemma, follows from the convexity of


the function x→ x log x, and is a useful property of the logarithm function.

Lemma 1.0.1. ([10, Theorem 2.7.1].) When pi ≥ 0 and vi ≥ 0, i = 1, . . . , n, then

∑_{i=1}^n pi log(pi/vi) ≥ ( ∑_{i=1}^n pi ) log( ∑_{i=1}^n pi / ∑_{i=1}^n vi ).

Equality holds if and only if there exists λ ∈ R such that vi/pi = λ for all i = 1, . . . , n.
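A quick numerical sanity check of the lemma (illustrative Python, binary logarithms):

import numpy as np

p = np.array([0.2, 0.5, 0.3])
v = np.array([0.1, 0.4, 0.2])
lhs = float((p * np.log2(p / v)).sum())
rhs = float(p.sum() * np.log2(p.sum() / v.sum()))
print(lhs >= rhs)                                   # True for any non-negative data
# Equality when v_i = lambda * p_i for all i, e.g. lambda = 2:
print(float((p * np.log2(p / (2 * p))).sum()),
      float(p.sum() * np.log2(p.sum() / (2 * p).sum())))   # both equal -1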

This chapter begins by recalling the definition of Shannon entropy and its connection to

the source coding theorem for the noiseless channel. We also recall the concept of a convex

corner in Rd, and then discuss the notion of entropy over a convex corner, as introduced in

[11]. We prove new upper and lower bounds on the entropy of a probability distribution over

a convex corner. We then discuss the vertex packing polytope, a widely studied convex corner

associated to a given graph, before proving some new results on graph entropy, a quantity

introduced by Korner in [23] and given by the entropy over the vertex packing polytope

of a given graph. We discuss two more convex corners associated to a graph, namely the

fractional vertex packing polytope and the theta corner, and we recall how the latter leads

to a definition of the Lovasz number of a graph as given in [28].

1.1 Shannon entropy

Throughout this thesis we will discuss a number of entropic quantities, but the most funda-

mental is the well-known Shannon entropy.

Definition 1.1.1. Given p = (p1, . . . , pd) ∈ Pd, the Shannon entropy (or just entropy), H(p),

is given by

H(p) = −∑_{i=1}^d pi log pi.

Shannon’s source coding theorem for the noiseless channel is proved in [47] and attaches

an information theoretic meaning to H(p) which we now briefly discuss. Consider a finite

alphabet X = {1, . . . , d}. A source emits a sequence (Xi)_{i=1}^∞ where the Xi's are independent, identically distributed (i.i.d.) random variables taking values in X and each following probability distribution p ∈ Pd. Such a source is referred to as discrete, memoryless and stationary. We equip X^k with the product probability distribution p^k where p^k(x1, . . . , xk) = ∏_{i=1}^k p_{xi}. A k-to-m binary code is an injection X^k → {0, 1}^m, encoding ‘words’ of length k as binary


strings of length m. Shannon’s source coding theorem states that for all λ ∈ (0, 1),

H(p) = lim_{k→∞} (1/k) log( min{ |E| : E ⊆ X^k, p^k(E) ≥ 1 − λ } ).   (1.1)

This means H(p) can be intuitively understood to be the ‘long-term mean number of encoding

bits required per source letter’ with negligible probability of error. We explain this idea more

fully in Section 1.3.2, where we discuss a generalisation of this problem.

For p, q ∈ Pd, the relative entropy D(p‖q) is given by

D(p‖q) = ∑_{i=1}^d pi log(pi/qi).

By Lemma 1.0.1, it holds that D(p‖q) ≥ 0, with equality if and only if p = q (see [10], [20].)

The relative entropy can be rewritten in terms of the Shannon entropy H(p) as

D(p‖q) = −H(p) − ∑_{i∈X} pi log qi ≥ 0.   (1.2)

We denote by u = (ui) ∈ Pd the uniform distribution given by ui = 1/d, i = 1, . . . , d. Putting

q = u in (1.2) gives that

0 ≤ H(p) ≤ H(u) = log d (1.3)

for all probability distributions p ∈ Pd. Also note from (1.2) that H(p) ≤ −∑_{i∈X} pi log qi for all q ∈ Pd. Setting q = p gives equality, and thus,

H(p) = min_{q∈Pd} ( −∑_{i∈X} pi log qi ).   (1.4)

This expression for H(p), equivalent to Definition 1.1.1, motivates the introduction of more

general entropic quantities, of which H(p) is a special case. This approach was first used in

[11], and is outlined in the next section.
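Both D(p‖q) ≥ 0 and the variational formula (1.4) are easy to probe numerically; the illustrative sketch below checks that −∑ pi log qi is never smaller than H(p) and is minimised at q = p:

import numpy as np

def H(p):
    p = np.asarray(p, float); nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

def D(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float((p[m] * np.log2(p[m] / q[m])).sum())

p = np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(0)
for _ in range(1000):
    q = rng.random(3); q /= q.sum()
    assert D(p, q) >= -1e-12                          # relative entropy is non-negative
    assert -(p * np.log2(q)).sum() >= H(p) - 1e-12    # the minimum in (1.4) is H(p)
print(H(p), float(-(p * np.log2(p)).sum()))           # attained at q = p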

1.2 Entropy over a convex corner in Rd

1.2.1 Definitions

Here we recall the definition and some basic properties of the new concept of entropy intro-

duced in [11]. We begin with some definitions.


We define Cd, the ‘Rd-unit corner’ by letting

Cd = { v ∈ Rd : v ≥ 0, ∑_{i=1}^d vi ≤ 1 },

and Bd, the ‘Rd-unit cube’, by letting

Bd = { v ∈ Rd : 0 ≤ vi ≤ 1 for all i ∈ [d] }.

If a set A satisfies A ⊆ Rd+, then we say A is non-negative. We say set A ⊆ Rd+ is hereditary

if w ∈ A for all w ∈ Rd+ satisfying w ≤ v for some v ∈ A. Fundamental to the approach given

in [11] is the concept of a convex corner.

Definition 1.2.1. ([11], [22].) A set A ⊆ Rd will be called an Rd-convex corner (or just

a convex corner where context allows) if it is closed, convex, non-empty, non-negative and

hereditary. If in addition the convex corner A is bounded and has non-empty interior, then

A will be called a standard convex corner.

Note that bounded convex corners are compact sets.

Lemma 1.2.2. Let A be an Rd-convex corner. The following are equivalent:

(i) A has a non-empty interior;

(ii) there exists r > 0 such that r1 ∈ A;

(iii) A contains a strictly positive element.

Proof. (i)⇒(ii) Suppose that A has non-empty interior. Writing B(a, δ) for the open ball with centre a and radius δ, let a ∈ A and δ > 0 be such that B(a, δ) ⊆ A. Then a + (δ/(2√d))1 ∈ A and, since A is hereditary and a ≥ 0, we have (δ/(2√d))1 ∈ A.

(ii)⇒(iii) is trivial.

(iii)⇒(i) Let b ∈ A be strictly positive. Setting r = min_{i∈[d]} bi, it holds that c ≤ b for all c ∈ B(0, r). By the hereditarity of A it follows that Rd+ ∩ B(0, r) ⊆ A. Now let B0 = B( (r/(2√d))1, r/(2√d) ). It is easy to see that B0 ⊆ Rd+. For all x ∈ B0, by the triangle inequality

‖x‖ ≤ ‖x − (r/(2√d))1‖ + ‖(r/(2√d))1‖ ≤ r/(2√d) + r/2 ≤ r.

Then B0 ⊆ Rd+ ∩ B(0, r) ⊆ A, and A has non-empty interior.


It is clear that Cd and Bd are standard Rd-convex corners. The following result is given

in [11]. For completeness, a full proof is supplied here.

Lemma 1.2.3. For any probability distribution p ∈ Pd and bounded convex corner A ⊆ Rd+,

the function f : A → R ∪ {∞} where f(v) = −∑_{i=1}^d pi log vi attains a minimum value f(a)

for some a ∈ A. If f(v) <∞ for some v ∈ A (in other words, the minimum is finite) and if

pi > 0, then the coordinate ai of the minimising vector a is uniquely determined.

Proof. For a convex corner A, let

A0(p) = {v ∈ A : ∀i ∈ [d], pi > 0 ⇒ vi > 0}.

Observe that f(v) is finite for all v ∈ A0(p), and that the restriction of f to A0(p) is

continuous. However, f(v) = ∞ for all v ∈ A\A0(p). It is clear that lima→v f(a) = ∞ for

all v ∈ A\A0(p). This implies that f is lower semi-continuous on A. Since A is bounded, it

follows that A is compact. The first assertion then follows from Theorem A.0.6.

Now for a ∈ A such that f(a) = min_{v∈A} f(v) < ∞, it is required to show that ai is uniquely determined when pi > 0. Consider b ∈ A with f(b) = f(a). Then (a+b)/2 ∈ A by convexity and f((a+b)/2) = −∑_{i=1}^d pi log((ai+bi)/2). The log function is strictly concave, and so for i = 1, . . . , d, we have

−pi log( (ai + bi)/2 ) ≤ −(pi/2) log ai − (pi/2) log bi < ∞,   (1.5)

with equality only if pi = 0 or ai = bi. However, since f(a) = f(b) = min_{v∈A} f(v), it follows that f((a+b)/2) ≥ f(a). This implies for each i ∈ [d] that the first inequality in (1.5) is an equality, and so for each i = 1, . . . , d we have pi = 0 or ai = bi.

Motivated by Lemma 1.2.3, we now recall the definition of the entropy over a convex

corner, as found in [11].

Definition 1.2.4. The entropy of probability distribution p ∈ Pd over the bounded Rd-

convex corner A is given by

HA(p) = min_{a∈A} ( −∑_{i=1}^d pi log ai ).   (1.6)

Remark 1.2.5. Using the notation of Lemma 1.2.3, we see that if A0(p) = ∅, then f(v) =∞

for all v ∈ A and so HA(p) = ∞. As a trivial example, it is clear that H{0}(p) = ∞ for all


p ∈ Pd. Indeed, if A has empty interior, then there exists i ∈ [d] such that vi = 0 for all

v ∈ A. Then for p ∈ Pd with pi > 0, it holds that A0(p) = ∅ and thus HA(p) =∞. Whereas,

if A is bounded and has non-empty interior, then A0(p) ≠ ∅ and HA(p) is finite for all p ∈ Pd.

If a ≤ b with a, b ∈ Rd, then −∑_{i=1}^d pi log ai ≥ −∑_{i=1}^d pi log bi. Thus the minimising vector in (1.6) can be chosen as a vector that cannot be majorised coordinate-wise in A. In the case where A = Cd, the minimising vector a in (1.6) can then be chosen to satisfy ∑_{i=1}^d ai = 1. Then by this fact and (1.4),

HCd(p) = min_{a∈Pd} ( −∑_{i=1}^d pi log ai ) = H(p).   (1.7)

This shows, as [11] notes, that Shannon entropy is a special case of this more general entropy

concept.
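Equation (1.7) can also be confirmed by handing the minimisation in (1.6) to a generic solver. The illustrative sketch below (assuming scipy) minimises over the unit corner Cd and recovers H(p), with the minimiser at v = p:

import numpy as np
from scipy.optimize import minimize

p = np.array([0.5, 0.3, 0.2])
f = lambda v: float(-(p * np.log2(v)).sum())      # the objective in Definition 1.2.4

# A = C_d: v >= 0 (a small floor keeps the logarithm finite) and sum_i v_i <= 1.
res = minimize(f, x0=np.full(3, 1 / 3), method="SLSQP",
               bounds=[(1e-9, 1.0)] * 3,
               constraints=[{"type": "ineq", "fun": lambda v: 1.0 - v.sum()}])
print(res.fun, float(-(p * np.log2(p)).sum()))    # both approximately 1.485 = H(p)
print(res.x)                                      # the minimiser is (close to) p itself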

Since 1 ∈ Bd, it is immediate that HBd(p) = 0 for all p ∈ Pd. By Definition 1.2.4, if A

and B are bounded convex corners with A ⊆ B, then HB(p) ≤ HA(p) for all p ∈ Pd. The

following lemma is an immediate consequence of these observations.

Lemma 1.2.6. If a convex corner A satisfies Cd ⊆ A ⊆ Bd, then

0 ≤ HA(p) ≤ H(p) for all p ∈ Pd.

1.2.2 Bounds on the entropy over a convex corner.

For a general convex corner A and a probability distribution p, a direct calculation of HA(p)

is likely to be difficult. Finding upper and lower bounds, as in Lemma 1.2.6, will often be a

useful way to proceed. In this subsection we state and prove Theorem 1.2.9, a lower bound

on HA(p), and Theorem 1.2.13, an upper bound on HA(p). (We will note that [24, (14)] is in

fact a special case of Theorem 1.2.9, and [50, Lemma 4] is a special case of Theorem 1.2.13.)

For u, v ∈ Rd we use the inner product given by 〈u, v〉 = ∑_{i=1}^d ui vi and the associated norm ‖u‖ = √〈u, u〉. We note that ∑_{i=1}^d ui = 〈u,1〉.

Definition 1.2.7. For a bounded Rd-convex corner A, we define

γ(A) = max{〈u,1〉 : u ∈ A}.   (1.8)

It is clear that γ(A) = 0 if and only if A = {0}. If Rd-convex corner A is unbounded, we

define γ(A) =∞.


Remark 1.2.8. The existence of the maximum in (1.8) follows from the extreme value theorem,

recalling that a bounded convex corner is compact and the function u→ 〈u,1〉 is continuous.

Lemma 1.0.1 yields the following lower bound on HA(p) in terms of the Shannon entropy

H(p) and the quantity γ(A).

Theorem 1.2.9. Let A be a bounded Rd-convex corner. For all p ∈ Pd

HA(p) ≥ H(p)− log γ(A).

Equality holds if and only if γ(A)p ∈ A. In the case of equality, v = γ(A)p is the unique

vector in A satisfying ∑_{i=1}^d −pi log vi = HA(p).

Proof. Using Lemma 1.2.3, choose u ∈ A satisfying HA(p) = −∑_{i=1}^d pi log ui. Lemma 1.0.1

and the definition of γ(A) give

∑_{i=1}^d pi log(pi/ui) ≥ −log( ∑_{i=1}^d ui ) ≥ −log γ(A).   (1.9)

The left hand side is equal to HA(p)−H(p), whence it is immediate that

HA(p) ≥ H(p)− log γ(A). (1.10)

By Lemma 1.0.1, equality in the first inequality of (1.9) requires ui = λpi with λ ∈ R for all i = 1, . . . , d, and equality in the second inequality of (1.9) requires ∑_{i=1}^d ui = γ(A). Thus, equality in (1.10) implies that there exists u ∈ A such that γ(A) = ∑_{i=1}^d ui = λ ∑_{i=1}^d pi = λ.

This gives u = λp = γ(A)p ∈ A.

Conversely, if γ(A)p ∈ A, then

HA(p) = min_{v∈A} ( −∑_{i=1}^d pi log vi ) ≤ −∑_{i=1}^d pi log(γ(A)pi) = H(p) − log γ(A).

With (1.10), this gives HA(p) = H(p)− log γ(A).

To show that in the case of equality, v = γ(A)p is the unique vector in A satisfying


−∑_{i=1}^d pi log vi = HA(p), suppose there exists w ∈ A where

−∑_{i=1}^d pi log wi = HA(p) = H(p) − log γ(A).

Then setting u = w gives equality throughout (1.9) and, as previously, w = γ(A)p.

Remark 1.2.10. It is easy to see that for any bounded convex corner A ⊆ Rd, the lower

bound in Theorem 1.2.9 is achieved for some p ∈ Pd. Let a ∈ A satisfy ∑_{i=1}^d ai = γ(A). Then p = (1/γ(A)) a ∈ Pd trivially satisfies the equality condition of Theorem 1.2.9.
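The two model corners already encountered illustrate the theorem. For A = Cd we have γ(Cd) = 1 and γ(Cd)p = p ∈ Cd, so equality holds and the bound reproduces (1.7), namely HCd(p) = H(p). For A = Bd we have γ(Bd) = d and HBd(p) = 0 ≥ H(p) − log d, which is (1.3) again; here equality forces dp ∈ Bd, so that p must be the uniform distribution u.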

We now work towards an upper bound on HA(p). Although the following lemma is intu-

itive, its proof is complicated by the fact that if pi = 0 for some i ∈ [d], then the minimising

vector v ∈ A in Definition 1.2.4 can have vi = 0, and for such a vector v the function

h : Pd → R ∪ {∞} given by h(p) = −∑_{i=1}^d pi log vi can take the value ∞.

Lemma 1.2.11. If A is a standard Rd-convex corner, then the function f : Pd → R given

by f(p) = HA(p) is upper semi-continuous. Furthermore, the function f is continuous at all

p ∈ Pd satisfying p > 0.

Proof. By Remark 1.2.5, f(p) is finite for all p ∈ Pd. Note by Lemma 1.2.2 that (1/h)1 ∈ A for some h > 0. Let (p(m))m∈N be a sequence in Pd converging to p ∈ Pd. For p, u ∈ Rd+, let g(p, u) = ∑_{i=1}^d −pi log ui. Denote the minimising vectors in the definitions of HA(p) and HA(p(m)) by v ∈ A and v(m) ∈ A respectively; that is, HA(p) = g(p, v) and HA(p(m)) = g(p(m), v(m)). If pi = 0 it can hold that vi = 0, so we form w = (1 − µ)v + (µ/h)1 > 0 where (1/h)1 ∈ A and µ ∈ (0, 1). By convexity, w ∈ A. We have

g(p, v) = HA(p) ≤ g(p, w) = ∑_{i=1}^d −pi log( (1 − µ)vi + µ/h ).

Noting that pi = 0 when vi = 0, we see that g(p, w) → g(p, v) as µ → 0. Then for any δ > 0, there exists µ ∈ (0, 1) such that g(p, v) ≤ g(p, w) ≤ g(p, v) + δ. For all m we have HA(p(m)) = g(p(m), v(m)) ≤ g(p(m), w). Then because p(m) → p as m → ∞ and w > 0,

lim sup_{m→∞} HA(p(m)) ≤ lim_{m→∞} g(p(m), w) = g(p, w).

Finally, since δ > 0 was arbitrary and g(p, w) ≤ g(p, v) + δ = HA(p) + δ, it follows that

lim sup_{m→∞} f(p(m)) = lim sup_{m→∞} HA(p(m)) ≤ HA(p) = f(p),


as required.

We now show further that the function f is continuous at all p ∈ Pd satisfying p > 0. To see this, observe for all m ∈ N that

HA(p) = g(p, v) ≤ g(p, v(m)) = g(p− p(m), v(m)) + g(p(m), v(m)) (1.11)

= g(p− p(m), v(m)) +HA(p(m)).

Since A is bounded, we have for all m ∈ N that v(m) ≤ t1 for some t > 0. Now (1/h)1 ∈ A for some h > 0, and so HA(p(m)) = ∑_{i=1}^d −p(m)_i log v(m)_i ≤ log h. Then

−p(m)_j log v(m)_j ≤ log h + ∑_{i≠j} p(m)_i log v(m)_i ≤ log h + log t = log(th).

This gives log v(m)_j ≥ −log(th)/p(m)_j. Since p(m) → p > 0 as m → ∞, for sufficiently large m and for all j, we have v(m)_j ≥ s for some s > 0. In other words, there exist s, t ∈ (0,∞) such that s1 ≤ v(m) ≤ t1 for all sufficiently large m. Then since p(m) → p, we have lim_{m→∞} g(p − p(m), v(m)) = 0. It is now clear in (1.11) that we have HA(p) ≤ lim inf_{m→∞} HA(p(m)).

Corollary 1.2.12. The function f defined in Lemma 1.2.11 attains a finite maximum value

on Pd.

Proof. Since Pd is compact and f is upper semi-continuous, this follows from Lemma 1.2.11

by Theorem A.0.6.

This shows the existence of max_{p∈Pd} HA(p) for a standard Rd-convex corner A. After the

following definition, we will use the ‘minimax’ theorem as given in Theorem A.0.8 to evaluate

this upper bound on HA(p).

If A 6= Rd+ is a convex corner, then β ∈ R+ : β1 ∈ A is bounded and we define

N(A) = maxβ ∈ R+ : β1 ∈ A. (1.12)

The maximum exists by the fact that A is closed. Note that N(A) = 0 if and only if A has

empty interior. We write N(Rd+) =∞.

Theorem 1.2.13. Let A be a standard Rd-convex corner. Then

maxp∈Pd

HA(p) = − logN(A).

Page 32: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.2 Entropy over a convex corner in Rd 10

Proof. We have

maxp∈Pd

HA(p) = maxp∈Pd

minv∈A

d∑i=1

−pi log vi = supp∈Pd

infv∈A

d∑i=1

−pi log vi.

Note that Pd and A are compact and convex subsets of finite dimensional spaces.

Consider the function f : Pd × A → R ∪ ∞ given by f(p, v) =∑d

i=1−pi log vi. We

recall from the proof of Lemma 1.2.3 that the function v → f(p, v) is lower semi-continuous

and it is also clear that it is convex. The function p→ f(p, v) is linear and thus concave for

a fixed v ∈ A. Thus the conditions for applying Theorem A.0.8 are met, and interchange of

the supremum and infimum yields

maxp∈Pd

HA(p) = infv∈A

supp∈Pd

d∑i=1

−pi log vi

= infv∈A

log1

mini∈[d] vi= log

1

supv∈Amini∈[d] vi. (1.13)

Let m = supv∈Amini∈[d] vi. Recalling that N(A)1 ∈ A, it is clear that m ≥ N(A).

Conversely, for all ε > 0, there exists v ∈ A such that m − ε < mini∈[d] vi and hence

(m− ε)1 < v ∈ A. By hereditarity it follows that (m− ε)1 ∈ A. Thus N(A) ≥ m− ε for all

ε > 0 and so N(A) ≥ m. Thus m = N(A) and putting this in (1.13) completes the proof.

1.2.3 Anti-blockers

Here we recall the definition of the anti-blocker of a subset of Rd, and then survey some as-

sociated properties. We work towards Corollary 1.2.21, a result giving equivalent expressions

for the upper bound determined in Theorem 1.2.13.

Definition 1.2.14. ([22].) For A ⊆ Rd+, we define A[, the anti-blocker of A, by

A[ = v ∈ Rd+ : 〈v, u〉 ≤ 1 for all u ∈ A.

It is well known (see [11] and [22]) and easy to verify that A[ is a convex corner and that

A ⊆ B ⇒ A[ ⊇ B[. (1.14)

We will refer to (A[)[ as the as the second anti-blocker of A, and we will write (A[)[ = A[[.

The following important result will then be known as the ‘second anti-blocker theorem’.

Page 33: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.2 Entropy over a convex corner in Rd 11

Theorem 1.2.15. ([22, Section 30].) If A is a convex corner, then A[[ = A.

Lemma 1.2.16. For k > 0, the unit corner Cd ⊆ Rd and unit cube Bd ⊆ Rd satisfy

(kBd)[ =1

kCd and (kCd)[ =

1

kBd.

Proof. Since k1 ∈ kBd, it follows that if u ∈ (kBd)[, then 〈u, k1〉 = k∑d

i=1 ui ≤ 1 and

u ∈ 1kCd. Conversely, if u ∈ 1

kCd, then∑d

i=1 ui ≤1k . Then for all v ∈ kBd it follows that

〈u, v〉 ≤ 〈u, k1〉 = k∑n

i=1 ui ≤ 1 and u ∈ (kBd)[. Thus, (kBd)[ = 1kCd. Anti-blocking both

sides and applying Theorem 1.2.15 yields kBd = ( 1kCd)

[, and the second assertion follows.

Lemma 1.2.17. It holds that A is a standard convex corner if and only if A[ is a standard

convex corner.

Proof. Assume A is a standard convex corner. Then A has non-empty interior and by Lemma

1.2.2, s1 ∈ A for some s > 0. Thus sBd ⊆ A by hereditarity. Since A is bounded, A ⊆ tCdfor some finite t > 0. Then sBd ⊆ A ⊆ tCd and hence, using the previous lemma and (1.14),

1tBd ⊆ A

[ ⊆ 1sCd. It follows that A[ is a standard convex corner. The converse holds by

Theorem 1.2.15.

Definition 1.2.18. For an Rd-convex corner A with non-empty interior, we define the pa-

rameter M(A) by letting

M(A) = inf

k∑i=1

λi : ∃k ∈ N, vi ∈ A and λi > 0 for i ∈ [k] such that

k∑i=1

λivi ≥ 1

.

(1.15)

If A has empty interior, Lemma 1.2.2 shows that the condition on the right hand side of

(1.15) cannot be satisfied, and we set M(A) =∞.

Note that M(Rd+) = 0. It is also easy to see that a standard convex corner A, being

bounded and containing a positive definite element, satisfies 0 < M(A) <∞.

Lemma 1.2.19. If A is a standard convex corner, the infimum in Definition 1.2.18 is at-

tained.

Proof. First note that for a standard convex corner A we have 0 < M(A) < ∞. From

(1.15), for all n ∈ N we can form xn =∑

i λ(n)i v

(n)i ≥ 1 with v

(n)i ∈ A and λ

(n)i > 0 such

that M(A) ≤∑

i λ(n)i ≤ M(A) + 1/n, giving that

∑i λ

(n)i →n→∞ M(A). By convexity,

Page 34: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.2 Entropy over a convex corner in Rd 12

(∑i λ

(n)i

)−1xn ∈ A for all n ∈ N. Since A is compact, the sequence

((∑i λ

(n)i

)−1xn

)n∈N

has a subsequence

((∑i λ

(nj)i

)−1xnj

)j∈N

convergent to some a ∈ A. Then

M(A)∑i λ

(nj)i

xnj →j→∞ M(A)a.

Comparing this to the limitM(A)∑i λ

(nj)i

1→j→∞ 1,

and noting that xnj ≥ 1 for all j ∈ N, it is clear that M(A)a ≥ 1. Since a ∈ A, this shows

that the infimum of (1.15) is achieved.

Recalling the definition of N(A) in (1.12), the following proposition links N(A),M(A)

and γ(A[). We use the convention that 1∞ = 0 and 1

0 =∞.

Proposition 1.2.20. Let A be an Rd-convex corner. It holds that

M(A) =1

N(A)= γ(A[).

Proof. First we consider the case that A 6= Rd+ has non-empty interior. Set y = N(A)1, and

observe that y ∈ A and N(A) > 0. Then 1N(A)y = 1 and M(A) ≤ 1

N(A) .

The reverse inequality follows from the proof of Lemma 1.2.19. Recall that for all n ∈ N

we had(∑

i λ(n)i

)−1xn ∈ A with

(∑i λ

(n)i

)−1≥ (M(A) + 1/n)−1 and xn ≥ 1. Hereditarity

gives that (M(A) + 1/n)−11 ∈ A for all n ∈ N. It follows that N(A) ≥ 1

M(A) .

By (1.14) and Lemma 1.2.16, it is easy to see that A = Rd+ if and only if A[ = 0 if

and only if γ(A[) = 0. It is also clear that if A has non-empty interior, then kCd ⊆ A for

some k > 0, and A[ ⊆ (kCd)[ = 1kBd, giving that A[ is bounded. Thus, when A 6= Rd+

has non-empty interior, we obtain 0 < γ(A[) < ∞. To prove the second equality in this

case, let w ∈ A[ satisfy 〈w,1〉 = γ(A[) ∈ (0,∞). By the definition of A[, it holds that 1 ≥

〈w,N(A)1〉 = N(A)γ(A[), and so γ(A[) ≤ 1N(A) . For the reverse inequality, set v = 1

γ(A[)1.

For all u ∈ A[, we have 〈v, u〉 = 1γ(A[) 〈1, u〉 ≤ 1. This shows that v ∈ A[[, and so v ∈ A by

Theorem 1.2.15. This is sufficient to show that N(A) ≥ 1γ(A[) , as required.

In the case that A = Rd+, the result holds with M(A) = 0, N(A) =∞, and γ(A[) = 0.

Finally, if A has empty interior, by Lemma 1.2.2 for some i ∈ [d] we have vi = 0 for all

v ∈ A, and then ui can be arbitrarily large for u ∈ A[, and A[ is unbounded. So when A has

Page 35: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 13

empty interior, the result holds with M(A) =∞, N(A) = 0 and γ(A[) =∞.

Theorem 1.2.13 and Proposition 1.2.20 immediately give the following corollary.

Corollary 1.2.21. For a standard Rd-convex corner A,

maxp∈Pd

HA(p) = − logN(A) = logM(A) = log γ(A[).

1.3 Graph entropy

This section concerns graph entropy, a real functional defined on a graph with a given prob-

ability distribution on its vertices which was first introduced in [23]. After a summary of

some basic terminology from graph theory in Section 1.3.1, the motivation of graph entropy

will be discussed in Section 1.3.2, where we recall that graph entropy is simply the entropy

of a probability distribution over a certain convex corner associated with the given graph.

Indeed, we show that a number of important graph parameters can be defined in terms of

convex corners. Finally, in Section 1.3.3, Theorem 1.2.9 will be used to yield a number of

new results on graph entropy.

1.3.1 Some graph theoretic preliminaries

Let G be a graph on vertex set V (G) with edge set E(G), where each edge is an unordered

pair of distinct vertices in V (G). In terminology and notation we broadly follow [15], and

here we consider only finite graphs. If i, j is an edge, we say vertex i is adjacent to vertex

j and write i ∼ j. If i is adjacent to or equal to j, we will write i ' j. If graphs F and G

are such that V (F ) ⊆ V (G) and E(F ) ⊆ E(G), then F is said to be a subgraph of G. A

spanning subgraph of G is a subgraph of G with vertex set V (G). An induced subgraph of G

is a subgraph of G, any two vertices of which are adjacent if and only if they are adjacent

in G. Specifically, if S ⊆ V (G), then the subgraph GS of G, induced by S, has V (GS) = S,

and the edges of GS are precisely those edges of G between vertices of S. A stable set or

independent set is a set of vertices in G no two of which are adjacent. The empty set will be

considered stable. The independence number, α(G), of graph G is the size of a largest stable

set. A set S ⊆ V (G) is a clique of G if i ∼ j for all distinct i, j ∈ S. The clique number,

ω(G), of graph G is the size of a largest clique. The complement of G is the graph G where

i ∼ j in G if and only if i 6' j in G. It is clear that G = G. The stable sets of G are precisely

the cliques of G and hence α(G) = ω(G).

Page 36: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 14

A colouring of graph G is an assignment of colours to its vertices such that adjacent

vertices are assigned distinct colours. The smallest number of colours needed to colour graph

G is called the chromatic number, χ(G). Equivalently, χ(G) is the smallest m ∈ N such

that V (G) can be partitioned into m independent sets. The clique covering number χ(G)

is the smallest m ∈ N such that V (G) can be partitioned into m cliques; it is clear that

χ(G) = χ(G). The complete graph Kn has vertex set 1, . . . , n and edges joining every

pair of distinct vertices. The graph Kn is known as the empty graph on n vertices, having

vertex set 1, . . . , n and no edges. Graphs F and G are isomorphic if there exists a bijection

φ : V (F ) → V (G) such that x ∼ y in F if and only if φ(x) ∼ φ(y) in G. In that case we

say φ is an isomorphism from F to G and write F ∼= G. An automorphism of graph G is

an isomorphism from G to itself. A graph G is vertex transitive if for every pair of distinct

vertices i, j ∈ V (G), there exists an automorphism f such that f(i) = j. If F and G are

graphs, a mapping f : V (F )→ V (G) is a homomorphism if f(x) ∼ f(y) in G whenever x ∼ y

in F . A probabilistic graph (G, p) is a graph G equipped with a probability distribution p

defined on its vertices.

Many different graph products have been defined in the literature. Others will be discussed

later, but for now the following definition will suffice.

Definition 1.3.1. ([23].) The co-normal product (alternatively, ‘or’ product or ‘disjunctive’

product) of graphs F and G is the graph F ∗ G for which V (F ∗ G) = V (F ) × V (G) and

(i1, j1) ∼ (i2, j2) if and only if i1 ∼ i2 in F or j1 ∼ j2 in G. We denote G ∗ G ∗ . . . ∗ G, the

nth co-normal power of G, by Gn. Note F ∗G ∼= G ∗ F .

Definition 1.3.2. ([23]) A kernel or maximal stable set of G is a stable set which is not a

proper subset of a stable set of G.

Lemma 1.3.3. If K ⊆ V (F ∗ G), then K is a kernel of F ∗ G if and only if K = S × T

where S and T are kernels of F and G, respectively.

Proof. Clearly, if S and T are kernels of F and G respectively, S × T is stable in F ∗G. To

show that S × T is maximally stable in F ∗G, consider (i, j) ∈ V (F ∗G) with (i, j) /∈ S × T .

Then i /∈ S or j /∈ T . Without loss of generality, suppose i /∈ S. Since S is maximally

stable in F , it follows that i ∼ k in F for some k ∈ S and thus (i, j) ∼ (k, l) in F ∗ G for

some (k, l) ∈ S × T . We can conclude that S × T is a kernel of F ∗ G. Conversely, let K

be any kernel in F ∗ G. The projection of K onto V (F ), which we denote by projV (F )(K),

cannot contain adjacent vertices in F , so projV (F )(K) ⊆ S for some kernel S of F . Similarly,

Page 37: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 15

projV (G)(K) ⊆ T for some kernel T of G. Then K ⊆ S×T . But as a kernel, K is maximally

stable and hence K = S × T .

Corollary 1.3.4. If F and G are graphs then

α(F ∗G) = α(F )α(G).

Let graph G have d vertices and label its independent sets S1, . . . , Sk. We identify RV (G),

the set of real-valued functions on V (G), with Rd. Now let v(Si) be the characteristic vector

of Si. It is easy to see the definition of χ(G) can be stated equivalently as

χ(G) = min

k∑i=1

µi : µi ∈ 0, 1,k∑i=1

µiv(Si) ≥ 1

.

Here the µi are simply weightings, each either 0 or 1, put on the independent sets of G. If

the restriction on µi is weakened to allow any non-negative weightings then we obtain the

fractional chromatic number, χf(G), given by

χf(G) = min

k∑i=1

µi : k ∈ N, µi ≥ 0,

k∑i=1

µiv(Si) ≥ 1

. (1.16)

It is clear that χf(G) ≤ χ(G).

1.3.2 Graph entropy and the problem of source coding without complete

distinguishability

The concept of graph entropy was introduced by Janos Korner in 1973 [23] to solve the

source coding problem in the case of a discrete, memoryless, stationary source with alphabet

X whose symbols are not all distinguishable. This could arise, for instance, if the symbols are

handwritten. We take distinguishability to be a symmetric (but not necessarily transitive)

binary relation on X that is known and fixed. We construct a graph G, known as a distin-

guishability graph, to describe this distinguishability relation where V (G) = X and where

vertices i and j are adjacent in G if and only if i and j are distinguishable. As in Section 1.1,

we consider a source consisting of a sequence (Xi)∞i=1 of independent, identically distributed

random variables, each taking values in X and following distribution p.

Now x, y ∈ X k are distinguishable if and only if they are distinguishable at least at one

coordinate. We form the graph Gk, the kth co-normal power of G, where V (Gk) = X k and

Page 38: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 16

x ∼ y in Gk if and only if xi ∼ yi in G for at least one i ∈ 1, . . . , k. The graph Gk is then

the distinguishability graph for elements of X k. We set up a k-to-m binary code, but do not

insist on encoding all of X k, but merely a ‘probable set’ E ⊆ X k, where pk(E) ≥ 1 − λ for

some fixed λ ∈ (0, 1). Let GkE be the subgraph of Gk induced by E. Our encoding must map

distinguishable elements of E, or equivalently, adjacent vertices of GkE , to distinct binary

strings in 0, 1m. (The reason for this is that if x, y ∈ E are not distinguishable, then they

have potentially already been confused.) Assigning distinct codewords in 0, 1m to adjacent

vertices of GkE is a graph colouring problem and requires at least χ(GkE) codewords. Let

NG(k, λ) be the smallest integer m such that this encoding E → 0, 1m is possible, that is

NG(k, λ) =⌈log(

minχ(GkE) : E ⊆ X k, pk(E) ≥ 1− λ

)⌉,

where dxe denotes the smallest integer greater than or equal to x.

As k → ∞, Korner shows in [23] that NG(k, λ)/k tends to a well defined limit which

is independent of λ. This limit is called the graph entropy H(G, p), and thus we have the

following definition.

Definition 1.3.5. ([23]) The graph entropy of the probabilistic graph (G, p) is defined as

H(G, p) = limk→∞

1

klog(

minχ(GkE) : E ⊆ X k, pk(E) ≥ 1− λ

)(1.17)

where λ ∈ (0, 1).

We have the intuitive understanding of graph entropy as the ‘long-term mean number

of encoding bits required per source symbol sent’ for such encoding. For more details and

a survey of known results on graph entropy, see [49]. We can see how graph entropy gen-

eralises Shannon entropy by considering the graph entropy of the complete graph Kn under

an arbitrary probability distribution. This corresponds to an alphabet with no confusability,

the situation that Shannon’s source coding theorem addresses. For p ∈ Pn it should thus

hold that H(Kn, p) = H(p). To see this is indeed the case, note that if G = Kn, then GkE is

the complete graph with vertex set E and χ(GkE) = |E|. Then the expression for H(G, p) in

(1.17) is identical to the expression for H(p) given in (1.1), and the claimed equality holds.

Some definitions are needed before an alternative but equivalent expression for graph

entropy can be stated.

Definition 1.3.6. [49], [17]. We define VP(G), the vertex packing polytope of G, to be the

convex hull of the characteristic vectors of the stable sets of G.

Page 39: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 17

Lemma 1.3.7. For any graph G, VP(G) is hereditary, that is, if v ∈ VP(G), then for all w

satisfying 0 ≤ w ≤ v, we have w ∈ VP(G).

Proof. Let v(A) denote the characteristic vector of set A. Take w ≤ v =∑

k αkv(Sk) ∈ VP(G)

where αk ≥ 0,∑

k αk = 1 and each Sk is stable in G. Choose i ∈ V (G) and let ε = vi − wi

where vi ≥ ε ≥ 0. For each stable set Sk containing i choose εk ≤ αk such that∑

k:i∈Sk εk = ε.

Note that if Sk is stable, so is Sk\i, where we use the fact that the empty set is stable in

the case Sk = i. We form

v′ =∑k:i∈Sk

((αk − εk)v(Sk) + εkv

(Sk\i))

+∑k:i/∈Sk

αkv(Sk) ∈ VP(G).

Then v′i = wi and v′j = vj for j 6= i. Repeating for each i ∈ V (G) gives the required result.

Lemma 1.3.8. If G is a graph on n vertices then Cn ⊆ VP(G) ⊆ Bn.

Proof. Since i is a stable set for each i ∈ V (G), we have u ∈ Rn+ :∑n

i=1 ui = 1 ⊆ VP(G)

and Cn ⊆ VP(G) by Lemma 1.3.7. Every characteristic vector v of a stable set in a graph G

satisfies v ≤ 1, and so VP(G) ⊆ Bn.

Consider graph G with V (G) = [n]. As the convex hull of a finite set of non-negative

vectors in Rn, VP(G) is convex, non-negative, non-empty and bounded. The fact that VP(G)

is closed follows from the standard result that the convex hull of a finite set in Rn is compact

(see for example [54, p. 45].) Hereditarity was established by Lemma 1.3.7, and it is thus

clear that VP(G) is a convex corner [11]. By Lemma 1.3.8, 1n1 ∈ VP(G), and so VP(G) is a

standard convex corner.

We now show that a number of important graph parameters can be defined in terms of

convex corners; this is an important theme.

Lemma 1.3.9. For any graph G, the independence number α(G) is given by

α(G) = γ(VP(G)).

Proof. Let u be the characteristic vector of a stable set of cardinality α(G). Since u ∈ VP(G),

it follows that max∑

i vi : v ∈ VP(G) ≥ α(G). If v ∈ VP(G), then v =∑

k λkv(Sk) where

v(Sk) is the characteristic vector of stable set Sk and∑

k λk = 1 where λk ≥ 0. For each k,

we have∑

i v(Sk)i ≤ α(G), and so

∑i vi ≤

∑k λkα(G) = α(G).

Page 40: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 18

Since a set of vertices of G is independent in G if and only if it is a clique in G, it follows

immediately that

ω(G) = γ(VP(G)). (1.18)

In an analogous way to (1.16) on page 15, we can modify the definition of the clique

number ω(G) to obtain the fractional clique number ωf(G), as for example in [15]. First note

that set S ⊆ V (G) is a clique in G if and only if S contains no distinct elements in the same

independent set. This is equivalent to the condition that v(S), the characteristic vector of S,

satisfies 〈vi, w〉 ≤ 1 where w is the characteristic vector of any independent set, which is in

turn equivalent to the condition v(S) ∈ VP(G)[. Letting V (G) = [n], we can then see that

ω(G) = max

n∑i=1

λi : λi ∈ 0, 1, (λi)i∈[n] ∈ VP(G)[

,

where the λi are weightings given to the vertices of G. Weakening the restriction to λi ≥ 0

gives the fractional clique number,

ωf(G) = max

n∑i=1

λi : λi ≥ 0, (λi)i∈[n] ∈ VP(G)[

,

and we thus have

ωf(G) = γ(VP(G)[). (1.19)

It is immediate that ωf(G) ≥ ω(G). The vertex packing polytope VP(G) is given further

significance from the following result in [11].

Theorem 1.3.10. The graph entropy H(G, p) is given by

H(G, p) = min

n∑i=1

pi log vi : v ∈ VP(G)

. (1.20)

As VP(G) is a convex corner, we can write

H(G, p) = HVP(G)(p),

and we see that graph entropy is an example of entropy over a convex corner.

The stable sets of the complete graph Kn are i, for i ∈ [n], and the empty set ∅, and so

VP(Kn) =

v ∈ Rn+ :

n∑i=1

vi ≤ 1

= Cn.

Page 41: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 19

Recalling (1.7) we observe that

H(Kn, p) = HCn(p) = H(p) for all p ∈ Pn. (1.21)

Since 1, . . . , n is stable in the empty graph Kn, it follows that 1 ∈ VP(Kn), and by Lemma

1.3.8, VP(Kn) = Bn. Then

H(Kn, p) = HBn(p) = 0 for all p ∈ Pn. (1.22)

By Lemmas 1.3.8 and 1.2.6, for every graph G on n vertices and for all p ∈ Pn we have

0 ≤ H(G, p) ≤ H(p). (1.23)

We return to the definition of χf(G) in (1.16). Since every v ∈ VP(G) is a convex

combination of the v(Si),

χf(G) = min

k∑i=1

λi : λi > 0,k∑i=1

λiu(i) ≥ 1, u(i) ∈ VP(G), k ∈ N

. (1.24)

In the notation of Definition 1.2.18 this can be written as

χf(G) = M(VP(G)). (1.25)

Equations (1.25) and (1.19) now show that the well-known result

χf(G) = ωf(G), (1.26)

given for instance in [15, Section 7.5], is just a special case of Proposition 1.2.20.

By writing H(G, p) = HVP(G)(p), the following corollary is immediate from Corollary

1.2.21 . (Note that the first equality is proved in [50, Lemma 4] by considering the properties

of VP(G). The result given in [50] is thus a special case of Theorem 1.2.13.)

Corollary 1.3.11. The maximum graph entropy of a given graph G over all probability

distributions is given by

maxp∈Pn

H(G, p) = logχf(G) = logωf(G) = log γ(VP(G)[) = − logN(VP(G)).

Page 42: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 20

1.3.3 A lower bound on graph entropy

The lower bound H(G, p) ≥ 0 is given in (1.23). Applying Theorem 1.2.9 to the vertex

packing polytope yields a less trivial lower bound. We will show that the resulting lower

bound gives the exact graph entropy of a class of probabilistic graphs which includes the odd

cycles and their complements under certain probability distributions. A number of other new

results will also be given. We begin with the following lemma.

Lemma 1.3.12. Let the probability distribution p ∈ Pn satisfy p > 0. For graph G on n

vertices, let a ∈ VP(G) be a vector such that

−n∑i=1

pi log ai = H(G, p). (1.27)

Then a is in the convex hull of characteristic vectors of the kernels of G.

Proof. Let v(S) be the characteristic vector of S ⊆ V (G), and let a =∑m

j=1 αjv(Sj), where

S1, . . . , Sm are stable sets but not all kernels, and where αj > 0 satisfy∑m

j=1 αj = 1. Every

stable set that is not maximal is a proper subset of one that is. For each j ∈ [m], let Tj = Sj

if Sj is a kernel, and if Sj is not a kernel, choose kernel Tj such that Sj ⊂ Tj . Now let

b =∑m

j=1 αjv(Tj). Clearly b ∈ VP(G). We have v(Tj) − v(Sj) ≥ 0 for all j ∈ [m] and so

bi ≥ ai and − log bi ≤ − log ai for all i ∈ [n]. Furthermore, for some i ∈ [n], we have bi > ai

and − log bi < − log ai. Since pi > 0, it then holds that a is not the minimizing vector in

(1.20).

Remark 1.3.13. Lemma 1.2.3 shows that if p > 0 there is a unique a satisfying (1.27). How-

ever, if pi = 0 for some i, there may be more than one vector a satisfying (1.27). Furthermore,

such a vector amay not lie in the convex hull of the characteristic vectors of kernels ofG. How-

ever, if this is the case, using the method above we can form a vector b which lies in the convex

hull of the characteristic vectors of kernels of G and satisfies∑n

i=1 pi log 1bi

=∑n

i=1 pi log 1ai

.

Thus, for any (G, p), there will always be a vector satisfying (1.27) which lies in the convex

hull of the characteristic vectors of kernels of G.

Theorem 1.2.9 has the following corollary. This important result will be widely used in

the rest of the chapter to obtain some new results on graph entropy.

Corollary 1.3.14. For any probabilistic graph (G, p),

H(G, p) ≥ H(p)− logα(G). (1.28)

Page 43: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 21

Equality holds if and only if α(G)p ∈ VP(G); in this case v = α(G)p is the unique vector in

VP(G) satisfying∑n

i=1 pi log 1vi

= H(G, p), where n = |V (G)|.

Proof. Set A = VP(G) in Theorem 1.2.9.

We note that (1.28), though not the equality condition given in Corollary 1.3.14, appears

as equation (14) in [24]. The proof given in [24] follows a different method.

Remark 1.3.15. Lemma 1.3.12 and the following remark show that there is a vector v satisfy-

ing∑n

i=1 pi log(1/vi) = H(G, p) in the convex hull of the characteristic vectors of kernels of

G. For equality in Corollary 1.3.14 we note that such a vector v must satisfy∑n

i=1 vi = α(G),

that is v must lie in the convex hull of the characteristic vectors of kernels of G of size α(G).

It is interesting to compare Corollary 1.3.14 to another known bound. Letting

α(G, p) = max

∑i∈S

pi : S is a stable set of G

,

Cardinal, Fiorini and Joret in [7] established that

− logα(G, p) ≤ H(G, p) (1.29)

Set B1(G, p) = H(p)− logα(G), the lower bound on H(G, p) from Corollary 1.3.14, and

B2(G, p) = − logα(G, p), the lower bound on H(G, p) from (1.29). In the case of the uniform

distribution u ∈ Pn and V (G) = [n], we have

B1(G, u) = H(u)− logα(G) = log

(n

α(G)

)= − logα(G, u) = B2(G, u).

However, it is easy to find probabilistic graphs with B1(G, p) > B2(G, p) and those with

B1(G, p) < B2(G, p). Take, for instance, the complete graph Kn. Here we have

B1(Kn, p) = H(p) = H(Kn, p),

but

B2(Kn, p) = − log(maxpj : j ∈ [n]) =−n∑i=1

pi log(maxpj : j ∈ [n])

≤−n∑i=1

pi log pi = H(Kn, p),

Page 44: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 22

with equality if and only if the non-zero coordinates of p are equal.

Consider the path graph G shown below.

•1 •2 •3

The kernels of G are 1, 3 and 2, so H(G, p) can be found directly from (1.20) on

page 18 by a straightforward minimisation; just note that the minimising vector a lies in the

convex hull of the kernels of G and so is of the form a = (α, 1− α, α)t for α ∈ [0, 1].

With (p1, p2, p3) = (12 ,

14 ,

14) we have H(G, p) = 2 − 3

4 log 3. We also have B1(G, p) = 12 and

B2(G, p) = log 43 <

12 . In this case neither bound is optimal, but B1(G, p) is closer.

Now take the same graph with (p1, p2, p3) = (14 ,

12 ,

14) to yield H(G, p) = 1, B1(G, p) = 1

2

and B2(G, p) = 1, and only bound B2 is optimal.

As a final example, let G be the graph

•1 •2 •3 •4

and let (p1, p2, p3, p4) = (18 ,

38 ,

18 ,

38). This gives B1(G, p) = 2 − 3

4 log 3 and B2(G, p) = 2 −

log 3 < B1(G, p). It is easy to see that

α(G)p = 2p =1

4(1, 0, 1, 0)t +

3

4(0, 1, 0, 1)t ∈ VP(G),

and thus the equality condition in Corollary 1.3.14 is satisfied giving H(G, p) = B1(G, p).

We look now at some applications of Corollary 1.3.14 and in particular at those situations

where equality holds. Observe that every graph G has at least one kernel K with |K| = α(G).

Let G have kernels K1, . . . ,Km satisfying |Ki| = α(G), and let the characteristic vector of

Ki be v(i). For αi ≥ 0 and∑m

i=1 αi = 1 it then holds that v =∑m

i=1 αiv(i) ∈ VP(G)

and p = (1/α(G))v is a probability distribution satisfying α(G)p ∈ VP(G). Thus, we have

equality in Corollary 1.3.14, andH(G, p) = H(p)−logα(G). Let E(G) be the set of probability

distributions giving equality in Corollary 1.3.14 for graph G. Then E(G) is non-empty for

every graph G. In fact if m > 1 for graph G, then E(G) is an infinite set. Indeed, for the

complete graph Kn, the kernels are just the singletons, and it is trivial that E(Kn) = Pn.

In this section we prove some results concerning E(G) for certain graphs G. If G is vertex

Page 45: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 23

transitive, we show that the uniform distribution u ∈ E(G). We characterize E(G) completely

when G is an odd cycle or its complement.

Proposition 1.3.16. Let G be a vertex transitive graph with n vertices. Then, when p is the

uniform distribution u with ui = 1n , i = 1, . . . , n, equality holds in Corollary 1.3.14 and

H(G, u) = H(u)− logα(G) = logn

α(G). (1.30)

Proof. Let G have N kernels of size α(G). We label them Aj for j = 1, . . . , N . By vertex

transitivity, each vertex is included in the same number k of the Aj ’s. Let v(j) be the

characteristic vector of Aj . We have∑N

j=1 v(j) = k1. The sum of the n components of this

vector is nk = Nα(G). Consider the vector v = 1N

∑Nj=1 v

(j). By construction, v ∈ VP(G).

We note v = kN 1 = α(G)

n 1 = α(G)u, giving equality in Corollary 1.3.14.

Proposition 1.3.16 also follows immediately from results in [50], [39] and [40]. We briefly

outline this approach.

A graph G is called symmetric if

maxp∈Pn

H(G, p) = H(G, u),

that is, its entropy is maximised by the uniform distribution u. It holds that every vertex

transitive graph is symmetric; for example, this is noted in [39]. Furthermore, G is symmetric

if and only if χf (G) = nα(G) [40, Corollary 3.4]; see also [46, Proposition 3.1.1]. Then if G is

vertex transitive, Corollary 1.3.11 gives that

H(G, p) ≤ logn

α(G)(1.31)

for any distribution p, with equality in the case of the uniform distribution.

We now analyse the odd cycle, C2n+1, with V (C2n+1) = [2n + 1] and i ∼ j if and only

if j − i ≡ ±1 mod 2n+ 1. (While we work with the odd cycles and their complements, our

arithmetic on the set of vertices 1, . . . , 2n+ 1 will be done modulo 2n+ 1.)

A graph G is called bipartite if one may partition its vertex set into two subsets V1 and

V2 such that no edge joins vertices in the same subset. Bipartite graphs have been widely

studied and their graph entropies can be found as outlined by Simonyi in [49]. A graph G is

called perfect (see [15]) if the chromatic number of any induced subgraph H of G is equal to

ω(H). Perfect graphs have attracted a considerable amount of attention in the literature, see

Page 46: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 24

[50]. Note that odd cycles are not bipartite and C2n+1 is not perfect for n > 1; this motivates

us to consider them. (To see C2n+1 is not perfect for n > 1, consider C2n+1 as a subgraph

of itself. We have χ(C2n+1) = 3, but ω(C2n+1) = 2. The 3-cycle C3 = K3 and is perfect.)

We will see how Corollary 1.3.14 allows us to determine the graph entropy of odd cycles with

respect to certain probability distributions.

Since α(C2n+1) = n, Corollary 1.3.14 immediately gives H(C2n+1, p) ≥ H(p)− log n. Cy-

cles are vertex transitive, so by (1.30) equality certainly occurs with the uniform distribution,

and so H(C2n+1, u) = log 2n+1n . We now characterise the probability distributions p for which

equality in Corollary 1.3.14 holds for C2n+1.

Proposition 1.3.17. Let p be a probability distribution on V (C2n+1). The following are

equivalent:

(i) H(C2n+1, p) = H(p)− log n

(ii) pi + pj ≤ 1/n whenever i ∼ j.

Proof. The cycle C2n+1 has 2n + 1 kernels of size α(C2n+1) = n, which we label Ji with

i = 1, . . . , 2n+ 1. We write Ji = i, i+ 2, i+ 4, . . . , i+ 2(n− 1), modulo 2n+ 1. Let v(i) be

the characteristic vector of Ji.

We will show that if p ∈ R2n+1, then

α(C2n+1)p = np =

2n+1∑i=1

v(i)

(2n+1∑k=1

pk − npi−1 − npi−2

). (1.32)

Note in (1.32) p need not be a probability distribution. Each vertex is in n kernels. Specifi-

cally, vertex j lies in Ki for

i = j, j − 2, j − 4, . . . , j − 2(n− 1) mod (2n+ 1).

Thus the jth component of the right hand side of our identity is

n2n+1∑k=1

pk − n(pj−1 + pj−3 + . . .+ pj−2n+1)− n(pj−2 + pj−4 + . . .+ pj−2n) = npj

as required. Because (1.32) holds for any p ∈ R2n+1, the vectors v(1) . . . , v(2n+1) span R2n+1,

and so are linearly independent. Then this representation of α(C2n+1)p in terms of the v(i)’s

is unique.

Page 47: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 25

To see that the second statement in our proposition implies the first, note that when p is

a probability distribution,

2n+1∑i=1

(2n+1∑k=1

pk − npi−1 − npi−2

)= 2n+ 1− n− n = 1.

Also, when pi + pj ≤ 1/n for all adjacent pairs of vertices i and j, we have that

0 ≤ 1− npi−1 − npi−2 =

2n+1∑k=1

pk − npi−1 − npi−2

for i ∈ [2n+ 1]. By (1.32), we now have that α(C2n+1)p ∈ VP(C2n+1), and Corollary 1.3.14

then completes the argument.

To show the converse, note by Corollary 1.3.14 and Remark 1.3.15 that if H(C2n+1, p) =

H(p)− log n for p ∈ P2n+1 then α(C2n+1)p lies in the convex hull of the characteristic vectors

of kernels of C2n+1 of size α(C2n+1) = n. Then by the uniqueness of the representation of

α(C2n+1)p in (1.32), we have 1 − npl−1 − npl−2 ≥ 0 for all l ∈ [2n + 1], giving pi + pj ≤ 1/n

when i ∼ j.

Remark 1.3.18. (i) Note the condition for Proposition 1.3.17 is certainly satisfied by the

uniform distribution, and by “small” disturbances from it.

(ii) When n = 1, Proposition 1.3.17 gives that H(K3, p) = H(p) for all p ∈ P3, as in (1.21).

Before discussing C2n+1, the complement of the odd cycle C2n+1, we give two well-known

lemmas concerning graphs having the same vertex set. We supply the proof of the first

because it uses a method we will employ later.

Lemma 1.3.19. (Monotonicity, [49, Lemma 3.1].) If F and G are graphs and F is a spanning

subgraph of G, then H(F, p) ≤ H(G, p) for any probability distribution p.

Proof. Stable sets in G are also stable in F , and hence VP(G) ⊆ VP(F ), whence the expres-

sion for H(G, p) in (1.20) yields the result.

Lemma 1.3.20. (Sub-additivity, [49, Lemma 3.2].) Suppose graphs F and G satisfy V (F ) =

V (G). The graph F ∪G is constructed such that V (F ∪G) = V (F ) and E(F ∪G) = E(F )∪

E(G). For any probability distribution p it follows that H(F ∪G, p) ≤ H(F, p) +H(G, p).

Remark 1.3.21. For any graph G, the graph G ∪G is complete, and Lemma 1.3.20 yields

H(G, p) +H(G, p) ≥ H(G ∪G, p) = H(p).

Page 48: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 26

In the case of perfect graphs [50, Theorem 1] gives that H(G, p) + H(G, p) = H(p). When

n > 1 we have α(C2n+1) = 2 and so by Corollary 1.3.14, H(C2n+1, p) ≥ H(p) − log 2. Since

C2n+1 is vertex transitive, Proposition 1.3.16 gives for n > 1 that

H(C2n+1, u) = H(u)− log 2 = log(2n+ 1)− log 2.

Recalling that C2n+1 is not perfect for n > 1, we note for n > 1 that

H(C2n+1, u) +H(C2n+1, u) = 2 log(2n+ 1)− log(2n) ≥ log(2n+ 1) = H(u),

in line with the observation above. (Note C3 is the empty graph on three vertices and

H(C3, p) = 0 for any distribution p. Of course, C3 = K3 and H(C3, p) = H(p). Then

H(C3, p) +H(C3, p) = H(p), consistent with C3 being perfect.)

Proposition 1.3.22. Let p be a probability distribution on V (C2n+1) where n > 1. The

following are equivalent:

(i) H(C2n+1, p) = H(p)− log 2

(ii) pi+2 + pi+4 + . . .+ pi+2n ≤ 1/2 for all i = 1, . . . , 2n+ 1.

Proof. For n > 1, C2n+1 has 2n + 1 kernels of size α(C2n+1) = 2, given by Li = i, i + 1,

modulo 2n+ 1, with i = 1, . . . , 2n+ 1. Let v(i) be the characteristic vector of Li.

We now claim that if p ∈ R2n+1, then

α(C2n+1)p = 2p =

2n+1∑i=1

v(i)

(2n+1∑k=1

pk − 2pi+2 − 2pi+4 − . . .− 2pi+2n

).

Vertex j lies in Li for i ≡ j mod (2n + 1) and i ≡ j − 1 mod (2n + 1). Then the jth

component of the right hand side of our identity is

22n+1∑k=1

pk − 2(pj+2 + pj+4 + . . .+ pj+2n)− 2(pj+1 + pj+3 + . . .+ pj+2n−1) = 2pj ,

as required. Then the vectors v(1), . . . , v(2n+1) span R2n+1 and so are linearly independent,

and this representation of α(C2n+1)p in terms of the v(i)’s is unique. The proof now proceeds

exactly as that of Proposition 1.3.17. We merely note if p ∈ P2n+1 that

2n+1∑i=1

(2n+1∑k=1

pk − 2pi+2 − 2pi+4 − . . .− 2pi+2n

)= 2n+ 1− 2n = 1,

Page 49: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 27

and that the condition

2n+1∑k=1

pk − 2pi+2 − 2pi+4 − . . .− 2pi+2n ≥ 0

is equivalent to

pi+2 + pi+4 + . . .+ pi+2n ≤1

2

when p is a probability distribution.

Consider G′, a spanning subgraph of G, formed by removing an edge or edges from G, but

retaining the same vertex set. By the monotonicity of graph entropy as described by Lemma

1.3.19, H(G′, p) ≤ H(G, p). Here we use Corollary 1.3.14 to give a sufficient condition for

the equality H(G′, p) = H(G, p).

Proposition 1.3.23. Let the graph G and probability distribution p satisfy α(G)p ∈ VP(G).

If G′ is a spanning subgraph of G such that α(G′) = α(G), then H(G′, p) = H(G, p).

Proof. The stable sets of G are also stable sets of G′, and so VP(G) ⊆ VP(G′). Then

α(G′)p = α(G)p ∈ VP(G′) and so by Corollary 1.3.14,

H(G′, p) = H(p)− logα(G′) = H(G, p),

as required.

The complete bipartite graph Km,n is a graph whose vertices can be partitioned into two

subsets V1 and V2 with |V1| = m and |V2| = n and such that no edge joins vertices in the

same subset, but every vertex of V1 is adjacent to every vertex of V2. A graph G with 2n

vertices consisting of a disjoint union of n copies of the complete graph K2 is denoted M2n

and called a perfect matching. In [49] Simonyi showed that H(Km,m, u) = H(M2m, u) = 1.

Here we prove a more general result using Proposition 1.3.23. (Note that M2m is a spanning

subgraph of Km,m and that α(Km,m) = α(M2m) = m.)

Corollary 1.3.24. Let G be a spanning subgraph of Km,m satisfying α(G) = m. Then

H(G, u) = 1.

Proof. This fact follows immediately from Propositions 1.3.16 and 1.3.23 because Km,m is

vertex transitive and because α(G) = m = α(Km,m).

Page 50: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 28

Example 1.3.25. As another example of Proposition 1.3.23, consider the graph C7 on vertex

set 0, . . . , 6. Let G\e denote the graph G with one edge e removed. It is easy to verify

that, for example, α(C7\0, 3) = 2 and thus by Proposition 1.3.23 and Proposition 1.3.16

we have H(C7\0, 3, u) = H(C7, u) = log(7/2).

It is interesting to note that the same argument does not apply to every edge of C7. For

instance, let F = C7\0, 2. We see that 0, 1, 2 is stable in F and α(F ) = 3, meaning that

Proposition 1.3.23 does not yield the value of H(F, u).

In fact, in this case H(F, u) = log 7− (3 log 3+4)/7 < log(7/2), as the following argument

shows. The kernels of F are K1 = 0, 1, 2 and Ki = i, i + 1 for i = 2, . . . , 6. We let v(i)

denote the characteristic vector of Ki. By Lemma 1.3.12 there exist αi ≥ 0 for i = 1, . . . , 6

with∑6

i=1 αi = 1 such that H(F, u) = −(1/7)∑6

i=0 log vi with

v =6∑i=1

αiv(i) = (α1 + α6, α1, α1 + α2, α2 + α3, α3 + α4, α4 + α5, α5 + α6)t ∈ VP(F ).

Now let β1 = α1+(α2+α6)/2 and β3 = β5 = (α2+2α3+2α4+2α5+α6)/4. Set β2 = β4 = β6 =

0. We note that βi ≥ 0 and∑6

i=1 βi = 1 and so v′ =∑6

i=1 βiv(i) = (β1, β1, β1, β3, β3, β3, β3)t ∈

VP(F ). Now let r = (v0 +v1 +v2)/3 = α1 +(α2 +α6)/3 and s = (v3 +v4 +v5 +v6)/4. Observe

that β1 ≥ r and β3 = s. Letting w = (r, r, r, s, s, s, s)t we have v′i ≥ wi for all i = 0, . . . , 6. By

the concavity of the log function,

log r ≥ (log v0 + log v1 + log v2)/3 and log s ≥ (log v3 + log v4 + log v5 + log v6)/4.

We conclude that

−(1/7)

6∑i=0

log v′i ≤ −(1/7)

6∑i=0

logwi ≤ −(1/7)

6∑i=0

log vi = H(F, u).

Since v′ ∈ VP(F ), Lemma 1.2.3 shows that v = v′ and v is of the form v = (l, l, l,m,m,m,m)t.

This gives α2 = α4 = α6 = 0, and so α1 = l and α3 = α5 = m. Then 0 < l < 1

and m = (1 − l)/2, whence a short calculation shows that −∑6

i=0 log vi is minimised when

l = 3/7. Thus v = 17(3, 3, 3, 2, 2, 2, 2)t, and the value of H(F, u) is as given.

Thus deletion of an edge may leave the graph entropy unchanged, but in general it may be

decreased: indeed, removing all the edges of graph G results in the graph entropy vanishing.

We prove the following lemma, which gives an upper bound on the reduction in graph entropy

due to deletion of an edge.

Page 51: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.3 Graph entropy 29

Proposition 1.3.26. Let G be a graph with vertex set V (G) and non-empty edge set E(G).

For some x, y ∈ E(G), let graph G′xy have V (G′xy) = V (G) and E(G′xy) = E(G)\x, y.

For any probability distribution on V (G) it then holds that

H(G′xy, p) ≥ H(G, p)− (px + py) log(px + py) + px log px + py log py.

Proof. Let F be the graph defined by V (F ) = V (G) with the single edge x, y. Then

F ∪G′xy = G and Lemma 1.3.20 gives

H(G′xy, p) ≥ H(G, p)−H(F, p). (1.33)

The kernels of F are K1 = V (G)\y and K2 = V (G)\x. By Remark 1.3.13, a vector

v satisfying −∑

i pi log vi = H(F, p) can be chosen of the form v = αv(1) + (1 − α)v(2),

where v(i) is the characteristic vector of Ki and α ∈ [0, 1]. Then −∑

i∈V (G) pi log vi =

−px logα− py log(1− α), where we note vi = 1 when i /∈ x, y. Elementary calculus shows

this is minimised by α = pxpx+py

. Thus we have,

H(F, p) = (px + py) log(px + py)− px log px − py log py,

which, with (1.33), yields the required result.

Having considered the deletion of edges, we now supply a proof of a rather intuitive result

concerning the deletion of zero-probability vertices.

Lemma 1.3.27. From probabilistic graph (G, p) we form probabilistic graph (G′, p′) by delet-

ing any number of vertices i ∈ V (G) with pi = 0 and deleting all edges incident on each deleted

vertex. We define the probability distribution p′ on V (G′) by p′j = pj for all j ∈ V (G′). Then

H(G, p) = H(G′, p′).

Proof. A stable set in G′ is stable in G. Thus if v′ ∈ VP(G′), we can define v ∈ VP(G) by vi =

v′i for i ∈ V (G′) and vi = 0 for i ∈ V (G)\V (G′). Let v′ ∈ VP(G′) satisfy−∑

i∈V (G′) p′i log v′i =

H(G′, p′). Then, with the convention 0 log 0 = 0,

H(G, p) ≤ −∑

i∈V (G)

pi log vi = −∑

i∈V (G′)

p′i log v′i = H(G′, p′).

We now work towards the reverse inequality. Let Sk be stable in G with characteristic

vector v(Sk), and let a =∑

k αkv(Sk) ∈ VP(G) satisfy −

∑i∈V (G) pi log ai = H(G, p). Then

Page 52: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.4 Convex corners associated with a graph 30

Tk = Sk ∩ V (G′) is stable in G′ with characteristic vector u(Tk), given by u(Tk)i = v

(Sk)i for

i ∈ V (G′). Then b =∑

k αku(Tk) ∈ VP(G′) satisfies bi = ai for i ∈ VP(G′), and

H(G′, p′) ≤ −∑

i∈V (G′)

p′i log bi = −∑

i∈V (G)

pi log ai = H(G, p),

again setting 0 log 0 = 0.

1.4 Convex corners associated with a graph

The concept of graph entropy highlights the significance of the convex corner VP(G). In the

literature, the information theoretic significance of other convex corners associated with a

graph G has been discussed. Two of these convex corners, FVP(G) and TH(G), motivate

some of the work in later chapters, and it is now useful to recall their definitions and basic

properties. It will be an important theme of our work to observe how many important graph

parameters can be defined from associated convex corners. This section will be concluded by

applying Corollary 1.2.21 and Theorem 1.2.9 to FVP(G) and TH(G).

Definition 1.4.1. ([17], [22].) Let G be a graph. The fractional vertex packing polytope is

the convex corner given by

FVP(G) = VP(G)[.

Lemma 1.4.2. ([22].) If G is a graph on d vertices, then

Cd ⊆ FVP(G) ⊆ Bd and γ(FVP(G)) = χf(G).

Proof. Lemma 1.3.8 gives Cd ⊆ VP(G) ⊆ Bd and thus by (1.14) and Lemma 1.2.16,

Cd = B[d ⊆ FVP(G) ⊆ C[d = Bd.

Then note that γ(FVP(G)) = γ(VP(G)[) = χf(G) by Corollary 1.3.11.

The definition of TH(G), the theta corner of G, requires some background. Here we

follow [17]. Let graph G have d vertices. An orthonormal labelling (o.n.l.) of G is a family

(a(i))i∈V (G) of unit vectors in Rk for some k ∈ N which satisfy⟨a(i), a(j)

⟩= 0 when i 6' j

in G. We say((a(i))i∈V (G), c

)is a handled orthonormal labelling (h.o.n.l.) of G in Rk when

(a(i))i∈V (G) ⊆ Rk is an orthonormal labelling of G and c ∈ Rk is a unit vector. Let the set

Page 53: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.4 Convex corners associated with a graph 31

C(G) ⊆ Rd+ be given by

C(G) =

(∣∣∣⟨c, a(i)⟩∣∣∣2)

i∈V (G)

:(

(a(i))i∈V (G), c)

a h.o.n.l. of G in Rk, k ∈ N

. (1.34)

Definition 1.4.3. ([17, Theorem 3.2], [18, Equation (9.3.3)].) The theta corner of G is given

by

TH(G) = C(G)[.

As the anti-blocker of a non-empty set in Rd+, TH(G) is a convex corner. In [28], Lovasz

introduced θ(G), a new graph parameter now known as the Lovasz number of G. There are

a number of equivalent ways to define θ(G); the following is given in [17].

Definition 1.4.4. For graph G, we define θ(G), the Lovasz number of G, by

θ(G) = γ(TH(G)). (1.35)

In [17, Corollary 3.4] it is shown that

TH(G) = TH(G)[. (1.36)

The following ‘classical sandwich theorem’ is given in [17, Theorem 3.6] and is also proved in

[22].

Theorem 1.4.5. If G is a graph, then

VP(G) ⊆ TH(G) ⊆ FVP(G).

By Lemmas 1.4.2 and 1.3.8 it then follows for graph G on d vertices that

γ(Cd) ≤ γ(VP(G)) ≤ γ(TH(G)) ≤ γ(FVP(G)) ≤ γ(Bd).

Lemmas 1.4.2 and 1.3.9 and (1.35) then yield the following result, as given in [22]:

1 ≤ α(G) ≤ θ(G) ≤ χf(G) ≤ d. (1.37)

It is also immediate from Theorem 1.4.5 that for all p ∈ Pd

HVP(G)(p) ≥ HTH(G)(p) ≥ HFVP(G)(p). (1.38)

Page 54: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

1.4 Convex corners associated with a graph 32

Corollary 1.3.11 shows that maxp∈Pd HVP(G)(p) = logχf(G). We now give the analogous

results for FVP(G) and TH(G). Note that Corollary 1.4.7 is given in [29].

Corollary 1.4.6. For graph G it holds that maxp∈Pd HFVP(G)(p) = logα(G).

Proof. Corollary 1.2.21 gives

maxp∈Pd

HFVP(G)(p) = log γ(FVP(G)[) = log γ(VP(G)) = logα(G),

where we have also used Definition 1.4.1, Theorem 1.2.15 and Lemma 1.3.9.

Corollary 1.4.7. For graph G on d vertices maxp∈Pd HTH(G)(p) = log θ(G).

Proof. We again use Corollary 1.2.21 to obtain

maxp∈Pd

HTH(G)(p) = log γ(TH(G)[) = log γ(TH(G)) = log θ(G),

where the final two equalities follow from (1.35) and (1.36).

Indeed, from Corollary 1.2.21 it is clear that any graph parameter expressible in the

form γ(A) for some Rd-convex corner A is given by the maximum entropy of a probability

distribution p ∈ Pd over A[. Exploring links between graph parameters and entropy over

convex corners is a profitable line of enquiry: see for instance [53, Theorem 5.10].

Note that Corollary 1.3.14 may be written as HVP(G)(p) ≥ H(p)− logα(G), and we now

give the equivalent results for FVP(G) and TH(G).

Corollary 1.4.8. For any graph G it holds that

HFVP(G)(p) ≥ H(p)− logχf(G) and HTH(G)(p) ≥ H(p)− log θ(G).

Proof. This is immediate from Theorem 1.2.9, Lemma 1.4.2 and (1.35).

Page 55: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

Chapter 2

Convex corners in Md

This chapter introduces the concept of an Md-convex corner, that is a convex corner whose

elements are d × d matrices. New results, analogous to those in the previous chapter for

Rd-convex corners will be given, including the ‘second anti-blocker’ theorem for Md-convex

corners. We will conclude by showing how the concept of Md-convex corners leads to the

definition of new entropic quantities in the field of quantum information, which generalise

the well-known von Neumann entropy.

2.1 Preliminaries

We let e1, . . . , ed denote the canonical basis of Cd and Md be the set of d × d matrices

with entries in C. We set Dd = span(eie∗i : i ∈ [d]), so that Dd is the algebra of d ×

d diagonal matrices. We let Mhd ,M

+d ,M

++d denote the sets of d × d Hermitian, positive

semi-definite and positive definite matrices respectively, and we set D+d = Dd ∩M+

d . For

M = (mij), N = (nij) with M,N ∈ Md we will use the Hilbert–Schmidt inner product

〈M,N〉 =∑d

i,j=1mijnij = Tr(MN∗). The associated Hilbert–Schmidt norm will be denoted

‖M‖2 and is given by ‖M‖2 =√〈M,M〉. We write ‖M‖ for the operator norm, given by

‖M‖ = sup‖Mv‖ : v ∈ Cd, ‖v‖ = 1. In this section we give some important preliminaries;

some other helpful but standard linear algebraic results are given in Appendix B.

The following lemma gives a useful property of positive semi-definite matrices that will be

used later in the chapter. The result is well-known, but we include a proof for completeness.

Recall that for M ∈M+d there exists a unique matrix M1/2 ∈M+

d satisfying M = (M1/2)2.

Lemma 2.1.1. If M ∈M+d is given by M =

∑ijmijviv

∗j where vi : i ∈ [d] is an orthonor-

33

Page 56: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

2.1 Preliminaries 34

mal basis of Cd, then mii = ‖M1/2vi‖2 ≥ 0 and |mij | ≤√miimjj ≤ maxmii,mjj.

Proof. We havemij = 〈Mvj , vi〉 =⟨M1/2vj ,M

1/2vi⟩. It is immediate thatmii = ‖M1/2vi‖2 ≥

0. By the Cauchy–Schwarz inequality,

|mij | ≤ ‖M1/2vi‖‖M1/2vj‖ =√miimjj ,

as stated.

The following is an immediate but useful corollary concerning the zero elements of a

positive semi-definite matrix.

Corollary 2.1.2. [34, Lemma 15.17]. If M ∈M+d as above and mii = 0, then mij = mji = 0

for all j ∈ [d].

Lemma 2.1.3. The following are equivalent for a set A ⊆M+d :

(i) A is bounded, that is there exists k > 0 such that ‖M‖ ≤ k for all M ∈ A.

(ii) The set TrM : M ∈ A is bounded.

(iii) The set 〈u,Mu〉 : M ∈ A, u ∈ Cd with ‖u‖ = 1 is bounded.

Proof. (i) ⇒ (ii). Let M = (mij) ∈ M+d have eigenvalues λ1, . . . , λd. If ‖M‖ ≤ k, then

0 ≤ λi ≤ k for all i ∈ [d] and so 0 ≤ TrM =∑d

i=1 λi ≤ dk.

(ii) ⇒ (iii). Any unit vector u ∈ Cd is an element of some orthonormal basis V . If M ∈ A,

then M ≥ 0 and 〈v,Mv〉 ≥ 0 for all v ∈ Cd. Now TrM =∑

v∈V 〈v,Mv〉 and so if TrM ≤ k

for all M ∈ A, then 〈u,Mu〉 ≤ k for all M ∈ A, as required.

(iii) ⇒ (i). If 〈u,Mu〉 ≤ k for every unit vector u ∈ Cd, then ‖M‖ ≤ k, since ‖M‖ is the

largest eigenvalue of M .

Lemma 2.1.4. Let A1, . . . , An ∈ M+d and M =

∑ni=1Ai. If 〈v,Mv〉 = 0 for some v ∈ Cd,

then Aiv = 0 for all i = 1, . . . , n.

Proof. We have 〈v,Mv〉 =∑n

i=1 〈v,Aiv〉 =∑n

i=1 ‖A1/2i v‖2. If 〈v,Mv〉 = 0, then for all

i ∈ [n] we have ‖A1/2i v‖ = 0, and hence A

1/2i v = 0, giving Aiv = 0.

Remark 2.1.5. Lemma 2.1.4 shows that we can conclude that the sum∑n

i=1Ai of finitely

many positive semi-definite matrices is strictly positive when there exists no non-zero v ∈ Cd

satisfying Aiv = 0 for all i ∈ [n].

Page 57: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

2.2 Convex corners and anti-blockers in Md 35

2.2 Convex corners and anti-blockers in Md

This section begins by defining the new concept of a convex corner in Md. We work by analogy

with convex corners in Rd, from where we are led to a natural definition of the anti-blocker

of a subset of M+d . Some basic properties of these convex corners and their anti-blockers will

be given, and a number of simple examples of convex corners in Md will be used to illustrate

the theory.

2.2.1 Definitions and basic properties

Recall that an Rd-convex corner is closed, convex, non-empty, non-negative and hereditary.

It is now natural to define an Md-convex corner as a subset A ⊆ Md which has these same

five properties. In this context, the terms closed, convex and non-empty are taken with their

standard meanings. A matrix M ∈ Md will be called non-negative if it is positive semi-

definite, and we then write M ≥ 0. By M ≥ N we mean M − N ≥ 0. If M ≥ 0 for all

M ∈ A, we say A is non-negative. If N ∈ A for all N satisfying 0 ≤ N ≤ M for some

M ∈ A, we say A is hereditary.

Definition 2.2.1. Set A ⊆Md is an Md-convex corner (or just convex corner where context

allows) if A ⊆Md is non-empty, closed, convex, non-negative and hereditary.

Our analysis of Md-convex corners will proceed analogously to that of Rd-convex corners

in the previous chapter. It is as important as it is trivial to note that an Rd-convex corner is

not an example of an Md-convex corner. Thus the results in Chapter 1, though analogous, are

not merely special cases of the results we now give in the Md case. The following definition and

lemma concern sets we will call diagonal convex corners, and will help clarify the connection

between Rd-convex corners and Md-convex corners. For v1, . . . , vd ∈ R+, we let φ denote the

canonical bijection Rd+ → D+d given by

φ

(d∑i=1

viei

)=

d∑i=1

vieie∗i . (2.1)

Definition 2.2.2. The set B ⊆ Dd is a diagonal Md-convex corner, or just diagonal convex

corner, (resp. standard diagonal Md-convex corner) precisely when B = φ(A) = φ(v) : v ∈

A for an Rd-convex corner (resp. standard Rd-convex corner) A.

Lemma 2.2.3. If C is an Md-convex corner, then φ−1(Dd ∩ C) is an Rd-convex corner, and

Dd ∩ C is a diagonal convex corner.

Page 58: DOCTOR OF PHILOSOPHY Information theoretic parameters for ... · Information theoretic parameters for graphs and operator systems. Author: Gareth Boreland Supervisor: Prof. Ivan Todorov

2.2 Convex corners and anti-blockers in Md 36

Proof. Note that φ−1(Dd ∩ C) is non-empty because the zero matrix is in Dd ∩ C by the

hereditarity of C. When A ≥ 0 it holds that φ−1(A) ≥ 0, thus φ−1(Dd ∩ C) inherits non-

negativity from Dd ∩ C. Similarly φ−1(Dd ∩ C) is both convex and closed because Dd ∩ C

is. Finally, if a ∈ φ−1(Dd ∩ C) and b ∈ Rd+ satisfies b ≤ a, then we have φ(a) ∈ Dd ∩ C

and also φ(a) ≥ φ(b) ∈ Dd. So φ(b) ∈ Dd ∩ C by the hereditarity of C, and it follows that

b ∈ φ−1(Dd ∩ C), establishing the hereditarity of φ−1(Dd ∩ C).

Remark 2.2.4. It is trivial that a diagonal convex corner B is non-empty, closed, convex and

non-negative because B = φ(A) for some A ⊆ Rd with these properties. Furthermore B is

hereditary over Dd in the sense that if 0 ≤ A ≤ B where B ∈ B and A ∈ Dd, then A ∈ B.

In the previous chapter the construction of an anti-blocker for a subset of Rn+ was dis-

cussed. We now define the anti-blocker of a subset of M+d .

Definition 2.2.5. For A ⊆M+d , we define its anti-blocker, A], by

A] = N ∈M+d : 〈N,M〉 ≤ 1 for all M ∈ A.

Definition 2.2.6. If B ⊆ D+d , we define its diagonal anti-blocker by B[ = Dd ∩ B].

Lemma 2.2.7. Let B = φ(A) where A ⊆ Rd+. It holds that B[ = φ(A[). If A is an Rd-convex

corner, then B[[ = B.

Proof. For the first assertion use the obvious identity 〈φ(u), φ(u)〉 = 〈u, v〉 for u, v ∈ Rd+. We

then have B[[ = φ(A[[), and Theorem 1.2.15 completes the proof.

Definition 2.2.8. If A is an Rd-convex corner and B = φ(A), recalling Definition 1.2.7,

(1.12) on page 9, and Definition 1.2.18, we define the following.

(i) γ(B) = γ(A). (Note then that γ(B) = max{TrT : T ∈ B}.)
(ii) N(B) = N(A). (Note then that N(B) = max{β ∈ R+ : βI ∈ B}.)

(iii) M(B) = M(A).

Lemma 2.2.7 and Proposition 1.2.20 immediately give the following proposition.

Proposition 2.2.9. If B is a diagonal convex corner, then

M(B) = 1/N(B) = γ(B[).


In view of the preceding results and definitions, it is often convenient simply to regard

the Rd-convex corner A and the diagonal Md-convex corner φ(A) as the same object. Note

that elements of a diagonal convex corner will commute, but that in general elements of an

Md-convex corner will not. For this reason we refer to the theory of Md-convex corners as

the non-commutative case, and that of Rd-convex corners as the commutative case.

In the remainder of this section, some basic properties of convex corners and their anti-

blockers will be discussed. Note that, as in the Rd case, we write (A])] = A]] to denote the

second anti-blocker of A.

Lemma 2.2.10. The following results hold:

(i) If A ⊆M+d is non-empty, then A] is a convex corner;

(ii) If A,B ⊆M+d and A ⊆ B then B] ⊆ A];

(iii) If A ⊆M+d then A ⊆ A]].

Proof. These are straightforward from the definitions. In (i), we can establish the hereditarity

of A] as follows. If B ≥ 0 satisfies B ∈ A], then Tr(BA) ≤ 1 for all A ∈ A. Consider C

such that 0 ≤ C ≤ B. By Lemma B.0.2 (v) it follows that Tr(CA) ≤ Tr(BA) ≤ 1 and so

C ∈ A]. For (iii) just note that if M ∈ A ⊆ M+d , then 〈M,N〉 ≤ 1 for all N ∈ A], and so

M ∈ A]].

Remark 2.2.11. By Lemma 2.2.10 (i) and Lemma 2.2.3, we see that for any B ⊆ D+d , the set

B[ is a diagonal Md-convex corner.

A set A ⊆ M+d satisfying A]] = A will be called reflexive. In the Rd case, Theorem

1.2.15 shows that A[[ = A for every Rd-convex corner. A main objective in this chapter is

to prove an analogous ‘second anti-blocker’ theorem for Md-convex corners. A number of

related issues and examples will be discussed in the remainder of this section, before this

problem is tackled in the next.

For δ > 0, consider the δ-ball with centre M ∈Md given by

B(M, δ) = {N ∈ Md : ‖N − M‖2 < δ}.

Given a set A ⊆ M+d, we define its interior relative to M+d to be the set
{M ∈ A : B(M, δ) ∩ M+d ⊆ A for some δ > 0}.


(The reason for examining the interior relative to M+d rather than the interior is that every

δ-ball will include elements of Md not in M+d , for instance elements M = (mij) ∈ Md that

fail to be Hermitian because mij is not the complex conjugate of mji for some i, j.)

Lemma 2.2.12. Let A ⊆M+d be a convex corner. The following are equivalent:

(i) A has a non-empty interior relative to M+d ;

(ii) there exists r > 0 such that rI ∈ A;

(iii) for every non-zero vector v ∈ Cd there exists s > 0 such that svv∗ ∈ A;

(iv) A contains a positive definite element.

Proof. (i)⇒(ii). Suppose that A has a non-empty interior relative to M+d. Let A ∈ A and δ > 0 be such that B(A, δ) ∩ M+d ⊆ A. Noting that ‖Id‖2 = √d, we have A + (δ/(2√d))I ∈ B(A, δ) and, since A + (δ/(2√d))I ≥ 0, it lies in A. Finally, since (δ/(2√d))I ≤ A + (δ/(2√d))I, we have (δ/(2√d))I ∈ A by the hereditarity of A.
(ii)⇒(iii). Suppose that r > 0 is such that rI ∈ A and let v ∈ Cd be a non-zero vector. Since (r/‖v‖^2)vv^* ≤ rI, the hereditarity of A implies that (r/‖v‖^2)vv^* ∈ A.
(iii)⇒(iv). Let {vi}_{i=1}^d be an orthonormal basis of Cd and, for each i, let si > 0 be such that si vi vi^* ∈ A. Since A is convex, A = ∑_{i=1}^d (si/d) vi vi^* ∈ A. Letting s = min{si/d : i ∈ [d]} > 0, we have sI ≤ A and so, by the hereditarity of A, it follows that sI ∈ A.

(iv)⇒(i). Let A ∈ A be strictly positive. Letting r be the minimum eigenvalue of A, we

have that 0 < rI ≤ A. For all M ∈ B(0, r) ∩ M+d , we have ‖M‖ ≤ ‖M‖2 ≤ r and so

0 ≤M ≤ rI ≤ A, giving that B(0, r) ∩M+d ⊆ A by hereditarity.

Definition 2.2.13. If a convex corner A is bounded and has non-empty interior relative to M+d, then we say A is a standard convex corner.

As an example, note that if P is a projection with rank P < d, then {M ∈ M+d : M ≤ P} is a convex corner, but not a standard convex corner, because it has empty interior relative to M+d by Lemma 2.2.12.

Lemma 2.2.14. If C is a standard Md-convex corner, then φ−1(Dd ∩ C) is a standard Rd-

convex corner, and Dd ∩ C is a standard diagonal convex corner.


Proof. We apply Lemma 2.2.3. To complete the proof, note that if C is a standard Md-convex

corner, then by Lemma 2.2.12, rI ∈ C for some r > 0, and so r1 ∈ φ−1(Dd ∩ C). Lemma

1.2.2 then gives that φ−1(Dd ∩ C) has non-empty interior. Clearly φ−1(Dd ∩ C) is bounded if

C is bounded.

Before the next proposition we need the following well-known lemma; the proof is short

and is included for completeness.

Lemma 2.2.15. If X is a separable metric space and Y ⊂ X , then Y has a countable, dense

subset.

Proof. Let B = {x1, x2, . . .} be a countable, dense subset of X. For all n ∈ N and r ∈ N such that Y ∩ B(xn, 1/r) is non-empty, choose y(n,r) ∈ Y ∩ B(xn, 1/r). We claim that
F = {y(n,r) : n, r ∈ N, Y ∩ B(xn, 1/r) ≠ ∅}
is a countable, dense subset of Y. To see this, take y ∈ Y. Then for all s ∈ N there exists xi ∈ B such that xi ∈ B(y, 1/s). Note then that y ∈ B(xi, 1/s), and so Y ∩ B(xi, 1/s) is non-empty. We then have y(i,s) ∈ B(xi, 1/s), and so y(i,s) ∈ B(y, 2/s).

Proposition 2.2.16. A convex corner A ⊆ M+d has non-empty interior relative to M+d if and only if A] is bounded.

Proof. If A has non-empty interior relative to M+d , then by Lemma 2.2.12, rI ∈ A for some

r > 0. Then 〈M, rI〉 = rTrM ≤ 1 for all M ∈ A]. Then TrM ≤ 1/r for all M ∈ A], and by

Lemma 2.1.3, A] is bounded.

To complete the proof we show that A] is unbounded when A has empty interior relative to M+d. By Lemma 2.2.12, it is equivalent to show that A] is unbounded when it is assumed that A contains no strictly positive element. Since Md is separable and A ⊆ Md, by Lemma 2.2.15 there is a countable set E = {A1, A2, . . .} ⊆ A which is dense in A. Let Em = {A1, . . . , Am} and write
Km = {v ∈ Cd : ‖v‖ = 1, Av = 0 for all A ∈ Em}.
It is clear that each Km is compact and that Km+1 ⊆ Km, m ∈ N. Now consider Bm = (1/m)∑_{i=1}^m Ai. By convexity, Bm ∈ A and so, by assumption, is not strictly positive. Thus there exists v ∈ Cd such that Bmv = 0, and by Lemma 2.1.4, we have Aiv = 0 for all i = 1, . . . ,m. Therefore Km is non-empty for all m ∈ N. Then by Cantor's intersection theorem [43, Corollary 2.36], ⋂_{m=1}^∞ Km is non-empty. Then there exists v ∈ Cd of unit norm such that Av = 0 for all A ∈ E. For any M ∈ A, there exists a sequence (Mi)_{i∈N} where Mi ∈ E and Mi → M as i → ∞. This gives that Mv = 0 for all M ∈ A. Then Tr(Mvv^*) = 0 for all M ∈ A, and so λvv^* ∈ A] for all λ ≥ 0, meaning A] is unbounded.

Proposition 2.2.17. A convex corner A ⊆M+d is bounded if and only if A] has non-empty

interior relative to M+d .

Proof. If A] has non-empty interior relative to M+d , then, by the previous proposition, A]] is

bounded. But by Lemma 2.2.10, A ⊆ A]], and so A is bounded. Conversely, by Lemma 2.1.3

note that if A is bounded there exists k > 0 such that TrM = 〈I,M〉 ≤ k for all M ∈ A.

Thus 〈(1/k)I,M〉 ≤ 1 for all M ∈ A. Then (1/k)I ∈ A], and A] has non-empty interior

relative to M+d by Lemma 2.2.12.

The following is immediate.

Corollary 2.2.18. Suppose A is an Md-convex corner. Then A is a standard convex corner

if and only if A] is a standard convex corner.

Reflexivity is an important property, but the fact that all Md-convex corners are reflexive will not be shown until Section 2.3. However, we can now give the following results.

Lemma 2.2.19. If C ⊆ M+d satisfies C = D] for some D ⊆ M+d, then C is a reflexive convex corner.

Proof. Lemma 2.2.10 gives that C is a convex corner and that C ⊆ C]]. Since C = D], it

holds that C]] = D]]]. However, D ⊆ D]] and so, again by Lemma 2.2.10, D]]] ⊆ D]. This is

equivalent to C]] ⊆ C and completes the proof that C]] = C.

Lemma 2.2.20. An arbitrary intersection of Md-convex corners is an Md-convex corner.

Proof. Let {Aα : α ∈ A} be a collection of Md-convex corners for some set A of arbitrary cardinality. The non-negativity of ⋂_{α∈A} Aα is obvious and non-emptiness follows because 0 ∈ Aα for all α ∈ A. Finally note that an arbitrary intersection of closed, convex and hereditary sets will retain these properties.

Remark 2.2.21. In contrast, note that even a union of two Md-convex corners is not necessarily an Md-convex corner.


Lemma 2.2.22. If Bα ⊆ M+d for each α ∈ A, where A has arbitrary cardinality, then
(⋃_{α∈A} Bα)] = ⋂_{α∈A} B]α.

Proof. For each β ∈ A we have Bβ ⊆ ⋃_{α∈A} Bα, and Lemma 2.2.10 gives that (⋃_{α∈A} Bα)] ⊆ B]β. Hence (⋃_{α∈A} Bα)] ⊆ ⋂_{α∈A} B]α.
Let M ∈ ⋂_{α∈A} B]α. Then Tr(MN) ≤ 1 for all N ∈ ⋃_{α∈A} Bα, giving that M ∈ (⋃_{α∈A} Bα)] and completing the proof.

Remark 2.2.23. In general it can happen that
(⋂_{α∈A} Bα)] ≠ ⋃_{α∈A} B]α,    (2.2)

even in the case each Bα is a convex corner. We will return to this in Proposition 2.3.16, but

for now note that in (2.2) the right hand side may not be a convex corner, but the left hand

side always is.

Lemma 2.2.24. An arbitrary intersection of reflexive Md-convex corners is a reflexive Md-

convex corner.

Proof. For each α ∈ A let Aα ⊆ M+d be a reflexive convex corner, and consider A = ⋂_{α∈A} Aα. Lemma 2.2.20 shows that A is a convex corner. By the reflexivity of Aα,
A = ⋂_{α∈A} A]]α = ⋂_{α∈A} T ]α,
where Tα = A]α. By Lemma 2.2.22,
A = (⋃_{α∈A} Tα)],
whence Lemma 2.2.19 gives A]] = A, completing the proof.

Given a subset G ⊆M+d , which may or may not be a convex corner, the next results show

that there is, in a sense we will explain, a ‘smallest’ convex corner C(G) which contains G.

Definition 2.2.25. We define her(B), the hereditary cover of B ⊆ M+d, by letting
her(B) = {M ∈ M+d : ∃N ∈ B such that M ≤ N}.


Let G ⊆ M+d be non-empty and bounded. (In many of the cases we will examine in the

next chapter, G will be a family of projections.) Let conv(G) be the closure of the convex

hull of G, which is clearly both bounded and closed, and so compact. The next lemma shows

that the set her(conv(G)), which trivially contains G, is an Md-convex corner.

Lemma 2.2.26. If G ⊆M+d is non-empty and bounded, then

A = her(conv(G))

is a bounded Md-convex corner.

Proof. The hereditarity, non-emptiness and non-negativity of A are obvious. To show the

convexity of A, take arbitrary A,B ∈ A and choose C,D ∈ conv(G) with A ≤ C and

B ≤ D. Let sequences (Cn)n∈N ⊆ conv(G) and (Dn)n∈N ⊆ conv(G) satisfy Cn →n→∞ C and

Dn →n→∞ D. For all n ∈ N and λ ∈ [0, 1], by the convexity of a convex hull, it holds that

λCn + (1− λ)Dn ∈ conv(G). Then

λCn + (1− λ)Dn →n→∞ λC + (1− λ)D ∈ conv(G).

Noting that λA+(1−λ)B ≤ λC+(1−λ)D gives λA+(1−λ)B ∈ A by the hereditarity of A,

thus demonstrating the convexity of A. To show that A is closed, suppose that (Tn)n∈N ⊆ A

and Tn →n→∞ T . Let Cn ∈ conv(G) be such that Tn ≤ Cn for all n ∈ N. Since conv(G) is

compact, (Cn)n∈N has a convergent subsequence, converging to a limit, say C, in conv(G).

Then T ≤ C and so T ∈ A, giving that A is closed. The boundedness of A follows from

that of G: indeed, there exists k > 0 such that G ≤ kI for all G ∈ G, and so A ≤ kI for all

A ∈ A.

Corollary 2.2.27. Suppose G ⊆ M+d is bounded and conv(G) contains a strictly positive

element. Then A = her(conv(G)) is a standard convex corner.

Proof. By Lemma 2.2.26, A is a bounded convex corner, and then since it contains a strictly

positive element, it has non-empty interior relative to M+d by Lemma 2.2.12.

Corollary 2.2.28. If C is a bounded (resp. standard) diagonal Md-convex corner, then her(C)

is a bounded (resp. standard) Md-convex corner.

Proof. Since conv(C) = C, this follows immediately from Lemma 2.2.26 and Corollary 2.2.27.


Lemma 2.2.29. If B is a convex corner and G ⊆ B, then

her(conv(G)) ⊆ B.

Proof. If G ⊆ B, then the convex hull of G is contained in the convex hull of B, which equals B since B is convex. Taking closures, conv(G) ⊆ B, since B is closed. Finally, her(conv(G)) ⊆ her(B) = B, by the hereditarity of B.

For any G ⊆ M+d , Lemma 2.2.26 guarantees the existence of a convex corner which

contains G. We make the following definition.

Definition 2.2.30. For G ⊆M+d , we define C(G) to be the intersection of all convex corners

containing G.

Theorem 2.2.31. For G ⊆M+d we have that C(G) is the smallest convex corner containing

G and C(G) = her(conv(G)).

Proof. It is clear from Lemma 2.2.20 that C(G) is the smallest convex corner to contain G.

The second assertion follows from Lemmas 2.2.26 and 2.2.29.

We call C(G) the convex corner generated by G.

Lemma 2.2.32. For A ⊆M+d it holds that A] = (her(A))] = (C(A))].

Proof. Since A ⊆ her(A) ⊆ C(A), Lemma 2.2.10 gives

A] ⊇ (her(A))] ⊇ (C(A))].

The proof will be completed by showing that A] ⊆ (C(A))]. First let Q = ∑_{i=1}^n λiAi be a convex combination with Ai ∈ A and λ1, . . . , λn ∈ R+ satisfying ∑_{i=1}^n λi = 1. For M ∈ A] we have Tr(MQ) ≤ ∑_{i=1}^n λi = 1. For any P ∈ conv(A) we can choose a sequence (Qj)_{j∈N} of such convex combinations with Qj → P as j → ∞. By the continuity of the function M+d → R mapping A to Tr(MA), it then holds that Tr(MP) = lim_{j→∞} Tr(MQj) ≤ 1. Finally, since Tr(MN) ≤ Tr(MP) when 0 ≤ N ≤ P, it follows that Tr(MN) ≤ 1 for all N ∈ her(conv(A)) = C(A), and we conclude that M ∈ (C(A))] as required.

2.2.2 Examples of convex corners in Md

In this section we use some simple but instructive examples of Md-convex corners to illustrate

some of the results in the last section.


For C ∈ Mhd and λ ∈ R, let
AC,λ = {M ∈ M+d : Tr(MC) ≤ λ}  and  AC = AC,1.    (2.3)
Further, let
NC = {M ∈ M+d : Tr(MC) = 0},    (2.4)
and
BC = {M ∈ M+d : M ≤ C}.    (2.5)

For λ > 0 it is clear that AC,λ = A(1/λ)C . For C ≥ 0 we have NC = AC,0.
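The membership conditions defining AC, NC and BC are easy to test numerically. The following is a minimal numpy sketch with a hypothetical 2 × 2 diagonal choice of C and three hypothetical test matrices, checking one example element of each set.

import numpy as np

C = np.diag([2.0, 0.0])            # a fixed positive semidefinite C (hypothetical example)
M1 = np.diag([0.5, 3.0])           # Tr(M1 C) = 1, so M1 lies in A_C
M2 = np.diag([0.0, 7.0])           # Tr(M2 C) = 0, so M2 lies in N_C
M3 = np.diag([1.0, 0.0])           # 0 <= M3 <= C, so M3 lies in B_C

def is_psd(X, tol=1e-10):
    # positive semidefiniteness via eigenvalues of the Hermitian part
    return np.all(np.linalg.eigvalsh((X + X.conj().T) / 2) >= -tol)

print(is_psd(M1) and np.trace(M1 @ C).real <= 1 + 1e-10)    # True: M1 in A_C
print(is_psd(M2) and abs(np.trace(M2 @ C).real) <= 1e-10)   # True: M2 in N_C
print(is_psd(M3) and is_psd(C - M3))                         # True: M3 in B_C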

Lemma 2.2.33. If C ∈M+d , then BC is a reflexive convex corner and B]C = AC .

Proof. The zero matrix is in BC , showing it is non-empty. Non-negativity is by definition.

That it is closed, convex and hereditary follows from basic properties in Definition B.0.1 and

Lemma B.0.2, and thus BC is a convex corner.

We then claim that
B]C = {N ∈ M+d : Tr(NC) ≤ 1} = AC.    (2.6)

To see this, note that if 0 ≤ M ≤ C and N ≥ 0, then 0 ≤ Tr(MN) ≤ Tr(CN) by Lemma

B.0.2 (v). So if N ∈ AC we have Tr(CN) ≤ 1 with N ≥ 0, and then Tr(MN) ≤ 1 for all

M ∈ BC , and we conclude N ∈ B]C . Conversely, if N ∈ B]C , then N ≥ 0 by definition, and

Tr(CN) ≤ 1, because C ∈ BC , giving that N ∈ AC . This proves (2.6).

By Lemma 2.2.10, in order to prove the reflexivity of BC , it suffices to show that B]]C ⊆ BC .

We will prove this by showing that if Q ≥ 0 and Q /∈ BC , then Q /∈ B]]C . To see this, note

that if 0 ≤ Q ∉ BC, then C − Q ∉ M+d. However, C − Q is self-adjoint because C, Q ∈ M+d. Thus we can write
C − Q = ∑_{i=1}^d λi vi vi^*,
where {v1, . . . , vd} is an orthonormal basis for Cd, where λ1, . . . , λd ∈ R, and where λj < 0 for some j ∈ [d].

Now let D = αvjvj^* with α > 0 to be fixed shortly. We have D ≥ 0 and Tr(CD) = αTr(Cvjvj^*) = α〈Cvj, vj〉 ≥ 0 because C ≥ 0. We have that
Tr((C − Q)D) = Tr((∑_{i=1}^d λi vi vi^*) αvjvj^*) = λjα.    (2.7)


Our aim is to show that Q /∈ B]]C , and to this end we identify two cases.

(i) In the case that 〈Cvj , vj〉 = 0, let α = −2/λj > 0. We have Tr(CD) = 0, and so

D ∈ AC = B]C . But using (2.7), Tr(QD) = Tr(CD)− λjα = 2 > 1 and we conclude Q /∈ B]]C .

(ii) Since C ≥ 0, the remaining case is when 〈Cvj , vj〉 > 0. Here we let α = (〈Cvj , vj〉)−1 > 0,

giving Tr(CD) = 1, D ≥ 0 and D ∈ AC = B]C . Then Tr(QD) = Tr(CD)− λjα > 1, and we

again conclude Q /∈ B]]C , and the proof is complete.
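The separating element D constructed in the proof can be computed explicitly in small dimensions. The following numpy sketch follows the two cases above for hypothetical 2 × 2 data Q ≰ C.

import numpy as np

# Hypothetical data: C >= 0 and Q >= 0 with Q not below C.
C = np.diag([1.0, 2.0])
Q = np.diag([3.0, 0.5])

# Spectral decomposition of C - Q; pick an eigenvector with negative eigenvalue.
lam, V = np.linalg.eigh(C - Q)
j = int(np.argmin(lam))                        # lam[j] < 0 since Q is not below C
v = V[:, j]

cvv = float(v.conj() @ C @ v)                  # <Cv_j, v_j> >= 0
alpha = -2.0 / lam[j] if abs(cvv) < 1e-12 else 1.0 / cvv   # the two cases of the proof
D = alpha * np.outer(v, v.conj())              # D >= 0 with Tr(CD) <= 1, so D in A_C = B_C^]

print(np.trace(C @ D).real <= 1 + 1e-12)       # True: D lies in A_C
print(np.trace(Q @ D).real > 1)                # True: D witnesses that Q is not in B_C^]]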

Lemmas 2.2.19 and 2.2.33 have the following corollary.

Corollary 2.2.34. Let C ∈M+d and λ > 0. Then AC,λ is a reflexive Md-convex corner, and

A]C,λ = B(1/λ)C .

Proof. By Lemma 2.2.33,

AC,λ = A(1/λ)C = B](1/λ)C . (2.8)

That AC,λ is a reflexive convex corner now follows from Lemma 2.2.19. Since Lemma 2.2.33

shows that B(1/λ)C is reflexive, anti-blocking (2.8) gives that A]C,λ = B(1/λ)C .

It is instructive to consider the set AC for different C ∈Mhd . Putting λ = 1 in Corollary

2.2.34 gives that AC is a reflexive convex corner for all C ∈ M+d . Furthermore, we have the

following proposition.

Proposition 2.2.35. (i) If C ∈ M++d, then AC and A]C are standard convex corners;
(ii) If C ∈ M+d \ M++d, then AC and A]C are convex corners, but neither is a standard convex corner;
(iii) If C ∈ Mhd but ±C ∉ M+d, then AC is not a convex corner, AC is not reflexive and A]C = {0};
(iv) If −C ∈ M+d, then AC = M+d, which is reflexive, and A]C = {0}.

Proof. Write C = ∑_{i=1}^d µi vi vi^* for some orthonormal basis {v1, . . . , vd} ⊆ Cd of eigenvectors of C with µ1, . . . , µd ∈ R.

(i) For M ∈ M+d we let M = ∑_{i,j=1}^d mij vi vj^*, giving Tr(MC) = ∑_{i=1}^d µi mii, where each mii ≥ 0 by Lemma 2.1.1. In this case each µi > 0, and so if M ∈ AC we have 0 ≤ mii ≤ 1/µi for each i ∈ [d], and AC is a bounded convex corner by Lemma 2.1.3. By Corollary 2.2.34, A]C = BC, and so for all M ∈ A]C we have that 0 ≤ M ≤ C, giving that A]C is bounded. Then, by Propositions 2.2.16 and 2.2.17, AC and A]C have non-empty interiors relative to M+d and we see that both AC and A]C are standard convex corners.


(ii) If C ∈ M+d \ M++d, then µi ≥ 0 for all i and µj = 0 for some j. Then for all α ≥ 0 we have Tr(αvjvj^*C) = αµj = 0 and αvjvj^* ∈ AC. Thus AC is an unbounded convex corner. By Corollary 2.2.34, A]C = BC, and so A]C is bounded. Propositions 2.2.16 and 2.2.17 then give that AC has non-empty interior relative to M+d and that A]C has empty interior relative to M+d. The claim follows.

(iii) Here we can write C = C^* = ∑_{i=1}^d µi vi vi^* with µj < 0 for some j and µk > 0 for some k. In this case AC lacks hereditarity, and so is not a convex corner. To see this, take M = −(2/µj)vjvj^* + (2/µk)vkvk^* ≥ 0. We have Tr(MC) = 0 and so M ∈ AC. However, consider 0 ≤ N = (2/µk)vkvk^* ≤ M. We have Tr(NC) = 2 and N ∉ AC. Note that αvjvj^* ∈ AC for all α ≥ 0, and so AC is unbounded. The following argument shows that her(AC) = M+d. Take M ∈ M+d and let t = 〈M,C〉. If t ≤ 1, then M ∈ AC ⊆ her(AC). If t > 1, then M ≤ M′, where M′ = M − (t/µj)vjvj^* satisfies 〈M′, C〉 = 0. Thus M′ ∈ AC and M ∈ her(AC). Lemma 2.2.32 then yields A]C = (M+d)] = {0} and AC is not reflexive; indeed, A]]C = M+d ≠ AC.

(iv) In this case each µi ≤ 0, and Tr(MC) ≤ 0 for all M ∈ M+d by Lemma B.0.2 (iv), giving that AC = M+d and A]C = {0}. Here, rather trivially, A]]C = M+d = AC.

We complete a similar analysis for BC where C ∈Mhd .

Proposition 2.2.36. (i) If C ∈ M++d, then BC is reflexive and is a standard convex corner;
(ii) If C ∈ M+d \ M++d, then BC is reflexive and is a convex corner with empty interior relative to M+d;
(iii) If C ∈ Mhd \ M+d, then BC = ∅.

Proof. For C ≥ 0, Lemma 2.2.33 gives that BC is a reflexive convex corner satisfying B]C =

AC . For all M ∈ BC , we have 0 ≤ TrM ≤ TrC and by Lemma 2.1.3, the set BC is bounded.

It is now straightforward to complete the proof of the three statements above.

(i) If C ∈ M++d we have 0 < C ∈ BC and by Lemma 2.2.12, BC has non-empty interior

relative to M+d and is a standard convex corner.

(ii) Here there exists u ∈ Cd satisfying 〈u,Cu〉 = 0. Since 〈u, (C −M)u〉 ≥ 0 when

0 ≤ M ≤ C, we have that 〈u,Mu〉 = 0 and thus M /∈ M++d for all M ∈ BC . It is then clear

by Lemma 2.2.12 that BC has empty interior relative to M+d .

(iii) Suppose that M ∈ BC for some M ∈ M+d . Then 0 ≤ M ≤ C, requiring that 0 ≤ C.

From this contradiction we conclude BC = ∅.


Recall for C ∈ Mhd that NC = {A ∈ M+d : Tr(AC) = 0}.

Lemma 2.2.37. If C ∈M+d , then NC is a reflexive convex corner.

Proof. It is clear that NC satisfies the required conditions to be a convex corner. Write C = ∑_{i=1}^d λi vi vi^*, where {vi}_{i=1}^d is an orthonormal basis of Cd and 0 ≤ λ1 ≤ . . . ≤ λd. We let P be the projection onto ran(C) and P⊥ = I − P. Let 0s ∈ Ms denote the zero s × s matrix and set rank(P) = r, so that λ1 = . . . = λd−r = 0 and λi > 0 for i = d − r + 1, . . . , d. For M ∈ M+d, write M = ∑_{i,j=1}^d αij vi vj^* with αij ∈ C. Then Tr(MC) = ∑_{i=1}^d λi αii. By Lemma 2.1.1, αii ≥ 0 for all i. Now, suppose that M ∈ NC. For i = 1, . . . , d − r we see that αii can be arbitrarily large. However, we must have αii = 0 for i = d − r + 1, . . . , d, and then by Corollary 2.1.2, αij = 0 whenever i = d − r + 1, . . . , d or j = d − r + 1, . . . , d. Working in the basis {vi vj^* : i, j ∈ [d]} we then have
NC = {M ⊕ 0r : M ∈ M+d−r}.    (2.9)
Now suppose N = ∑_{i,j=1}^d βij vi vj^* ∈ N ]C. We have 〈N,M〉 ≤ 1 for all M ∈ NC, and thus βii = 0 whenever i = 1, . . . , d − r. Corollary 2.1.2 then gives that βij = 0 whenever i = 1, . . . , d − r or j = 1, . . . , d − r. If i = d − r + 1, . . . , d, then βii can be arbitrarily large. It follows that N ]C is given by
N ]C = {0d−r ⊕ N : N ∈ M+r},    (2.10)
whence a repetition of the same argument gives that N ]]C = NC.

We give some further properties of the set NC .

Proposition 2.2.38. (i) If C ∈ M+d, then neither of the convex corners NC and N ]C is a standard convex corner;
(ii) If C ∈ M++d, then NC = {0};
(iii) If C ∈ Mhd and ±C ∉ M+d, then NC is not a convex corner and N ]C = {0}.

Proof. Let C = ∑_{i=1}^d λi vi vi^*, where {vi}_{i=1}^d is an orthonormal basis of Cd and each λi ∈ R. Again denote the projection onto ran(C) by P, and let r = rank(P).

(i) Note from (2.9) and (2.10) that if r ≥ 1, then NC has empty interior relative to M+d and N ]C is unbounded. Similarly, if d − r ≥ 1, then N ]C has empty interior relative to M+d and NC is unbounded.


(ii) If C > 0, then r = d and (2.9) gives NC = {0}.

(iii) In this case NC is not a convex corner, and N ]C = {0}. To see this, note that λj > 0 for some j ∈ [d] and λk < 0 for some k ∈ [d]. Let M = λj vk vk^* − λk vj vj^*. Then M ≥ 0 and satisfies Tr(MC) = 0, giving that αM ∈ NC for all α ≥ 0. It is immediate that NC is unbounded. Since M ≥ λj vk vk^* ∉ NC, it is clear that NC lacks hereditarity and is not a convex corner. Let N = ∑_{r,s} αrs vr vs^* ∈ N ]C for αrs ∈ C. Since N ≥ 0, it holds that αii ∈ R+ for all i ∈ [d]. We have αM ∈ NC, so it is required that Tr(αMN) = α(λjαkk − λkαjj) ≤ 1 for all α ≥ 0, giving αjj = αkk = 0; indeed αii = 0 for all i such that λi ≠ 0. If λm = 0 for some m ∈ [d], then Tr(βvmvm^*C) = 0 and so βvmvm^* ∈ NC for all β ≥ 0. Then Tr(βvmvm^*N) = βαmm, and we require αmm = 0 for N ∈ N ]C. By Corollary 2.1.2 it follows that N = 0 and N ]C = {0}, as claimed.

Remark 2.2.39. Note that N−C = NC, so the cases −C ∈ M+d and −C ∈ M++d do not require separate consideration.

2.3 Reflexivity of Md-convex corners

In the previous section, the reflexivity of a number of different Md-convex corners has been

demonstrated: indeed, any subset of Md we have examined that has failed to be reflexive

has also failed to be a convex corner. In this section we will prove the ‘second anti-blocker

theorem’ in this non-commutative setting, showing indeed that every Md-convex corner is

reflexive.

2.3.1 The second anti-blocker theorem

In [22] it is proved that if A is an Rd-convex corner, then A[[ = A, a result we called the second anti-blocker theorem. However, the method of the proof given there does not generalise to the case of Md-convex corners, and we will need to modify the approach.

First we recall the Hahn-Banach separation theorem.

Theorem 2.3.1. [42, Theorem 3.4] Suppose P and Q are disjoint non-empty convex sets in

a normed vector space V. If P is closed and Q is compact, then there exists a continuous

linear functional Λ : V → C and γ ∈ R such that

Re(Λ(p)) < γ < Re(Λ(q))


for all p ∈ P and for all q ∈ Q.

Recall from (2.3) the notation AC = {M ∈ M+d : Tr(MC) ≤ 1}. The following proposition shows the fundamental importance of sets of this form.

Proposition 2.3.2. Every Md-convex corner A ≠ M+d is of the form
A = ⋂_{α∈A} AQα,
for some set A and some Qα ∈ Mhd, α ∈ A.

Proof. We work in the Hilbert space Md. If Λ : Md → C is a linear functional, standard

theory for Hilbert spaces ([42, Theorem 12.5]) implies that there exists N ∈ Md such that

Λ(M) = 〈M,N〉. Then, applying Theorem 2.3.1 to the Md-convex corner A and the single

point W ∈M+d \A, there exists γ ∈ R and N ∈Md such that

Re(〈M,N〉) < γ < Re(〈W,N〉) for all M ∈ A.

Now let T = (1/2)(N + N^*) = T^*. It is easy to see that for M ∈ M+d,
Re(〈M,N〉) = (1/2)(〈M,N〉 + 〈N,M〉) = (1/2)Tr(MN^* + NM^*) = (1/2)Tr(MN^* + MN) = 〈M,T〉,
where we have used that M = M^* and the cyclicality of the trace. Thus

〈M,T 〉 < γ < 〈W,T 〉 for all M ∈ A,

and the set {M ∈ Md : 〈M,T〉 = γ} is a hyperplane separating A from W ∈ M+d \ A. We

note that 0 ∈ A and so γ > 0. In this way, for each V ∈ M+d \ A, we can find TV ∈ Mhd and γV ∈ (0,∞) such that 〈M,TV〉 ≤ γV < 〈V,TV〉 for all M ∈ A. Then letting QV = (1/γV)TV, it holds that V ∉ AQV but that A ⊆ AQV for each V ∈ M+d \ A. Hence
A = ⋂_{V ∈ M+d \ A} AQV,
and the result is proved.
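The identity Re(〈M,N〉) = 〈M,T〉 with T = (N + N^*)/2, used in the proof above, can be checked numerically. The following is a short numpy sketch with hypothetical random data.

import numpy as np

rng = np.random.default_rng(1)
d = 3
# Hypothetical M >= 0 and an arbitrary N in M_d.
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
M = X @ X.conj().T
N = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
T = (N + N.conj().T) / 2

# <A, B> = Tr(A B^*) is the inner product used throughout.
lhs = np.trace(M @ N.conj().T).real
rhs = np.trace(M @ T.conj().T).real
print(np.isclose(lhs, rhs))    # True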

Corollary 2.3.3. If Qα ∈ M+d for all α ∈ A, then A = ⋂_{α∈A} AQα is a reflexive Md-convex corner.


Proof. This is immediate from Lemma 2.2.24 and Corollary 2.2.34.

Remark 2.3.4. Note that in Proposition 2.3.2, A can be chosen to be countable. To see this, note from Lemma 2.2.15 that, since Md is separable, E = {Qα : α ∈ A} ⊆ Md has a countable dense subset F = {Qα : α ∈ A0}, where A0 ⊆ A is countable. Let A′ = ⋂_{α∈A0} AQα. Clearly A ⊆ A′. It remains to show that A′ ⊆ A. Because F is dense in E, for each α ∈ A we have a sequence (αn)_{n∈N} ⊆ A0 such that Qαn → Qα as n → ∞. Then for M ∈ A′, we have Tr(MQαn) ≤ 1 for all n ∈ N and it follows by continuity that Tr(MQα) = lim_{n→∞} Tr(MQαn) ≤ 1. We conclude M ∈ A. Thus A′ ⊆ A as required.

Before establishing a second anti-blocker theorem for all Md-convex corners, we note that

a general convex corner may have empty interior relative to M+d . This situation is addressed

now, beginning with the following linear algebraic lemmas.

Lemma 2.3.5. Let u1, . . . , un with n ≤ d be linearly independent, but not necessarily orthonormal, vectors in Cd. Then
span(u1, . . . , un) = ran(∑_{i=1}^n ui ui^*).

Proof. It is immediate that
ran(∑_{i=1}^n ui ui^*) ⊆ span(u1, . . . , un).
Since ∑_{i=1}^n ui ui^* is a linear operator, to prove the reverse inclusion it suffices to show that uj ∈ ran(∑_{i=1}^n ui ui^*) for all j ∈ [n]. The result is trivial when n = 1, and we proceed by induction. For k ∈ [n], write Sk = span(u1, . . . , uk) and Mk = ∑_{i=1}^k ui ui^*. Assume the lemma holds for n = k, that is, Sk ⊆ ran(Mk) for some k ∈ [n]. Because we are free to reorder the ui's as we please, it suffices to prove that uk+1 ∈ ran(Mk+1). As u1, . . . , un are linearly independent, uk+1 ∉ Sk and so Sk ⊊ Sk+1 with dim(Sk) = dim(Sk+1) − 1. Then dim(S⊥k) = dim(S⊥k+1) + 1, giving the strict inclusion S⊥k+1 ⊊ S⊥k. Now choose v ∈ S⊥k \ S⊥k+1; it must hold that 〈v, ui〉 = 0 for all i ∈ [k], but 〈v, uk+1〉 ≠ 0. Then Mkv = 0 and
Mk+1((1/〈v, uk+1〉) v) = uk+1,
as required.

For the following analysis we fix {0} ⊊ A ⊊ M+d, an Md-convex corner with possibly empty interior relative to M+d. (We note that the convex corners {0} and M+d are trivially reflexive, which is why we may exclude them here.) Let
U = {v ∈ Cd : there exists r > 0 such that rvv^* ∈ A}.    (2.11)

Lemma 2.3.6. Let A be an Md-convex corner. If v ∈ Cd is an eigenvector of an element

M ∈ A with corresponding eigenvalue λ > 0, then v ∈ U .

Proof. We have M − (λ/‖v‖^2)vv^* ≥ 0 and so 0 ≤ (λ/‖v‖^2)vv^* ≤ M ∈ A, giving that (λ/‖v‖^2)vv^* ∈ A by hereditarity.

Now let P be the projection onto span(U) so that Pu = u for all u ∈ span(U).

Lemma 2.3.7. There exists r > 0 such that rP ∈ A.

Proof. Let {ui}_{i=1}^k with ui ∈ U for all i ∈ [k] be a basis (not necessarily orthonormal) of span(U). By the definition of U, for each ui there exists ri > 0 such that ri ui ui^* ∈ A. By convexity, R = (1/k)∑_{i=1}^k ri ui ui^* ∈ A. Letting r0 = (1/k) min_{i∈[k]} ri > 0 and Q = r0 ∑_{i=1}^k ui ui^*, we have 0 ≤ Q ≤ R. Then by hereditarity, Q ∈ A. By Lemma 2.3.5,
ran(Q) = ran(∑_{i=1}^k ui ui^*) = span(u1, . . . , uk) = span(U) = ran(P).
Now let r be the smallest positive eigenvalue of Q. Then r > 0 and rP ≤ Q, giving rP ∈ A by hereditarity.

Remark 2.3.8. Since P is the projection onto span(U), for any non-zero u ∈ span(U) we have uu^*/‖u‖^2 ≤ P. By Lemma 2.3.7, there exists r > 0 such that rP ∈ A and thus (r/‖u‖^2)uu^* ∈ A. We conclude that u ∈ U, and noting that 0 ∈ U we have U = span(U). Thus U is a subspace of Cd and ran(P) = U.

Lemma 2.3.9. For M ∈ Mhd it holds that PMP = M if and only if every eigenvector v of

M with non-zero eigenvalue satisfies v ∈ U .

Proof. Let M = ∑_{i=1}^k λi vi vi^*, where λi ≠ 0 and vi ∈ U for all i ∈ [k]. Then PMP = ∑_{i=1}^k λi Pvi(Pvi)^* = ∑_{i=1}^k λi vi vi^* = M. Conversely, if M = PMP has an eigenvector v with non-zero eigenvalue, then
v ∈ ran(M) = ran(PMP) ⊆ ran(P) = U,
where we have used Remark 2.3.8.


By Lemmas 2.3.6 and 2.3.9, it is now immediate that PMP = M for all M ∈ A. We can

thus write PAP = A. Now let P⊥ = I − P . Clearly P⊥v = 0 for all v ∈ U .

Lemma 2.3.10. The following hold:
(i) 〈M,P⊥〉 = 0 for all M ∈ A;
(ii) 〈M,P⊥〉 > 0 for all M ≥ 0 satisfying PMP ≠ M.

Proof. The first assertion is immediate because 〈M,P⊥〉 = Tr(MP⊥) = Tr(PMPP⊥) = 0 for M ∈ A. For the second, note that for v ∉ U we have Pv ≠ v and P⊥v ≠ 0. If PMP ≠ M, then by Lemma 2.3.9, M has at least one eigenvector v ∉ U with positive eigenvalue. Then for each such eigenvector we have
〈vv^*, P⊥〉 = 〈v, P⊥v〉 = 〈v, (P⊥)^2 v〉 = ‖P⊥v‖^2 > 0,
and the claim follows.

Having fixed the Md-convex corner A, and with P the projection onto span(U) = ran(P) as before, we set k = rank(P) and let MPd ⊆ M+d be given by
MPd = {M ∈ M+d : rank(M) = k, PMP = M}.

Note then from Lemma 2.3.9 that
MPd = {M ∈ M+d : there exist s ≥ r > 0 such that rP ≤ M ≤ sP},    (2.12)
where we can set r to be the minimum positive eigenvalue of M and s to be the maximum eigenvalue of M.

Recall that for C ∈ Mhd we have set
AC = {M ∈ M+d : Tr(MC) ≤ 1}  and  NC = {M ∈ M+d : Tr(MC) = 0}.

Proposition 2.3.11. Let A be an Md-convex corner and let P be the projection onto U as defined in (2.11). Set
A0 = {A ∈ MPd ∩ A : (1 + ε)A ∉ A for all ε > 0}.


Then for each A ∈ A0 there exists a matrix RA ∈ M+d such that
A = ⋂_{A∈A0} ARA ∩ NP⊥.

Proof. Consider A ∈ A0 and let An = (1 + 1/n)A, n ∈ N. By construction, An /∈ A for all

n ∈ N. As in the proof of Proposition 2.3.2, for all n ∈ N, there exists QAn ∈Mhd such that

〈M,QAn〉 ≤ 1 < 〈An, QAn〉 for all M ∈ A. (2.13)

Note that A,An ∈ MPd and so PAP = A and PAnP = An, n ∈ N. Recall that PMP = M

for all M ∈ A. Thus

〈M,QAn〉 = 〈PMP,QAn〉 = 〈M,PQAnP 〉

for all M ∈ A, and similarly 〈An, QAn〉 = 〈An, PQAnP 〉. Thus each QAn can be chosen

to satisfy QAn = PQAnP , and, by Lemma 2.3.9, its eigenvectors corresponding to non-zero

eigenvalues are contained in U = ran(P ).

For fixed A ∈ A0, we first show that the set {QAn : n ∈ N} is bounded. Write QAn = ∑_{i=1}^k λ_i^{(n)} v_i^{(n)} v_i^{(n)*}, where {v_i^{(n)} : i ∈ [k]} ⊆ ran(P) is an orthonormal set and λ_i^{(n)} ∈ R, i ∈ [k], where k = rank(P). By Lemma 2.3.7, there exists r > 0 such that rP ∈ A and hence by hereditarity r v_i^{(n)} v_i^{(n)*} ∈ A for all i ∈ [k] and n ∈ N. It follows from (2.13) that 〈r v_i^{(n)} v_i^{(n)*}, QAn〉 ≤ 1, and so
λ_i^{(n)} ≤ 1/r  for all i ∈ [k] and n ∈ N.    (2.14)

Then note from (2.13) that for all n ∈ N,
(1 + 1/n)〈A,QAn〉 > 1,
and hence that
∑_{i=1}^k λ_i^{(n)} 〈A, v_i^{(n)} v_i^{(n)*}〉 = 〈A,QAn〉 > 1/2  for all n ∈ N.    (2.15)

Now A ∈ MPd, so there exists t > 0 such that A ≥ tP, and so
t ≤ 〈A, v_i^{(n)} v_i^{(n)*}〉 ≤ ‖A‖  for all i ∈ [k], n ∈ N.


Now (2.15) and (2.14) give
λ_j^{(n)} 〈A, v_j^{(n)} v_j^{(n)*}〉 > 1/2 − ∑_{i≠j} λ_i^{(n)} 〈A, v_i^{(n)} v_i^{(n)*}〉 ≥ 1/2 − ((d−1)/r)‖A‖ > −((d−1)/r)‖A‖.
So for λ_j^{(n)} < 0, it certainly follows that
λ_j^{(n)} > −((d−1)/(rt))‖A‖  for all j ∈ [k], n ∈ N.    (2.16)

By (2.14) and (2.16), {λ_i^{(n)} : i ∈ [k], n ∈ N} is bounded, and so the set {QAn : n ∈ N} is bounded as claimed. Thus, the sequence (QAn)_{n∈N} has a subsequence (QAnj)_{j∈N} convergent to some RA ∈ Mhd. Then PQAnjP = QAnj → RA, but by continuity PQAnjP → PRAP; this gives RA = PRAP.

By (2.13) it holds for all M ∈ A and n ∈ N that 〈M,QAn〉 ≤ 1, and hence

〈M,RA〉 ≤ 1 for all M ∈ A. (2.17)

From (2.13) we also have that
〈(1 + 1/n)A, QAn〉 > 1,  n ∈ N,
and since A ∈ A, by letting n → ∞ and using (2.17) we conclude that
〈A,RA〉 = 1.    (2.18)

We now claim that RA ≥ 0 for all A ∈ A0. Recalling RA = PRAP and using Lemma 2.3.9,

suppose towards a contradiction that there exists A ∈ A0 and unit vector v ∈ U = ran(P )

such that RAv = λv with λ < 0. Since A ∈MPd , there exists t > 0 with A ≥ tP ≥ tvv∗, and

so

0 ≤ A− tvv∗ ≤ A ∈ A,

giving that A− tvv∗ ∈ A by hereditarity. However,

〈A− tvv∗, RA〉 = 1− λt > 1,

contradicting (2.17), and showing that RA ≥ 0.

Now write C = ⋂_{A∈A0} ARA ∩ NP⊥. The proposition is proved by showing that C = A.


Note by (2.17) for all A ∈ A0 that A ⊆ ARA. By Lemma 2.3.10, A ⊆ NP⊥. Thus it is clear that A ⊆ C. Fix M ∉ A. The assertion C = A will follow by showing that M ∉ C. We identify four cases.
Case 1. M ∉ M+d.
Since C ⊆ M+d, we have M ∉ C.

Case 2. M ∈ MPd \ A.
Recall that rP ∈ A for some r > 0. Let m = max{k ∈ R+ : kM ∈ A}. By (2.12) on page 52 and Lemma 2.3.7, it is clear that 0 < m < 1, and setting M′ = mM we have M′ ∈ A0. Then C ⊆ ARM′ and by (2.18), 〈M′, RM′〉 = 1. Then 〈M,RM′〉 = 1/m > 1, and so M ∉ C.

Case 3. M = PMP ∈ M+d \ (MPd ∪ A).
Since the sets ARA and NP⊥ are convex, C is convex. From Case 2 and using A ⊆ C, it is clear that
MPd ∩ A = MPd ∩ C.    (2.19)

Suppose towards a contradiction that M ∈ C and take N ∈ C ∩ MPd. Letting Mn = (1 − 1/n)M + (1/n)N, the convexity of C gives that Mn ∈ C for all n ∈ N. Since M = PMP ≥ 0 and N ∈ MPd, it is also clear that Mn ∈ MPd for all n ∈ N, and so by (2.19), Mn ∈ A for all n ∈ N. However, Mn → M as n → ∞, and since A is closed, M ∈ A, the required contradiction.

Case 4. M ∈ M+d and PMP ≠ M.
By Lemma 2.3.10 we have M ∉ NP⊥, and hence M ∉ C.

We can now give the ‘second anti-blocker theorem’ for Md.

Theorem 2.3.12. A non-empty set A ⊆M+d is reflexive if and only if A is a convex corner.

Proof. If A is a convex corner, its reflexivity is an immediate consequence of Proposition

2.3.11, Lemmas 2.2.24 and 2.2.37 and Corollary 2.3.3. The converse implication follows from

Lemma 2.2.10 (i).

2.3.2 Consequences of reflexivity

Here we give some results which follow from the reflexivity of Md-convex corners. Proposition

2.3.16 answers a question raised in Remark 2.2.23. Definition 2.3.17 and Proposition 2.3.18

are the natural non-commutative analogues of our work in the Rd case.


Lemma 2.3.13. If A and B are Md-convex corners, then
A ⊆ B ⇐⇒ A] ⊇ B],
A = B ⇐⇒ A] = B],
A ⊊ B ⇐⇒ A] ⊋ B].

Proof. These equivalences are trivial consequences of Lemma 2.2.10 (ii) and Theorem 2.3.12.

We recall that if S ⊆M+d , then C(S) denotes the convex corner generated by S.

Proposition 2.3.14. If A ⊆M+d , then A]] = C(A).

Proof. By Lemma 2.2.32, A] = (C(A))], and anti-blocking both sides yields A]] = (C(A))]] =

C(A), using Theorem 2.3.12.

Corollary 2.3.15. If A is a diagonal Md-convex corner, then her(A) is the Md-convex corner

given by her(A) = C(A) = A]].

Proof. If A is a diagonal Md-convex corner, then conv(A) = A. Then C(A) = her(A), and

the result is immediate from Proposition 2.3.14.

Proposition 2.3.16. Let Bα be a convex corner for each α ∈ A. Then
(⋂_{α∈A} Bα)] = C(⋃_{α∈A} B]α).

Proof. Using Theorem 2.3.12, Lemma 2.2.22 and Proposition 2.3.14, we have that
(⋂_{α∈A} Bα)] = (⋂_{α∈A} B]]α)] = (⋃_{α∈A} B]α)]] = C(⋃_{α∈A} B]α).

By analogy with the corresponding definitions (1.8), (1.12) and (1.15) on pages 6, 9 and

11 respectively for Rd-convex corners, we make the following definitions.

Definition 2.3.17. Let A be an Md-convex corner.


(i) If A is bounded we define
γ(A) = max{TrA : A ∈ A}.    (2.20)
The maximum exists because a bounded convex corner is compact and the trace functional is continuous. If A is unbounded, noting Lemma 2.1.3, we define γ(A) = ∞. Note that γ(A) = 0 if and only if A = {0}.

(ii) If A ≠ M+d, we define
N(A) = max{β ∈ R+ : βI ∈ A}.    (2.21)
The maximum exists because A is closed. We define N(M+d) = ∞. Note by Lemma 2.2.12 that N(A) = 0 if and only if A has empty interior relative to M+d.

(iii) If A has non-empty interior relative to M+d we define
M(A) = inf{∑_{i=1}^k λi : ∃k ∈ N, Ai ∈ A, λi > 0 s.t. ∑_{i=1}^k λiAi ≥ I}.    (2.22)
(Note that if A has a non-empty interior relative to M+d then the set on the right is non-empty and the infimum exists.) When A is a standard convex corner, the same argument as given in Lemma 1.2.19 shows that the infimum is attained. Note that M(M+d) = 0. If A has empty interior relative to M+d, the condition on the right of (2.22) cannot be satisfied and we define M(A) = ∞.

The following result is analogous to Proposition 1.2.20. We follow the same convention that 1/0 = ∞ and 1/∞ = 0.

Proposition 2.3.18. If A is an Md-convex corner, then
M(A) = 1/N(A) = γ(A]).

Proof. The proof is an obvious adaptation of that of Proposition 1.2.20. Just note that in

the case A has empty interior relative to M+d , we have M(A) = ∞ by Definition 2.3.17,

N(A) = 0 by Lemma 2.2.12 and γ(A]) =∞ by Proposition 2.2.16.
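For the standard convex corners AC and BC = A]C with C positive definite (Proposition 2.2.35 (i)), the quantities in Proposition 2.3.18 reduce to eigenvalue data of C. The following is a small numpy check with a hypothetical choice of C.

import numpy as np

C = np.diag([1.0, 2.0, 4.0])            # hypothetical positive definite C
eigs = np.linalg.eigvalsh(C)

gamma_AC = 1.0 / eigs.min()             # gamma(A_C): maximise Tr M subject to M >= 0, Tr(MC) <= 1
N_AC = 1.0 / float(np.trace(C))         # N(A_C): largest beta with beta*Tr(C) <= 1
gamma_BC = float(np.trace(C))           # gamma(B_C) = gamma(A_C^]): maximise Tr M over 0 <= M <= C
N_BC = float(eigs.min())                # N(B_C): largest beta with beta*I <= C

# Proposition 2.3.18 applied to A_C and to B_C = A_C^]:
print(np.isclose(1.0 / N_AC, gamma_BC))   # M(A_C) = 1/N(A_C) = gamma(A_C^])
print(np.isclose(1.0 / N_BC, gamma_AC))   # M(B_C) = 1/N(B_C) = gamma(B_C^]) = gamma(A_C)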


2.4 Entropic quantities in quantum information theory

We refer to the theory outlined in Chapter 1 of a source emitting a sequence of letters

from some fixed, finite alphabet as the classical case, to be contrasted to the quantum case

which we now begin to outline, and which is considered throughout Chapters 3 and 4. The

discovery of quantum mechanics has revolutionised our understanding of the physical world

and simultaneously inspired many new developments in mathematics. Not least to witness

this quantum revolution has been information theory. Section 2.4.1 summarises some basic

concepts in the resulting field, known as quantum information theory. In particular, the

concept of entropy in this quantum setting will be discussed, which will help explain the

motivation for what follows. Section 2.4.2 includes a discussion of the diagonal expectation

operator, and proves a number of related technical lemmas which will be needed later. Finally,

in Section 2.4.3 we introduce the concept of entropy over an Md-convex corner, and show

that the well-known von Neumann entropy is a special case.

2.4.1 Some background on quantum information

We use terminology and conventions as in [35]. Every isolated quantum system has an as-

sociated Hilbert space H, known as the state space. Only finite dimensional Hilbert spaces

will be considered here, and so without loss of generality we can set H = Cd for some d ∈ N.

If ψ ∈ H is a unit vector, then ψ is known as a pure state and represents a possible con-

figuration of the system. When analysing the discrete, memoryless and stationary source in

classical information theory, we considered a sequence of independent, identically distributed

random variables following a certain probability distribution over a fixed and finite alphabet

1, . . . , d. A message is a sequence of letters, or equivalently a sequence of canonical basis

vectors of Rd. In the quantum case a message is formed from a sequence of state vectors in

Cd. Rather than a probability distribution p ∈ Pd, it is now appropriate to consider an en-

semble or mixed state ψi, pi : i ∈ [n], where each ψi ∈ Cd is a pure state and (pi)i∈[n] ∈ Pn.

We represent such an ensemble by the density matrix or state

ρ =

n∑i=1

piψiψ∗i ∈M+

d . (2.23)

If n > 1 in (2.23), we refer to ρ as a mixed state. The density matrix corresponding to a

pure state ψ is the rank one projection ψψ∗, which is also conventionally referred to as a pure

state. Context will usually resolve any potential ambiguity in terminology. It is not hard to


see that the following characterisation holds.

Proposition 2.4.1. [35, Proposition 3.16]. An element ρ ∈Md is a density matrix of some

ensemble if and only if ρ ≥ 0 and Tr ρ = 1.

We let Rd denote the set of all states in Md, that is,

Rd = {M ∈ M+d : TrM = 1}.

It is immediate that Rd is convex. It is well known that the pure states in Rd are the

extreme points of Rd, but we include a proof for the benefit of the reader.

Proposition 2.4.2. If ρ = vv^* ∈ Rd for some unit vector v ∈ Cd, and ρ = ∑_{i=1}^n λi ui ui^* with λi > 0 and unit vectors ui ∈ Cd, then ui ui^* = vv^* for all i = 1, . . . , n.

Proof. First note that we have ρ^2 = ρ and so Tr(ρ^2) = 1. The Cauchy–Schwarz inequality gives that |〈ui, uj〉| ≤ 1 for all i, j ∈ [n], with equality if and only if ui ui^* = uj uj^*. Since ∑_{i=1}^n λi = Tr ρ = 1, we then have
Tr(ρ^2) = Tr((∑_{i=1}^n λi ui ui^*)^2) = ∑_{i,j∈[n]} λi λj |〈ui, uj〉|^2 ≤ ∑_{i,j∈[n]} λi λj = 1,
with equality if and only if ui ui^* = vv^* for all i ∈ [n].

The following well-known corollary gives a number of characterisations of pure states.

Corollary 2.4.3. [54, p.64] The following are equivalent:

(i) ρ ∈ Rd is pure;

(ii) ρ is an extreme point of Rd;

(iii) ρ ∈ Rd satisfies Tr(ρ2) = 1.
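The purity criterion in (iii) is straightforward to test numerically. The following is a minimal numpy sketch comparing a hypothetical pure state with the maximally mixed state.

import numpy as np

# A hypothetical pure state and the maximally mixed state in R_2.
v = np.array([1.0, 1.0j]) / np.sqrt(2)
rho_pure = np.outer(v, v.conj())        # rank-one projection vv^*
rho_mixed = np.eye(2) / 2

# Corollary 2.4.3 (iii): purity is equivalent to Tr(rho^2) = 1.
print(np.isclose(np.trace(rho_pure @ rho_pure).real, 1.0))   # True
print(np.trace(rho_mixed @ rho_mixed).real)                   # 0.5 < 1, so not pure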

It is easy to see that any mixed state ρ ∈ Rd is in the convex hull of the pure states in Rd by writing ρ = ∑_{i=1}^d λi vi vi^*, where {v1, . . . , vd} is an orthonormal basis of eigenvectors of ρ with corresponding eigenvalues λ1, . . . , λd.

As in the classical case, entropic quantities involve the logarithm function, but in the

quantum case the logarithm function will act not on positive real numbers, but on positive


definite matrices. For A = ∑_{i=1}^d ai ui ui^* ∈ M++d with {u1, . . . , ud} an orthonormal basis of Cd and each ai > 0, we recall that
logA = ∑_{i=1}^d (log ai) ui ui^*.    (2.24)

When A,B ∈M++d commute, they are simultaneously unitarily diagonalisable and (2.24)

gives

log(AB) = logA+ logB (2.25)

If A = ∑ ai ui ui^* ∈ M++d, we have A^{-1} = ∑ ai^{-1} ui ui^* ∈ M++d and (2.24) gives
log(A^{-1}) = −logA.    (2.26)

It also holds that if logA = logB, then A = B.
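The following numpy sketch computes the matrix logarithm via the spectral decomposition (2.24) and checks (2.25) and (2.26) for hypothetical commuting positive definite matrices.

import numpy as np

def logm_psd(A):
    # Matrix logarithm of a positive definite A via its spectral decomposition, as in (2.24).
    a, U = np.linalg.eigh(A)
    return (U * np.log(a)) @ U.conj().T

# Hypothetical commuting positive definite matrices (both diagonal in the same basis).
A = np.diag([1.0, 2.0, 3.0])
B = np.diag([0.5, 4.0, 1.0])

print(np.allclose(logm_psd(A @ B), logm_psd(A) + logm_psd(B)))   # (2.25) for commuting A, B
print(np.allclose(logm_psd(np.linalg.inv(A)), -logm_psd(A)))     # (2.26)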

We have already seen the importance of the trace functional, and for reference we give the following straightforward observations. Consider A = ∑_{i=1}^d ai ui ui^* ∈ M++d as above. From (2.24) and Lemma B.0.2 (viii) it is clear that
Tr(logA) = ∑_{i=1}^d log ai  and  Tr(ρ logA) = ∑_{i=1}^d 〈ρui, ui〉 log ai.    (2.27)
Working in the extended reals with the conventions 0 log 0 = 0 and log 0 = −∞, we define Tr(logA) and Tr(ρ logA) for all A ∈ M+d by extending (2.27) to apply to all A ∈ M+d. We note then that Tr(ρ logA) is finite if and only if ker(A) ⊆ ker(ρ).

The simplest classical entropic quantity which we have considered is the Shannon entropy.

Its analogue in the quantum setting is the von Neumann entropy (see, for example [54,

Definition 5.17]).

Definition 2.4.4. The von Neumann entropy of a state ρ ∈ Rd is given by

H(ρ) = −Tr(ρ log ρ).

For ρ ∈ Rd, (2.27) shows easily that H(ρ) is the Shannon entropy of the probability

distribution whose elements are the eigenvalues of ρ. Comparing to (1.3) on page 3, it is easy

to see that 0 ≤ H(ρ) ≤ log d for ρ ∈ Rd, and that H(ρ) = 0 if and only if ρ is a pure state.
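A minimal numpy sketch of Definition 2.4.4, computing H(ρ) from the eigenvalues of ρ with the convention 0 log 0 = 0; the states are hypothetical and the natural logarithm is used here for concreteness.

import numpy as np

def von_neumann_entropy(rho):
    # H(rho) = -Tr(rho log rho), evaluated on the spectrum of rho.
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

d = 3
print(von_neumann_entropy(np.eye(d) / d))       # log d, the maximally mixed state
v = np.array([1.0, 0.0, 0.0])
print(von_neumann_entropy(np.outer(v, v)))      # 0 for a pure state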

Another important and well-known concept is quantum relative entropy. Note that in the


definition below, ρ and σ are arbitrary elements of M+d , and not restricted to be states.

Definition 2.4.5. [45, Section 1.2]. Given ρ, σ ∈ M+d satisfying ker(σ) ⊆ ker(ρ), we define

D(ρ‖σ), the quantum relative entropy of ρ with respect to σ, by

D(ρ‖σ) = Tr(ρ log ρ− ρ log σ).

When ker(σ) ⊈ ker(ρ), we define D(ρ‖σ) = ∞.
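A numpy sketch of Definition 2.4.5, computing D(ρ‖σ) from spectral decompositions and returning ∞ when the kernel condition fails; the states and the numerical tolerance are assumptions chosen for illustration.

import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    # D(rho || sigma) = Tr(rho log rho - rho log sigma), with the kernel convention above.
    lam_r, U_r = np.linalg.eigh(rho)
    lam_s, U_s = np.linalg.eigh(sigma)
    # ker(sigma) ⊆ ker(rho): eigenvectors of sigma with zero eigenvalue must be annihilated by rho.
    for i in np.where(lam_s <= tol)[0]:
        v = U_s[:, i]
        if (v.conj() @ rho @ v).real > tol:
            return np.inf
    log_rho = (U_r * np.log(np.maximum(lam_r, tol))) @ U_r.conj().T
    log_sig = (U_s * np.log(np.maximum(lam_s, tol))) @ U_s.conj().T
    return float(np.trace(rho @ (log_rho - log_sig)).real)

rho = np.diag([0.5, 0.5, 0.0])
sigma = np.eye(3) / 3
print(relative_entropy(rho, rho))      # approximately 0, consistent with Lemma 2.4.6 below
print(relative_entropy(rho, sigma))    # > 0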

We now recall some basic properties of D(ρ‖σ).

Lemma 2.4.6. [57, Theorem 11.8.1], [56, p.250]. For ρ, σ ∈ Rd, the quantum relative

entropy satisfies D(ρ‖σ) ≥ 0 and vanishes if and only if ρ = σ.

The next lemma states that the quantum relative entropy D(ρ‖σ) is jointly convex in ρ ∈ M+d and σ ∈ M+d.

Lemma 2.4.7. [45, Theorem 7]. If ρ = ∑_k λk ρ^{(k)} ∈ M+d and σ = ∑_k λk σ^{(k)} ∈ M+d satisfy ker(σ^{(k)}) ⊆ ker(ρ^{(k)}) with λk > 0 and ∑_k λk = 1, then
D(ρ‖σ) ≤ ∑_k λk D(ρ^{(k)}‖σ^{(k)}).
When ρ^{(k)}, σ^{(k)} ∈ M++d, equality holds if and only if log ρ − log σ = log ρ^{(k)} − log σ^{(k)} for all k.

The quantum relative entropy D(ρ‖σ) is known to satisfy the following continuity condi-

tion.

Lemma 2.4.8. ([56, p.251], see also [3].) For n ∈ N and states ρn → ρ ∈ M+d and σn → σ ∈ M+d, we have
lim_{n→∞} D(ρn‖σn) = D(ρ‖σ).

It is trivial to prove the following corollary that will be required later.

Corollary 2.4.9. At fixed ρ ∈ Rd, the quantum relative entropy D(ρ‖σ) is convex and lower

semi-continuous in σ ∈M+d .

Proof. Convexity in σ follows from Lemma 2.4.7 by putting each ρ^{(k)} = ρ. We merely need to examine the case ker(σ^{(k)}) ⊈ ker(ρ^{(k)}) for some k. Here D(ρ^{(k)}‖σ^{(k)}) = ∞ and convexity trivially holds.


For ρ, σ ∈ Rd, Lemma 2.4.8 gives that D(ρ‖σ) is continuous. Continuity, and then obviously lower semi-continuity, in the case of arbitrary σ ∈ M+d follows easily. Indeed, let σ = kτ with k ∈ R+ and τ ∈ Rd. By (2.27), Tr(ρ log(kτ)) = Tr(ρ log τ) + log k and so D(ρ‖σ) = D(ρ‖τ) − log k. Now let σn = kn τn → σ as n → ∞ with τn ∈ Rd and kn ∈ R+. Then kn = Tr σn → Tr σ = k and by Lemma 2.4.8,
lim_{n→∞} D(ρ‖σn) = lim_{n→∞} D(ρ‖τn) − lim_{n→∞} log kn = D(ρ‖τ) − log k = D(ρ‖σ),
as required.

2.4.2 Diagonal expectation

In this section the role of the diagonal expectation operator will be discussed and used to

explore further the relationship between Md-convex corners and diagonal Md-convex corners.

The motivation for some of the results given in this section may not yet be clear, but will

hopefully become so in the next, where, for instance, we use the theory described here to

generalise results in [11] concerning entropy over Rd-convex corners to the non-commutative

setting. We will also use diagonal expectations to prove Lemma 2.4.35, which shows how the

concept of entropy over an Rd-convex corner is in fact embedded in a more general concept

of entropy over an Md-convex corner, a notion which we introduce in Section 2.4.3. The

properties of the diagonal expectation operator will also be used to prove Proposition 4.1.11,

an important result linking parameters of graphs with parameters of certain sets in M+d which

we will call non-commutative graphs, and which we will regard as generalisations of graphs.

For any orthonormal basis V = {v1, . . . , vd} of Cd we let
DV = span{vi vi^* : i ∈ [d]}
denote the algebra of matrices diagonal with respect to V. We write D+V = DV ∩ M+d. For A ∈ Md we define ∆V(A), the diagonal expectation of A with respect to the basis V, by
∆V(A) = ∑_{i=1}^d 〈Avi, vi〉 vi vi^*.

Expressing A ∈ Md in the basis V as A = ∑_{i,j∈[d]} aij vi vj^* for aij ∈ C, we observe that ∆V(A) = ∑_{i=1}^d aii vi vi^* ∈ DV, showing that the action of ∆V is to retain only the diagonal elements of A with respect to the basis V. As previously, we denote by Dd the algebra of matrices in Md


diagonal in the canonical basis {e1, . . . , ed}, and we write ∆(A) for the diagonal expectation of A with respect to this canonical basis. We begin with some elementary properties.

Lemma 2.4.10. For any orthonormal basis V it holds that

(i) M ∈ DV ⇐⇒ ∆V (M) = M ;

(ii) M ∈Md ⇒ ∆V (M) ∈ DV ;

(iii) A ≥ 0⇒ ∆V (A) ≥ 0;

(iv) A,B ∈ DV ⇒ AB = BA;

(v) For A ⊆Md it holds that DV ∩ A = ∆V (A) if and only if ∆V (A) ⊆ A;

(vi) Tr(∆V (A)) = TrA;

(vii) M ≤ I ⇒ ∆V (M) ≤ I.

Proof. The first four statements are obvious.

For (v) note that DV ∩A ⊆ A, so if DV ∩A = ∆V (A), then ∆V (A) ⊆ A. Conversely, we have

∆V (A) ⊆ DV , so if ∆V (A) ⊆ A, then ∆V (A) ⊆ DV ∩A. But it is clear that DV ∩A ⊆ ∆V (A),

completing the proof.

For (vi), set A = ∑_{i,j∈[d]} aij vi vj^* for aij ∈ C, giving ∆V(A) = ∑_{i=1}^d aii vi vi^* ∈ DV. Then Tr(∆V(A)) = ∑_{i=1}^d aii = TrA, where we have used Lemma B.0.2 (viii).
To prove (vii), suppose that M ≤ I, that is, I − M ≥ 0. Applying (iii) yields ∆V(M) ≤ ∆V(I) = I as required.

Remark 2.4.11. For an Md-convex corner A, Lemma 2.2.3 showed that Dd ∩ A is a diagonal Md-convex corner. We note, however, that ∆(A) is not necessarily a diagonal Md-convex corner for an Md-convex corner A. For example, consider the Md-convex corner BP as given in (2.5), where we set P = (1/d)11^*. Since P is a rank one projection, we have
BP = {M ∈ M+d : M ≤ P} = {αP : 0 ≤ α ≤ 1},
and
∆(BP) = {(α/d)Id : 0 ≤ α ≤ 1}.
Now (1/d)e1e1^* ≤ (1/d)Id ∈ ∆(BP), but (1/d)e1e1^* ∉ ∆(BP). It is thus clear that ∆(BP) lacks hereditarity over D+d as discussed in Remark 2.2.4, and is not an Md-convex corner. (Note in this case that Dd ∩ BP = {0}, which is trivially a diagonal Md-convex corner.)
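The failure of hereditarity described in this remark can be seen numerically; the following is a small numpy check with d = 3 (a hypothetical illustration only).

import numpy as np

d = 3
E11 = np.zeros((d, d)); E11[0, 0] = 1.0

A = np.eye(d) / d                          # (1/d) I_d = Delta(P), an element of Delta(B_P)
B = E11 / d                                # (1/d) e_1 e_1^*

below = np.all(np.linalg.eigvalsh(A - B) >= -1e-12)             # B <= A holds
in_delta_BP = np.allclose(B, (np.trace(B) / d) * np.eye(d))     # B is not a multiple of I_d
print(below, in_delta_BP)                  # True False: B sits below A but outside Delta(B_P)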


The following two lemmas will be widely used.

Lemma 2.4.12. Let M,N ∈ Md and let V = {v1, . . . , vd} be an orthonormal basis of Cd. Then
Tr((∆V(M))N) = Tr(M∆V(N)).

Proof. Using the cyclicality of the trace we write
Tr((∆V(M))N) = Tr(∑_{i=1}^d 〈Mvi, vi〉 vi vi^* N) = ∑_{i=1}^d 〈Mvi, vi〉〈Nvi, vi〉 = Tr(∑_{i=1}^d M vi vi^* 〈Nvi, vi〉) = Tr(M∆V(N)),
as required.
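A short numpy sketch of the diagonal expectation and a check of the identity in Lemma 2.4.12 for the canonical basis, with hypothetical random data.

import numpy as np

def diag_expectation(A, V):
    # Delta_V(A) = sum_i <A v_i, v_i> v_i v_i^*, where the columns of V form an orthonormal basis.
    coeffs = [V[:, i].conj() @ A @ V[:, i] for i in range(V.shape[1])]
    return sum(c * np.outer(V[:, i], V[:, i].conj()) for i, c in enumerate(coeffs))

rng = np.random.default_rng(2)
d = 4
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
N = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
V = np.eye(d)                               # canonical basis, so Delta_V = Delta

lhs = np.trace(diag_expectation(M, V) @ N)
rhs = np.trace(M @ diag_expectation(N, V))
print(np.isclose(lhs, rhs))                 # True, as in Lemma 2.4.12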

Lemma 2.4.13. Let V be an orthonormal basis. If B ⊆M+d satisfies DV ∩B = ∆V (B), then

DV ∩ (∆V (B))] = DV ∩ B] = ∆V (B]). (2.28)

Where B is a convex corner, DV ∩ B = ∆V (B) if and only if DV ∩ B] = ∆V (B]).

Proof. Write A = ∆V (B), and suppose that DV ∩B = ∆V (B) giving A ⊆ B by Lemma 2.4.10

(v). Then A] ⊇ B], and so DV ∩A] ⊇ DV ∩B]. For the reverse inclusion choose T ∈ DV ∩A],

and note that T = ∆V (T ) and Tr(TM) ≤ 1 for all M ∈ A. Let N ∈ B, and observe that

∆V (N) ∈ A. Applying Lemma 2.4.12 yields

Tr(TN) = Tr((∆V (T ))N) = Tr(T∆V (N)) ≤ 1,

and so we have T ∈ DV ∩ B]. Thus DV ∩ A] ⊆ DV ∩ B], and the first equality of (2.28) is

now proved.

To establish the second equality of (2.28), let M ∈ ∆V (B]), that is M = ∆V (R) for some

R ∈ B]. Let Q ∈ B and set P = ∆V (Q). By assumption, P ∈ B. We then have

Tr(MQ) = Tr((∆V (R))Q) = Tr(R∆V (Q)) = Tr(RP ) ≤ 1.

Then M ∈ B] and ∆V (B]) ⊆ B], and the second equality in (2.28) holds by Lemma 2.4.10

(v).


Finally, note that if B is a convex corner satisfying DV ∩B] = ∆V (B]), then the preceding

paragraph shows that DV ∩B]] = ∆V (B]]), and from Theorem 2.3.12, DV ∩B = ∆V (B).

Lemma 2.4.13 has the following immediate corollary, which helps clarify the connection

between diagonal Md-convex corners and Md-convex corners. (Recall from Lemma 2.2.3 and

Definition 2.2.6 that if B ⊆ M+d is a convex corner, then A = Dd ∩ B is a diagonal convex

corner, and we write A[ = Dd ∩ A].)

Corollary 2.4.14. Let B be an Md-convex corner, and let A be the diagonal Md-convex

corner given by A = Dd ∩ B. Then

A = Dd ∩ B = ∆(B) (2.29)

if and only if

A[ = Dd ∩ B] = ∆(B]). (2.30)

We now give a number of other straightforward but important results concerning Md-

convex corners and diagonal Md-convex corners.

Lemma 2.4.15. If B is an Md-convex corner such that ∆(B) is a diagonal Md-convex corner,

then

γ(∆(B)) = γ(B).

Proof. We have

γ(B) = max{Tr T : T ∈ B} = max{Tr(∆(T )) : T ∈ B} = max{Tr M : M ∈ ∆(B)} = γ(∆(B)),

as required.

Lemmas 2.4.16 and 2.4.17 concern (A[)], which by Lemma 2.2.10 (i) is trivially an Md-

convex corner.

Lemma 2.4.16. If A is a diagonal Md-convex corner, then

(A[)] = {M ∈ M+d : ∆(M) ∈ A}.

Proof. Let A ∈ A[, and let M ≥ 0 with ∆(M) ∈ A. Then 〈A,M〉 = 〈∆(A),M〉 =

〈A,∆(M)〉 ≤ 1 and so M ∈ (A[)], where we have used Lemma 2.4.12. Conversely, sup-


pose that M ≥ 0, but ∆(M) /∈ A. By Lemma 2.2.7, A = A[[, and so ∆(M) /∈ A[[ and there

exists B ∈ A[ such that

1 < 〈∆(M), B〉 = 〈M,∆(B)〉 = 〈M,B〉

and so M /∈ (A[)].

The concept of the hereditary cover of a set A ⊆M+d has already been discussed. Corol-

lary 2.2.28 shows that if A is a bounded diagonal convex corner, then her(A) is a convex

corner. We now consider some related results.

Lemma 2.4.17. If A is a standard diagonal convex corner, then

her(A) ⊊ (A[)] = (her(A[))].

Proof. That (A[)] = (her(A[))] is clear from Lemma 2.2.32. If A is a diagonal Md-convex

corner, then A[ ⊆ A], and Corollary 2.3.15 and Lemma 2.2.10 (ii) give that

A]] = her(A) ⊆ (A[)]. (2.31)

We further claim that

her(A) ⊊ (her(A[))]. (2.32)

Since A is a standard diagonal convex corner, rI ∈ A for some r > 0, and the method used

in Proposition 2.2.16 gives that A[ is bounded. Then Remark 2.2.11 and Corollary 2.2.28

give that her(A) and her(A[) are Md-convex corners. Lemma 2.3.13 then shows that (2.32)

is equivalent to (her(A))] ⊋ her(A[). By Lemma 2.2.32, (her(A))] = A], so to prove this

claim we seek some M ∈ A] such that M /∈ her(A[). We set

m = N(A[) = max{µ : µI ∈ A[}. (2.33)

Since A is bounded, the method of Proposition 2.2.17 gives that sI ∈ A[ for some s > 0.

Using that A[ is also bounded gives 0 < m <∞.

Observe that if M ∈Md and A ∈ A ⊆ Dd, then Tr(MA) = Tr(M∆(A)) = Tr((∆(M))A)

by Lemma 2.4.12. Thus if ∆(M) ∈ A] and M ≥ 0, then M ∈ A]. Since mI ∈ A], by this

observation we have mJ ∈ A], where J is the all ones matrix. The proof is completed by

showing that mJ /∈ her(A[), that is, there does not exist N ∈ A[ satisfying mJ ≤ N. To


show this, suppose to the contrary that

mJ ≤ N = diag(µ_1, µ_2, . . . , µ_d) ∈ A[.

Let Q = (q_ij) = N − mJ, so that Q is the matrix with diagonal entries q_ii = µ_i − m and with every off-diagonal entry equal to −m, and Q ≥ 0.

This immediately requires µi ≥ m for all i = 1, . . . , d. But if µi = m, then qii = 0 and, since

Q ≥ 0, Corollary 2.1.2 gives qij = qji = −m = 0 for all j ≠ i, contradicting that m > 0. So

we must have µi > m for all i = 1, . . . , d, and there exists ε > 0 such that µi ≥ m+ ε for all

i ∈ [d]. Then (m+ ε)I ≤ N and by hereditarity (m+ ε)I ∈ A[, contradicting (2.33). We can

conclude that mJ /∈ her(A[) as required.

Lemma 2.4.18. Let A be a diagonal Md-convex corner and B = her(A). Then A = ∆(B) =

Dd ∩ B.

Proof. It is trivial that A ⊆ B, and so A ⊆ Dd ∩ B ⊆ ∆(B). For the reverse inclusions, it

suffices to show that if T ∈ B then ∆(T ) ∈ A. Let N ∈ A be such that 0 ≤ T ≤ N . Then

N − T ≥ 0 and ∆(N − T ) ≥ 0. Thus 0 ≤ ∆(T ) ≤ ∆(N) = N ∈ A. It follows that ∆(T ) ∈ A

by the hereditarity of A over Dd.

Theorem 2.4.19. Let A be a diagonal Md-convex corner and B be an Md-convex corner.

Then A = ∆(B) = Dd ∩ B if and only if her(A) ⊆ B ⊆ (A[)].

Proof. Letting B1 = her(A), we have A = ∆(B1) = Dd ∩B1 by Lemma 2.4.18. Using Lemma

2.2.32, we write

B2 = (A[)] = (her(A[))]. (2.34)

Since A is a diagonal Md-convex corner, by Definition 2.2.2 we have A = φ(C) where C is an

Rd-convex corner and φ is as given in (2.1). Lemma 2.2.7 gives A[ = φ(C[), and since C[ is an

Rd-convex corner, Definition 2.2.2 gives that A[ is a diagonal Md-convex corner. By Corollary


2.3.15 it then holds that her(A[) is an Md-convex corner. Theorem 2.3.12 then gives that

her(A[) is reflexive, and we anti-block (2.34) to yield B]2 = her(A[). By Lemma 2.4.18 we

obtain A[ = ∆(B]2) = Dd∩(B]2), and by Corollary 2.4.14 it follows that A = ∆(B2) = Dd∩B2.

We conclude that when convex corner B satisfies B1 ⊆ B ⊆ B2, we have A = ∆(B) = Dd ∩B.

For the reverse implication, observe that if A = Dd ∩ B then A ⊆ B, and so her(A) ⊆ B

by the hereditarity of B. Now if A = Dd ∩ B = ∆(B), it follows by Corollary 2.4.14 that

A[ = Dd∩B]. ThenA[ ⊆ B] and hence her(A[) ⊆ B] by the hereditarity of B]. Theorem 2.3.12

and Lemmas 2.2.10 (ii) and 2.2.32 then give B = B]] ⊆ (her(A[))] = (A[)] as required.

The condition ∆V (A) ⊆ A, or equivalently, ∆V (A) = DV ∩A, has appeared in a number

of contexts. It is instructive to give examples of some convex corners satisfying this important

condition.

Lemma 2.4.20. The convex corner AN,k = {T ∈ M+d : ⟨T, N⟩ ≤ k} with N ∈ D+V satisfies ∆V (AN,k) ⊆ AN,k.

Proof. For A ∈ AN,k we have A ≥ 0 and so ∆V (A) ≥ 0. Then for A ∈ AN,k and N ∈ D+V ,

using Lemma 2.4.12 we have that 〈∆V (A), N〉 = 〈A,∆V (N)〉 = 〈A,N〉 ≤ k. We conclude

∆V (AN,k) ⊆ AN,k.

Remark 2.4.21. If the condition N ∈ D+V is dropped, it is easy to find a counterexample. For example, in the case that N = (1, −1; −1, 1) and k = 1, we have J = (1, 1; 1, 1) ∈ AN,1, but ∆(J) = I ∉ AN,1.
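This counterexample can be checked mechanically; the following is a minimal sketch assuming Python with numpy (the variable names are ad hoc).

```python
import numpy as np

N = np.array([[1., -1.], [-1., 1.]])
J = np.ones((2, 2))
I = np.eye(2)

# J lies in A_{N,1}: it is positive semi-definite and <J, N> = Tr(J N) <= 1.
print(np.all(np.linalg.eigvalsh(J) >= -1e-12), np.trace(J @ N))   # True, 0.0
# But Delta(J) = I fails the constraint: <I, N> = Tr(N) = 2 > 1.
print(np.trace(I @ N))                                            # 2.0
```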

Lemma 2.4.22. The convex corner BM = {T ∈ M+d : T ≤ M} with M ∈ D+V satisfies ∆V (BM ) ⊆ BM .

Proof. For B ∈ BM we have B ≥ 0 and so ∆V (B) ≥ 0. Furthermore, if B ∈ BM then

M−B ≥ 0 giving ∆V (M−B) ≥ 0. Then M−∆V (B) ≥ 0, and we conclude ∆V (B) ∈ BM .

Remark 2.4.23. Lemma 2.4.22 does not extend to all M ≥ 0 and orthonormal bases V . For

an obvious counter example, note that J ∈ BJ , but ∆(J) = I /∈ BJ .

Proposition 2.4.24. If A is a convex corner, then ∆V (A) ⊆ A if and only if ∆V (A]) ⊆ A].

Proof. First we assume ∆V (A) ⊆ A. Let B ∈ A], that is B ≥ 0 and 〈B,A〉 ≤ 1 for all A ∈ A.

Thus ∆V (B) ≥ 0 and 〈∆V (B), A〉 = 〈B,∆V (A)〉 ≤ 1 for all A ∈ A because ∆V (A) ∈ A. This


shows that ∆V (B) ∈ A], and hence ∆V (A]) ⊆ A]. The reverse direction follows by Theorem

2.3.12.

2.4.3 Entropy over an Md-convex corner

Entropic quantities are fundamental in quantum information theory as well as in classical

information. In this section we define HA(ρ), the entropy of a state ρ ∈ Rd over an Md-convex

corner A. This is a new concept in the quantum setting, and we proceed by analogy with the

Rd case introduced first in [11] and discussed in the previous chapter. Two of the simplest,

but also most fundamental, Md-convex corners are the ‘Md-unit corner’ and the ‘Md-unit

cube’. We use the notation of (2.3) and (2.5) on page 44 to write the Md-unit corner as AId = {T ∈ M+d : Tr T ≤ 1}, and the Md-unit cube as BId = {T ∈ M+d : T ≤ I}. Recall from Corollaries 2.2.33 and 2.2.34 that (AId)] = BId and (BId)] = AId.

The following lemma is the non-commutative version of Lemma 1.2.3.

Lemma 2.4.25. For a state ρ ∈ Rd and a bounded Md-convex corner A, the function f :

A → R∪∞ given by f(A) = −Tr(ρ logA) attains a minimum value f(A0) for some A0 ∈ A.

If ρ > 0 and A has non-empty interior relative to M+d , then A0 is uniquely determined.

Proof. For A ∈ A we write A = ∑_{i=1}^d a_i v_i v_i^* for an orthonormal set {v_1, . . . , v_d} and a_i ∈ R+. Let

A0(ρ) = { A = ∑_{i=1}^d a_i v_i v_i^* ∈ A : ⟨v_i, ρv_i⟩ > 0 ⇒ a_i > 0 }.

For A = ∑_{i=1}^d a_i v_i v_i^* ∈ A0(ρ) observe using (2.27) that

f(A) = − ∑_{i=1}^d ⟨v_i, ρv_i⟩ log a_i < ∞,

and the restriction of f to A0(ρ) is continuous. However, f(A) = ∞ for all A ∈ A \ A0(ρ). It is also clear that lim_{B→A} f(B) = ∞ for all A ∈ A \ A0(ρ), and so f is lower semi-continuous

on A. Since A is bounded and closed, it is compact, and the first assertion follows from

Theorem A.0.6.

For fixed ρ ∈ Rd, using Definition 2.4.5, we have f(A) = D(ρ‖A) − Tr(ρ log ρ), and Lemma 2.4.7 gives

f((A0 + B0)/2) ≤ (1/2) f(A0) + (1/2) f(B0) for A0, B0 ∈ A, (2.35)


and states that when ρ, A0, B0 ∈ M++d, equality holds in (2.35) if and only if log A0 = log B0, a condition equivalent to A0 = B0. Now suppose ρ > 0 and that A has non-empty interior relative to M+d. This gives that A0(ρ) = A ∩ M++d ≠ ∅ by Lemma 2.2.12. Now, assume towards a contradiction that there exist distinct A0, B0 ∈ A satisfying f(A0) = f(B0) = min_{A∈A} f(A). Then A0, B0 ∈ A0(ρ) ∩ M++d and in (2.35) we have f((A0 + B0)/2) < f(A0), where (A0 + B0)/2 ∈ A by the convexity of A, contradicting that f(A0) = min_{A∈A} f(A). We conclude

that when A has non-empty interior relative to M+d and ρ > 0, the minimum is achieved for

a unique A0 ∈ A.

The fact that this minimum exists motivates the following definition, analogous to Defi-

nition 1.2.4.

Definition 2.4.26. Let A ⊆ M+d be a bounded convex corner and ρ ∈ Rd be a state. The

entropy of ρ over A is given by

HA(ρ) = min{−Tr(ρ log A) : A ∈ A}.

Remark 2.4.27. If Md-convex corner A has empty interior relative to M+d , then by Lemma

2.2.12, A has no strictly positive element, and there exists ρ ∈ Rd, for example the maximally

mixed state (1/d)I, such that A0(ρ) = ∅. Then −Tr(ρ log A) = ∞ for all A ∈ A, and HA(ρ) = ∞.

Conversely, if A has non-empty interior relative to M+d , then for all states ρ ∈ Rd it holds

that A0(ρ) ≠ ∅ and HA(ρ) is finite. Theorem 2.4.31 will examine the latter case and give an

upper bound on HA(ρ).

Lemma 2.4.28. If convex corners A,B satisfy A ⊆ B, then HA(ρ) ≥ HB(ρ) for all ρ ∈ Rd.

Proof. This is immediate from Definition 2.4.26.

Elements of AId have eigenvalues in the interval [0, 1], so for all ρ ∈ Rd, it is not hard to

see that a minimising element of AId in the definition of HAId (ρ) can be chosen to have unit

trace, and

0 ≤ HAId (ρ) = min{−Tr(ρ log σ) : σ ∈ Rd}.

Now from Lemma 2.4.6 we recall for all ρ, σ ∈ Rd that D(ρ‖σ) = Tr(ρ log ρ − ρ log σ) ≥ 0. Thus

for given ρ ∈ Rd it holds that −Tr(ρ log σ) ≥ H(ρ) for all σ ∈ Rd, but since ρ ∈ Rd ⊆ AId ,

we have HAId (ρ) ≤ −Tr(ρ log ρ). We conclude that

HAId (ρ) = H(ρ) for all ρ ∈ Rd, (2.36)


in other words, von Neumann entropy is an example of entropy over a convex corner.
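Equality (2.36) rests on the non-negativity of the relative entropy D(ρ‖σ): the quantity −Tr(ρ log σ) over states σ is minimised at σ = ρ, where it equals H(ρ). The sketch below illustrates this by random sampling; it assumes Python with numpy and uses natural logarithms, so entropies come out in nats rather than in the base used elsewhere in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

def random_state(d):
    """Random density matrix: G G^* normalised to unit trace."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def neg_tr_rho_log(rho, sigma):
    """-Tr(rho log sigma), computed through the eigendecomposition of sigma."""
    w, U = np.linalg.eigh(sigma)
    log_sigma = U @ np.diag(np.log(w)) @ U.conj().T
    return -np.trace(rho @ log_sigma).real

rho = random_state(d)
H_rho = neg_tr_rho_log(rho, rho)          # von Neumann entropy of rho (in nats)

# -Tr(rho log sigma) >= H(rho) for every state sigma, with equality at sigma = rho.
print(all(neg_tr_rho_log(rho, random_state(d)) >= H_rho - 1e-9 for _ in range(1000)))
```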

Elements of BId also have eigenvalues in the interval [0, 1], and thus −Tr(ρ logA) ≥ 0 for

all ρ ∈ Rd and A ∈ BId . However, since I ∈ BId and Tr(ρ log I) = 0,

HBId (ρ) = 0 for all ρ ∈ Rd. (2.37)

For an Md-convex corner A satisfying AId ⊆ A ⊆ BId , Lemma 2.4.28 now gives

0 ≤ HA(ρ) ≤ H(ρ) for all ρ ∈ Rd. (2.38)

In the commutative case [11, Section 2] notes that there exist Rd-convex corners B, C such

that HB(p) < 0 and HC(p) > H(p) for all p ∈ Pd. Similarly, there exist Md-convex corners

B, C satisfying HB(ρ) < 0 and HC(ρ) > H(ρ) for all ρ ∈ Rd. For example, if B = kBId = {kT : T ∈ BId} and k > 1, then

HB(ρ) = HBId (ρ) − log k = − log k < 0.

Similarly, if C = (1/k)AId and k > 1, then

HC(ρ) = HAId (ρ) + log k = H(ρ) + log k > H(ρ).

Concavity is an important characteristic of entropic quantities, and the following lemma

gives a straightforward proof that HA(ρ) has this property.

Lemma 2.4.29. For a bounded Md-convex corner A, the entropy function ρ → HA(ρ) is

concave.

Proof. Let ρ = λ1ρ1 + λ2ρ2 with ρi ∈ Rd, λi ∈ R+, i = 1, 2 and λ1 + λ2 = 1. For some

A0 ∈ A we have

HA(ρ) = Tr(−ρ log A0) = λ1 Tr(−ρ1 log A0) + λ2 Tr(−ρ2 log A0)
       ≥ λ1 min_{A∈A} Tr(−ρ1 log A) + λ2 min_{A∈A} Tr(−ρ2 log A)
       = λ1 HA(ρ1) + λ2 HA(ρ2),

thus establishing concavity.

The following continuity property of HA(ρ) is a non-commutative version of Lemma 1.2.11


and allows us to work towards a non-commutative analogue of Theorem 1.2.13.

Lemma 2.4.30. Let A be a standard Md-convex corner. Then the function f : Rd → R

given by f(ρ) = HA(ρ) is upper semi-continuous on Rd and attains a finite maximum value

on Rd.

Proof. Since A is bounded and has non-empty interior relative to M+d , Lemma 2.4.25 and

Remark 2.4.27 give that f(ρ) = HA(ρ) is finite for all ρ ∈ Rd. Let (ρ(n))n∈N be a sequence

in Rd converging to ρ ∈ Rd. For ρ ∈ Rd and B ∈M+d satisfying kerB ⊆ ker ρ, let g(ρ,B) =

−Tr(ρ logB). We denote by A ∈ A and A(n) ∈ A the minimising matrices in the definitions

of HA(ρ) and HA(ρ(n)) respectively, that is, HA(ρ) = g(ρ,A) and HA(ρ(n)) = g(ρ(n), A(n)).

Since A has non-empty interior relative to M+d , there exists r > 0 such that rI ∈ A, and we

form B = (1− µ)A+ µrI ∈M++d where µ ∈ (0, 1). By convexity, B ∈ A.

Let A = ∑_{i=1}^d a_i v_i v_i^* where {v_1, . . . , v_d} is an orthonormal basis of Cd and a_i ≥ 0. Then g(ρ, A) = HA(ρ) = −Tr(ρ log A) = −∑_{i=1}^d ρ_ii log a_i, where ρ_ii = ⟨ρv_i, v_i⟩. Since HA(ρ) < ∞, it is clear that ρ_ii = 0 for all i such that a_i = 0. Note that each v_i is also an eigenvector of B; indeed, B = ∑_{i=1}^d ((1 − µ)a_i + µr) v_i v_i^*. We have

g(ρ, A) ≤ g(ρ, B) = −Tr(ρ log((1 − µ)A + µrI)) = −∑_{i=1}^d ρ_ii log((1 − µ)a_i + µr).

Noting that ρii = 0 when ai = 0, we see that g(ρ,B)→ g(ρ,A) as µ→ 0. Then for any δ > 0,

there exists µ ∈ (0, 1) such that g(ρ,A) ≤ g(ρ,B) ≤ g(ρ,A) + δ. For all n ∈ N it holds that

HA(ρ(n)) = g(ρ(n), A(n)) ≤ g(ρ(n), B). Then because ρ(n) → ρ and B > 0, we have

lim sup_{n→∞} HA(ρ(n)) ≤ lim sup_{n→∞} g(ρ(n), B) = g(ρ, B).

Finally, since δ > 0 was arbitrary and g(ρ, B) ≤ g(ρ, A) + δ = HA(ρ) + δ, we have

lim sup_{n→∞} f(ρ(n)) = lim sup_{n→∞} HA(ρ(n)) ≤ HA(ρ) = f(ρ),

and f is upper semi-continuous as stated. Theorem A.0.6 and the compactness of Rd show

that a maximum value is attained.

Having established that a maximum is attained, the next proposition, analogous to The-

orem 1.2.13, determines its value. (Recall that the parameter N(A) was introduced in Defi-


nition 2.3.17.)

Theorem 2.4.31. For a standard Md-convex corner A it holds that

max_{ρ∈Rd} HA(ρ) = − log N(A).

Proof. Note that Rd and A are compact and convex subsets of finite dimensional spaces.

Consider the function f : Rd × A → R ∪ {∞} given by f(ρ, A) = −Tr(ρ log A). The

function ρ → f(ρ,A) is linear and hence concave for fixed A ∈ A. Note that f(ρ,A) =

D(ρ‖A) − Tr(ρ log ρ), and so from Corollary 2.4.9 we have that the function A → f(ρ,A)

is convex and lower semi-continuous for fixed ρ ∈ Rd. Thus all the conditions for applying

Theorem A.0.8 are met and by interchange of the supremum and infimum, we obtain

max_{ρ∈Rd} HA(ρ) = sup_{ρ∈Rd} inf_{A∈A} f(ρ, A) = inf_{A∈A} sup_{ρ∈Rd} f(ρ, A).

Fix A ∈ A and let λmin(A) denote the smallest eigenvalue of A and uA its corresponding unit eigenvector. Working in R ∪ {∞} if necessary, (2.27) gives

sup{f(ρ, A) : ρ ∈ Rd} = sup{−Tr(ρ log A) : ρ ∈ Rd} = − log λmin(A),

where the supremum is obtained by the pure state uA uA^*. Then

max{HA(ρ) : ρ ∈ Rd} = inf{− log λmin(A) : A ∈ A} = − log (sup{λmin(A) : A ∈ A}). (2.39)

Let m = supA∈A λmin(A). Recalling that N(A)I ∈ A, it is clear that m ≥ N(A).

Conversely, for every ε > 0, there exists A ∈ A such that m − ε < λmin(A) and hence

(m− ε)I ≤ A ∈ A. By hereditarity it follows that (m− ε)I ∈ A. Thus N(A) ≥ m− ε for all

ε > 0 and N(A) ≥ m. Thus m = N(A) and putting this in (2.39) completes the proof.

Corollary 2.4.32. For a standard Md-convex corner A it holds that

max_{ρ∈Rd} HA(ρ) = − log N(A) = log M(A) = log γ(A]).

Proof. This is immediate from Theorem 2.4.31 and Proposition 2.3.18.

Lemma 2.4.33. If an Md-convex corner B satisfies ∆(B) = Dd ∩ B, then

M(∆(B)) = 1/N(∆(B)) = γ(∆(B)[) = M(B) = 1/N(B) = γ(B]).


Proof. Since ∆(B) = Dd∩B, from Lemma 2.2.3 it is clear that ∆(B) is a diagonal Md-convex

corner. By Corollary 2.4.14, ∆(B)[ = ∆(B]) = Dd ∩ B]. We note that ∆(B]) is a diagonal

Md-convex corner, and so by Lemma 2.4.15 it follows that γ(∆(B)[) = γ(∆(B])) = γ(B]).

Propositions 2.2.9 and 2.3.18 complete the proof.

Corollary 2.4.34. Suppose the standard Md-convex corner B satisfies ∆(B) = Dd ∩ B.

Taking φ to be the canonical bijection R+d → D+d as given in (2.1) on page 35, it holds that

max_{p∈Pd} Hφ−1(∆(B))(p) = max_{ρ∈Rd} HB(ρ).

Proof. As in the proof of Lemma 2.4.33, ∆(B) is a diagonal Md-convex corner. Lemma 2.4.33

gives N(∆(B)) = N(B). Then simply note from Theorem 1.2.13 and Definition 2.2.8 that

max_{p∈Pd} Hφ−1(∆(B))(p) = − log N(∆(B)), and from Theorem 2.4.31 that max_{ρ∈Rd} HB(ρ) = − log N(B).

Corollary 2.4.34 relates the entropy over certain Md-convex corners and that over related

diagonal Md-convex corners; this is also the theme of the next lemma. Some ideas in its proof,

along with Definition 4.3.1, arose in earlier discussions between Joshua Lockhart, Giannicola

Scarpa, Ivan Todorov and Andreas Winter.

Lemma 2.4.35. Let A be a diagonal convex corner and let B be a bounded Md-convex corner

with A = ∆(B) = Dd ∩ B. If p ∈ Pd and ρ = ∑_{i=1}^d p_i e_i e_i^*, then

Hφ−1(A)(p) = HB(ρ).

Proof. For A ∈ A we have A = ∑_{i=1}^d a_i e_i e_i^* with a_i ≥ 0, and Tr(ρ log A) = ∑_{i=1}^d p_i log a_i. Now A ⊆ B, so

HB(ρ) = min_{B∈B} Tr(−ρ log B) ≤ min_{A∈A} Tr(−ρ log A) = Hφ−1(A)(p).

For the reverse inequality, let B = ∑_{j=1}^d b_j v_j v_j^* ∈ B with b_j ≥ 0 and {v_1, . . . , v_d} an orthonormal basis of Cd. We have

∆(B) = ∑_{i=1}^d ⟨Be_i, e_i⟩ e_i e_i^* = ∑_{i=1}^d ∑_{j=1}^d b_j |⟨v_j , e_i⟩|^2 e_i e_i^*,

−Tr(ρ log B) = − ∑_{i=1}^d ∑_{j=1}^d p_i |⟨v_j , e_i⟩|^2 log b_j,


and

−Tr(ρ log(∆(B))) = − ∑_{i=1}^d p_i log( ∑_{j=1}^d b_j |⟨v_j , e_i⟩|^2 ) ≤ − ∑_{i=1}^d p_i ∑_{j=1}^d |⟨v_j , e_i⟩|^2 log b_j = −Tr(ρ log B),

where the inequality follows from the concavity of the logarithm function and the fact that ∑_{j=1}^d |⟨v_j , e_i⟩|^2 = 1 for each i = 1, . . . , d. Noting that ∆(B) ∈ A gives Hφ−1(A)(p) ≤ HB(ρ),

and the proof is complete.
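The inequality driving the second half of the proof — that passing from B to its diagonal part ∆(B) cannot decrease −Tr(ρ log B) when ρ is diagonal — can be tested numerically. A sketch assuming Python with numpy (the helper log_herm is an ad hoc name):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def log_herm(A):
    """Matrix logarithm of a positive definite Hermitian matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.log(w)) @ U.conj().T

p = rng.dirichlet(np.ones(d))            # probability vector
rho = np.diag(p)                         # diagonal state in the canonical basis

G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = G @ G.conj().T                       # positive definite B
Delta_B = np.diag(np.diag(B))            # diagonal map in the canonical basis

lhs = -np.trace(rho @ log_herm(Delta_B)).real
rhs = -np.trace(rho @ log_herm(B)).real
print(lhs <= rhs + 1e-9)                 # True: -Tr(rho log Delta(B)) <= -Tr(rho log B)
```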

The following lemmas give straightforward but useful characterisations of entropy over

the convex corners BId and AId respectively.

Lemma 2.4.36. Let A be an Md-convex corner with AId ⊆ A ⊆ BId. The following are

equivalent:

(i) HA(ρ) = 0 for all states ρ;

(ii) γ(A]) = 1;

(iii) I ∈ A;

(iv) A = BId;

(v) γ(A) = d.

Proof. (i) ⇐⇒ (ii). This follows from Corollary 2.4.32 and (2.38) on page 71.

(ii) ⇐⇒ (iii). For A ⊆ BId , note that N(A) ≤ 1. The assertion then follows from Proposition

2.3.18.

(iii) ⇐⇒ (iv). From the hereditarity of A and the assumption A ⊆ BId , if I ∈ A, then

A = {M ∈ M+d : M ≤ I} = BId. The converse is trivial.

(iv) ⇐⇒ (v). This is immediate given that A ⊆ BId .

Lemma 2.4.37. Let A be an Md-convex corner with AId ⊆ A ⊆ BId. The following are

equivalent:

(i) HA(ρ) = H(ρ) for all states ρ;


(ii) γ(A]) = d;

(iii) A = AId;

(iv) γ(A) = 1.

Proof. (ii) ⇐⇒ (iii). Recall that (AId)] = BId and (BId)] = AId, yielding AId ⊆ A] ⊆ BId. It

thus holds that A = AId ⇐⇒ A] = BId ⇐⇒ γ(A]) = d by Lemma 2.4.36.

(iii) ⇐⇒ (iv). This follows immediately from the assumption AId ⊆ A.

(iii) ⇒ (i). This was proved in (2.36) on page 70.

(i) ⇒ (iv). We prove the contrapositive statement. Suppose that (iv) does not hold, so that there exists B ∈ A with Tr B = t > 1. Note that Tr(t−1B) = 1, and hence t−1B ∈ Rd. Since (1/d)Id ∈ A, for ε ∈ (0, 1) the convex combination B′ = (1 − ε)B + (ε/d)Id ∈ A ∩ M++d satisfies Tr B′ = (1 − ε)t + ε > 1. Thus, without loss of generality we assume that B ∈ M++d. By

(2.25) we then have

log(t−1B) = log(t−1IdB) = log(t−1Id) + logB = Id log t−1 + logB,

and the von Neumann entropy H(t−1B) satisfies

H(t−1B) = Tr(−t−1B log(t−1B)) = log t + Tr(−t−1B log B),

where log t > 0. However,

HA(t−1B) = min_{A∈A} Tr(−t−1B log A) ≤ Tr(−t−1B log B) < H(t−1B),

and (i) does not hold, as we required.

In the commutative case, a number of interesting results on the entropy of a probability

distribution over different Rd-convex corners are given in [11, Section 2]. For instance, for an

Rd-convex corner A it is shown that

H(p) = HA(p) +HA[(p) for all p ∈ Pd. (2.40)

We seek to consider the non-commutative setting. Many of the results in [11] require com-

mutativity, but the following definition and Proposition 2.4.38 will allow some progress in a

restricted case of the Md setting.


Let V = {v_1, . . . , v_d} be an orthonormal basis of Cd and let A = ∑_{i=1}^d a_i u_i u_i^* ∈ M++d, where {u_1, . . . , u_d} is a set of orthonormal eigenvectors of A and a_i ∈ R+. We have

∆V (log A) = ∑_{i,j=1}^d v_j v_j^* |⟨u_i, v_j⟩|^2 log a_i,

and for ρ ∈ Rd,

Tr(ρ∆V (log A)) = ∑_{i,j=1}^d ⟨ρv_j , v_j⟩ |⟨u_i, v_j⟩|^2 log a_i. (2.41)

Working in the extended reals R̄ = R ∪ {−∞, ∞}, we define Tr(ρ∆V (log A)) for all A ∈ M+d by extending (2.41) to hold for all positive semi-definite matrices.

Proposition 2.4.38. Let ρ ∈Md be a state and V be an orthonormal basis of Cd such that

ρ ∈ DV . Let A be a bounded Md-convex corner satisfying ∆V (A) ⊆ A.

Then there exists M ∈ A ∩ DV which commutes with ρ and satisfies −Tr(ρ logM) =

HA(ρ).

Proof. By Lemma 2.4.25, there exists A ∈ A satisfying −Tr(ρ log A) = HA(ρ). Let A = ∑_{i=1}^d a_i u_i u_i^*, where {u_1, . . . , u_d} is a set of orthonormal eigenvectors of A and a_i ≥ 0. Working in R̄ where necessary, set

A′ = ∑_{j=1}^d v_j v_j^* exp_2( ∑_{i=1}^d |⟨u_i, v_j⟩|^2 log a_i ) ∈ DV . (2.42)

Comparing to (2.41) we observe that

Tr(ρ log A′) = ∑_{i,j=1}^d ⟨ρv_j , v_j⟩ |⟨u_i, v_j⟩|^2 log a_i = Tr(ρ∆V (log A)). (2.43)

Using that ∑_{i=1}^d |⟨u_i, v_j⟩|^2 = ‖v_j‖^2 = 1 in (2.42) on page 77, the convexity of the exponential function yields

A′ ≤ ∑_{j=1}^d v_j v_j^* ∑_{i=1}^d |⟨u_i, v_j⟩|^2 a_i = ∑_{j=1}^d v_j v_j^* ⟨Av_j , v_j⟩ = ∆V (A) ∈ A,

where we used the assumption ∆V (A) ⊆ A. It follows by the hereditarity of A that A′ ∈ A.

Now ∆V (ρ) = ρ, and we use Lemma 2.4.12 and (2.43) to obtain

HA(ρ) = −Tr(ρ log A) = −Tr((∆V (ρ)) log A) = −Tr(ρ∆V (log A)) = −Tr(ρ log A′).


Thus A′ ∈ A ∩ DV satisfies −Tr(ρ logA′) = HA(ρ) and clearly commutes with ρ ∈ DV .

The next two propositions are analogous to [11, Lemma 2.5 (a)].

Proposition 2.4.39. Let A and B be Md-convex corners, and ρ = AB ∈ Rd where A ∈ A

and B ∈ B. Then

H(ρ) ≥ HA(ρ) +HB(ρ).

Equality holds if and only if A and B are elements of A and B achieving the respective minima

in Definition 2.4.26.

Proof. Note that AB = ρ = ρ∗ = B∗A∗ = BA, establishing that A and B commute and are

thus simultaneously diagonalisable. Then

HA(ρ) + HB(ρ) ≤ −Tr(ρ log A) − Tr(ρ log B) = −Tr(ρ log(AB)) = −Tr(ρ log ρ) = H(ρ),

using (2.25) on page 60. The equality condition is immediate.

Proposition 2.4.40. Let ρ ∈Md be a state and V be an orthonormal basis of Cd such that

ρ ∈ DV . Suppose A and B are bounded Md-convex corners satisfying ∆V (A) ⊆ A, ∆V (B) ⊆

B, and B ⊆ A]. Then

H(ρ) ≤ HA(ρ) +HB(ρ).

Proof. By Proposition 2.4.38, there exist A ∈ A and B ∈ B such that HA(ρ) = −Tr(ρ logA)

and HB(ρ) = −Tr(ρ log B), and such that A, B ∈ DV . Let ρ = ∑_{i=1}^d p_i v_i v_i^*, A = ∑_{i=1}^d a_i v_i v_i^* and B = ∑_{i=1}^d b_i v_i v_i^*. It then follows from (2.27) on page 60 that

H(ρ) − HA(ρ) − HB(ρ) = Tr(ρ log A) + Tr(ρ log B) − Tr(ρ log ρ)
                      = ∑_{i : p_i>0} p_i log( a_i b_i / p_i ) ≤ log( ∑_{i : p_i>0} a_i b_i ) ≤ 0,

where the first inequality follows from the concavity of the log function and because ∑_{i=1}^d p_i = 1, while the second inequality follows from the fact that B ∈ A] and consequently ⟨A, B⟩ = ∑_{i=1}^d a_i b_i ≤ 1.

The following result is given in [11].


Theorem 2.4.41. [11, Theorem 1.1] If A,B are Rd-convex corners satisfying A[ ⊆ B, then

for any p = (pi)i∈[d] ∈ Pd there exist a = (ai)i∈[d] ∈ A and b = (bi)i∈[d] ∈ B such that pi = aibi

for all i ∈ [d].

We have the following non-commutative version.

Proposition 2.4.42. Let V = {v1, . . . , vd} be an orthonormal basis and ρ ∈ DV ∩ Rd. If

A,B are Md-convex corners satisfying ∆V (A) ⊆ A and A] ⊆ B, then there exist A ∈ A and

B ∈ B such that ρ = AB.

Proof. Recall from Lemma 2.4.10 that ∆V (A) ⊆ A is equivalent to the condition ∆V (A) =

DV ∩ A. Let A0 = ∆V (A) = DV ∩ A and B0 = DV ∩ B. We define the bijection φ : R+d → DV ∩ M+d by φ(∑_{i=1}^d a_i e_i) = ∑_{i=1}^d a_i v_i v_i^*, where a_i ∈ R+ and {e_1, . . . , e_d} is the canonical basis of Cd. By the argument of Lemma 2.2.3, φ−1(A0) and φ−1(B0) are Rd-convex corners.

We claim that

DV ∩ (A0)] = DV ∩ A]. (2.44)

Since A0 ⊆ A, the inclusion DV ∩ (A0)] ⊇ DV ∩ A] is trivial. For the reverse inclusion, fix M ∈ DV ∩ (A0)] and A ∈ A. We have

⟨M, A⟩ = ⟨∆V (M), A⟩ = ⟨M, ∆V (A)⟩ ≤ 1,

where we have used Lemma 2.4.12 and the fact that ∆V (A) ∈ A0. Thus, M ∈ DV ∩ A] and (2.44) holds. Now observe that

DV ∩ (A0)] = DV ∩ A] ⊆ DV ∩ B = B0. (2.45)

Since ⟨f, g⟩ = ⟨φ(f), φ(g)⟩ for all f, g ∈ R+d, it is clear that (φ−1(A0))[ = φ−1(DV ∩ (A0)]). By (2.45) it then holds that (φ−1(A0))[ ⊆ φ−1(B0). For the state ρ = ∑_{i=1}^d p_i v_i v_i^* ∈ DV we set p = φ−1(ρ) ∈ Pd and apply Theorem 2.4.41. Thus there exist a = (a_i) ∈ φ−1(A0) and b = (b_i) ∈ φ−1(B0) such that p_i = a_i b_i for all i ∈ [d]. Then φ(a) = ∑_{i=1}^d a_i v_i v_i^* ∈ A0 ⊆ A and φ(b) = ∑_{i=1}^d b_i v_i v_i^* ∈ B0 ⊆ B satisfy φ(a)φ(b) = ρ, as required.

Corollary 2.4.43. Let A and B be Md-convex corners and V be an orthonormal basis of Cd

such that ∆V (A) ⊆ A and A] ⊆ B. Then for any state ρ ∈ DV

H(ρ) ≥ HA(ρ) +HB(ρ).

Proof. This follows from Propositions 2.4.39 and 2.4.42.


The following result is the non-commutative analogue of (2.40).

Theorem 2.4.44. Let ρ ∈ Md be a state and V be an orthonormal basis of Cd such that

ρ ∈ DV . Let A be a bounded Md-convex corner satisfying ∆V (A) ⊆ A. Then H(ρ) =

HA(ρ) +HA](ρ).

Proof. The result follows from putting B = A] in Proposition 2.4.40 and Corollary 2.4.43;

Proposition 2.4.24 shows that the necessary conditions are satisfied.

We finish this section with a non-commutative analogue of the lower bound on entropy

given in Proposition 1.2.9.

Proposition 2.4.45. Let ρ ∈Md be a state and V be an orthonormal basis of Cd such that

ρ ∈ DV . Let A be a bounded Md-convex corner satisfying ∆V (A) ⊆ A. It holds that

HA(ρ) ≥ H(ρ)− log γ(A).

Equality holds if and only if γ(A)ρ ∈ A.

Proof. By Proposition 2.4.38 we can choose B ∈ A satisfying HA(ρ) = −Tr(ρ logB) and

such that B ∈ DV . Write ρ = ∑_{i=1}^d ρ_i v_i v_i^* and B = ∑_{i=1}^d b_i v_i v_i^* with ρ_i, b_i ≥ 0. Then

H(ρ) = −Tr(ρ log ρ) = − ∑_{i=1}^d ρ_i log ρ_i,

and

HA(ρ) = −Tr(ρ log B) = − ∑_{i=1}^d ρ_i log b_i.

Now Lemma 1.0.1 gives

∑_{i=1}^d ρ_i log( ρ_i / b_i ) ≥ − log( ∑_{i=1}^d b_i ) ≥ − log γ(A),

whence the result is immediate. The equality condition follows as in Theorem 1.2.9.


Chapter 3

Non-commutative graphs and

associated convex corners

The non-commutative graph of a quantum channel was introduced in [13] as a generalisation

to the quantum setting of the confusability graph of a classical channel. After summarising

the classical case and necessary background from [13], we build on the theory introduced

there to explore further the analogy between graphs and non-commutative graphs. Just as

a number of Rd-convex corners can be associated with a given graph, so we will introduce a

number of Md-convex corners which can be associated with a given non-commutative graph.

These will include a quantum generalisation of the theta corner of a graph, which will be

shown to lead to a ‘quantum sandwich theorem’, analogous to Theorem 1.4.5.

3.1 Background

This section summarises the foundations of zero-error information theory and reviews some

established results on non-commutative graphs. We begin by describing the classical channel

and its confusability graph, and recall the definitions of the one shot zero-error capacity and

the Shannon capacity of a channel. Finally we recall how [13] generalises the classical theory

to associate a non-commutative graph with a quantum channel, and we summarise how the

classical channel can be embedded in the quantum setting.


3.1.1 Classical channels

We have encountered situations, for example the definition of graph entropy in Definition

1.3.5, where we encode only a ‘probable set’, and in doing so ‘tolerate’ a small probability of

error. On the other hand, in zero-error information theory we impose the condition that the

probability of error be zero throughout. This was first discussed by Shannon in [48], where

the concept of zero-error capacity was introduced. Here we follow [13] and [35]. A channel is

the means by which a sender, usually named ‘Alice’, transmits a signal to a receiver, usually

named ‘Bob’. The simplest channel we can consider is the identity channel, whose output is

always equal to its input. In general the signal can be subject to interference, and in this

case we say the channel is noisy. Suppose Alice transmits a letter from the finite alphabet

X = {x1, . . . , xn} through the noisy channel N to Bob, who then receives a letter from the finite alphabet Y = {y1, . . . , ym}. We assume the channel N is memoryless and stationary. It

may thus be described by fixed probabilities p(yi|xj) for i ∈ [m] and j ∈ [n], where p(yi|xj)

denotes the probability that Bob receives yi ∈ Y when Alice sends xj ∈ X . Although we

will write N : X → Y, we note that N is not a function. We can, however, think of the

channel as a mapping between sets of probability distributions in the following sense. Define

the matrix PN = (pij)i∈[m],j∈[n] ∈ Mm,n where pij = p(yi|xj). Let αi be the probability that

Alice sends the letter xi and βi be the probability that Bob receives the letter yi. Setting

α = (αi)i∈[n] ∈ Pn and β = (βi)i∈[m] ∈ Pm, we see that the action of channel N can be

described by the mapping Pn → Pm given by

β = PNα. (3.1)

Distinct letters xi, xj ∈ X are said to be confusable if there exists y ∈ Y such that

p(y|xi)p(y|xj) > 0, (3.2)

for in that case whenever Alice sends either xi or xj , Bob could receive y and would not know

for certain whether xi or xj was sent.

Definition 3.1.1. [13, Section I] Corresponding to noisy channel N : X → Y as above, is

the confusability graph GN with V (GN ) = X and in which xi ∼ xj if and only if xi and xj

are distinct and confusable.
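As a concrete illustration of (3.1), (3.2) and Definition 3.1.1, the sketch below (Python with numpy; the channel matrix is a made-up example, not one taken from the text) computes an output distribution and reads off the edges of the confusability graph.

```python
import numpy as np

# Column j of P holds the probabilities p(y_i | x_j) for a channel with 3 inputs, 2 outputs.
P = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])

alpha = np.array([0.2, 0.3, 0.5])        # input distribution on X
beta = P @ alpha                         # output distribution on Y, as in (3.1)
print(beta, beta.sum())                  # [0.35 0.65] 1.0

# x_i and x_j (i != j) are confusable iff some output y has p(y|x_i) p(y|x_j) > 0, as in (3.2).
n = P.shape[1]
confusable = [(i, j) for i in range(n) for j in range(i + 1, n)
              if np.any(P[:, i] * P[:, j] > 0)]
print(confusable)                        # [(0, 1), (1, 2)]: x_1 ~ x_2 and x_2 ~ x_3 in G_N
```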

Suppose Alice chooses a subset X0 ⊆ X known to Bob such that when she transmits a

letter from X0 to Bob, he can know with certainty which letter was sent.


Definition 3.1.2. [35, Definition 6.1] The maximum possible number of elements of such a

subset X0 is called the one-shot zero-error capacity of channel N and denoted α(N).

It is clear that X0 cannot contain two mutually confusable elements of X . Indeed, it is

trivial to verify the following proposition.

Proposition 3.1.3. [35, Proposition 6.5] Let noisy channel N have confusability graph GN .

Then

α(N) = α(GN ),

where α(GN ) is the independence number of GN .

To consider multiple uses of N rather than the ‘one-shot’ case, the following standard

definition from graph theory will be needed.

Definition 3.1.4. ([15, p.155]) The strong product of graphs G and H is the graph G ⊠ H with V (G ⊠ H) = V (G) × V (H) and (i, j) ≃ (k, l) in G ⊠ H if and only if i ≃ k in G and j ≃ l in H.

We regard k successive uses of the memoryless and stationary channel N : X → Y as a

single use of the channel N^k : X^k → Y^k, with

p((y1, . . . , yk) | (x1, . . . , xk)) = p(y1|x1) · · · p(yk|xk).

Letting G^n denote the strong product of n copies of G, it is clear that the confusability graph of N^k is given by G_{N^k} = (G_N)^k [35, Proposition 6.9]. The one-shot zero-error capacity of N^k : X^k → Y^k is then given by α((G_N)^k), motivating the next definition.

Definition 3.1.5. [35, Section 6.2], [28, Section I]. The Shannon capacity, also known as the

zero-error capacity, of the graph G is defined by

c(G) = sup_{n∈N} (α(G^n))^{1/n}.

We now recall Fekete’s Lemma, which will be used in a number of places in the sequel.

Importantly, it allows the supremum in the definition of Shannon capacity to be replaced by

a limit, as we show in Corollary 3.1.7. This corollary is well known, but we supply a proof

because we will employ the same method later for other parameters.

Lemma 3.1.6. ( Fekete’s Lemma.) A real sequence (an)n∈N is called sub-additive if am+n ≤

am + an for all m,n ∈ N. Similarly, a real sequence (an)n∈N is called super-additive if


am+n ≥ am + an for all m,n ∈ N.

If (an)n∈N is sub-additive, then lim_{n→∞} a_n/n exists and is equal to inf_{n∈N} a_n/n. (Note that we may have to work in R ∪ {−∞} and that the limit and infimum can be −∞.)

If (an)n∈N is super-additive, then lim_{n→∞} a_n/n exists and is equal to sup_{n∈N} a_n/n. (Note that we may have to work in R ∪ {∞} and that the limit and supremum can be ∞.)

Corollary 3.1.7. [28, Section I]. The limit lim_{n→∞} (α(G^n))^{1/n} exists and is equal to c(G), the Shannon capacity of the graph G.

Proof. If S and T are independent sets in graphs G and H respectively, then S × T is independent in G ⊠ H, and hence α(G ⊠ H) ≥ α(G)α(H). Letting a_n = log α(G^n), we have

a_{m+n} = log α(G^m ⊠ G^n) ≥ log(α(G^m)α(G^n)) = a_m + a_n,

whence Lemma 3.1.6 yields that

sup_{n∈N} log (α(G^n))^{1/n} = lim_{n→∞} ( log (α(G^n))^{1/n} ),

and the result follows.

Intuitively, zero-error transmission of a long message through channel N is asymptotically equivalent to using the identity channel over an alphabet of size c(GN ).

By Definition 3.1.5 it is immediate that c(G) ≥ α(G). Although c(G) = α(G) for some

graphs, in general the calculation of the Shannon capacity of an arbitrary graph is a no-

toriously difficult problem. A major breakthrough was made in [28], where László Lovász introduced the graph parameter θ(G), now known as the Lovász number of the graph G, and which we have already encountered in (1.35) on page 31. Lovász [28, Lemma 3] showed for any graph G that α(G) ≤ θ(G). Crucially, [28, Corollary 7] shows that the Lovász number is sub-multiplicative over strong products, that is, θ(G^n) ≤ θ(G)^n for all n ∈ N. It immediately follows that

c(G) = lim_{n→∞} (α(G^n))^{1/n} ≤ lim_{n→∞} (θ(G)^n)^{1/n} = θ(G),


and we see that c(G) is bounded as follows:

α(G) ≤ c(G) ≤ θ(G). (3.3)
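For small graphs the sandwich (3.3) can be explored numerically. The sketch below computes θ(C5) through one standard semidefinite programming characterisation of the Lovász number, max{∑_{i,j} X_ij : X ⪰ 0, Tr X = 1, X_ij = 0 whenever i ∼ j}; it assumes the Python packages numpy and cvxpy are available, and this formulation is used only for illustration — it is not the definition (1.35) referred to above.

```python
import numpy as np
import cvxpy as cp

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]       # the 5-cycle C_5

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.trace(X) == 1]
constraints += [X[i, j] == 0 for (i, j) in edges]  # X_ij = 0 whenever i ~ j

theta = cp.Problem(cp.Maximize(cp.sum(X)), constraints).solve()
print(theta, np.sqrt(5))   # both about 2.2360..., so alpha(C_5) = 2 <= c(C_5) <= sqrt(5)
```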

In [58], Witsenhausen solves the problem of ‘zero-error side information’. Here we follow

the discussion of that result in [35]. Suppose in addition to noisy channel N : X → Y,

Alice can communicate with Bob using an identity channel I : [k] → [k] for any k ∈ N of

her choice. Alice wishes to run this identity channel in parallel with N such that Bob can

retrieve the letter x ∈ X sent by Alice with certainty. Alice seeks a function f : X → [k],

such that when Bob receives the output from x ∈ X passing through noisy channel N along

with the output f(x) from the identity channel I, he can deduce x with certainty. However,

the identity channel is regarded as ‘expensive’, with the cost increasing with k, and so it is

desired to minimise k. The minimum value of k such that these constraints can be satisfied is

denoted χ(N) and known as the packing number of N . If noisy channel N has confusability

graph GN , it is not hard to see that it is both necessary and sufficient that function f be a

colouring of GN , giving the following result.

Proposition 3.1.8. [35, Theorem 6.11] For classical channel N with confusability graph GN

it holds that χ(N) = χ(GN ).

Moving from the one-shot case to that of n successive uses of N , the packing number of

N^n is then given by χ(N^n) = χ(G_{N^n}) = χ((G_N)^n). Recall for graphs G and H that χ(G ⊠ H) ≤ χ(G)χ(H). (To see this, observe that if g : V (G) → [m] and h : V (H) → [n] are colourings of G and H respectively, then f : V (G) × V (H) → [m] × [n] is a colouring of G ⊠ H, where f(i, j) = (g(i), h(j)).) Using a similar argument to that of Corollary 3.1.7, it then follows that (log χ(G^n))_{n∈N} is sub-additive and we can apply Lemma 3.1.6 to show the existence

of the limit in the following definition, which is analogous to Corollary 3.1.7.

Definition 3.1.9. [35, Definition 6.12]. For a graph G we define

r(G) = lim_{n→∞} (χ(G^n))^{1/n}.

If GN is the confusability graph of a noisy channel N , the Witsenhausen rate of channel N

is defined by

R(N) = log r(GN ).

For graph G it is also convenient to write R(G) = log r(G).


3.1.2 Quantum measurement

Fundamental to quantum mechanics is the non-classical phenomenon that measurement af-

fects the system. The next proposition briefly summarises some basic facts about quantum

measurements.

Proposition 3.1.10. ([35, Theorems 3.9, 3.13].) Corresponding to a quantum measurement

on states in Md with possible outcomes λ1, . . . , λk, is a set of operators {Qi}_{i∈[k]} ⊆ Mn,d for some n ∈ N, known as a measurement system, satisfying ∑_{i=1}^k Qi^*Qi = Id and such that, if the quantum system is in initial state ρ ∈ Rd, then the following hold.

(i) The probability of observing outcome λi is Tr(Qi ρ Qi^*).

(ii) If outcome λi is observed, then the state ‘collapses’ to the density matrix Qi ρ Qi^* / Tr(Qi ρ Qi^*) ∈ Rn.

Recalling (2.23), from (i) and (ii) it follows that the measurement yields an ensemble with

density matrix

ρ′ = ∑_{i=1}^k Qi ρ Qi^* ∈ Rn. (3.4)
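A small illustration of Proposition 3.1.10 and of (3.4) (a sketch in Python with numpy; the measurement system chosen is simply measurement in the canonical basis of C², not an example from the text):

```python
import numpy as np

# Measurement system {Q_1, Q_2}: projections onto the canonical basis vectors of C^2.
Q = [np.array([[1., 0.], [0., 0.]]), np.array([[0., 0.], [0., 1.]])]
print(np.allclose(sum(Qi.conj().T @ Qi for Qi in Q), np.eye(2)))   # sum Q_i^* Q_i = I

v = np.array([np.sqrt(0.3), np.sqrt(0.7)])
rho = np.outer(v, v.conj())                    # pure initial state

probs = [np.trace(Qi @ rho @ Qi.conj().T).real for Qi in Q]
print(probs)                                   # [0.3, 0.7]

# Post-measurement ensemble (3.4): the state collapses to a classical mixture.
rho_prime = sum(Qi @ rho @ Qi.conj().T for Qi in Q)
print(rho_prime)                               # diag(0.3, 0.7)
```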

Some notion of distinguishability of states will be needed, and Proposition 3.1.10 (ii)

suggests the following definition.

Definition 3.1.11. [35, Definition 7.1] States ρ1, . . . , ρm ∈ Rd are perfectly distinguishable

if there exists a measurement system {Mi}_{i∈[k]} with k ≥ m such that Tr(Mi ρj Mi^*) = δij for

all i ∈ [k] and j ∈ [m].

The following proposition gives a useful characterisation of the distinguishability of states.

A proof can be found in [35].

Proposition 3.1.12. [35, Proposition 7.4] States ρ1, . . . , ρm ∈ Rd are perfectly distinguish-

able if and only if ρiρj = 0 for all i ≠ j.

Remark 3.1.13. (i) For unit vectors vi ∈ Cd, the density matrices v1v1^*, . . . , vkvk^* ∈ Rd are perfectly distinguishable if and only if v1, . . . , vk ∈ Cd are orthonormal.

(ii) By Lemma B.0.2 (x), states ρ1, . . . , ρm ∈ Rd are perfectly distinguishable if and only if

Tr(ρiρj) = 0 for all i ≠ j.


3.1.3 Quantum channels

Having discussed classical channels, we now consider the quantum setting. In this case Alice

and Bob communicate by sending and receiving quantum states. The communication between

Alice and Bob is described by the quantum channel Φ : Md →Mk such that when Alice sends

a state ρ ∈ Rd, Bob receives the state σ ∈ Rk given by σ = Φ(ρ). In fact, the concept of

a quantum channel models any physically realisable process leading to a change in state.

Following [32], [35] and [57], we first recall some definitions, and then give some important

characterisations of quantum channels.

Definition 3.1.14. Linear map Φ : Md → Mk is called trace preserving if Tr(Φ(A)) = Tr(A) for all A ∈ Md.

Definition 3.1.15. Linear map Φ : Md → Mk is positive if Φ(A) ∈ M+k for all A ∈ M+d.

For a linear map Φ : Md → Mk and m ∈ N, we define the map Φm : Mm(Md) → Mm(Mk) by Φm((Qij)_{i,j∈[m]}) = (Φ(Qij))_{i,j∈[m]}, where Qij ∈ Md for i, j ∈ [m].

Definition 3.1.16. Linear map Φ : Md →Mk is completely positive if Φm is positive for all

m ∈ N.

A linear map that is both completely positive and trace preserving will be called a c.p.t.p.

map. It is a consequence of the postulates of quantum mechanics that the set of quantum

channels is precisely the set of c.p.t.p. maps [54, Definition 2.13]. The next two well-known

propositions give characterisations of completely positive maps and c.p.t.p. maps. We let

Eij denote the canonical matrix unit Eij = eie∗j in Md.

Proposition 3.1.17. ([8, Theorems 1 and 2].) For map Φ : Md → Mk the following are

equivalent:

(i) Φ is completely positive;

(ii) There exist n ∈ N and A1, . . . , An ∈Mk,d such that

Φ(V ) = ∑_{i=1}^n Ai V Ai^* for all V ∈ Md; (3.5)

(iii) The matrix PΦ = (Φ(Eij))i,j∈[d] ∈Md(Mk) is positive.

Proposition 3.1.18. [54, Corollary 2.27]. For map Φ : Md →Mk the following are equiva-

lent:


(i) Φ is a c.p.t.p. map;

(ii) There exist n ∈ N and A1, . . . , An ∈ Mk,d satisfying ∑_{i=1}^n Ai^*Ai = Id such that

Φ(V ) = ∑_{i=1}^n Ai V Ai^* for all V ∈ Md; (3.6)

(iii) Matrix PΦ = (Φ(Eij))i,j∈[d] ∈Md(Mk) is positive, and Φ satisfies

Tr(Φ(Eij)) = δij for all i, j ∈ [d].

Remark 3.1.19. (i) The form (3.5) is known as a Kraus representation of Φ, and the matrices

A1, . . . , An are called the Kraus operators of Φ for this representation.

(ii) The matrix PΦ as defined in Proposition 3.1.17 is known as the Choi matrix of Φ.

(iii) Comparing (3.4) with Proposition 3.1.18 (ii) shows that quantum measurement is an

example of a quantum channel.

We briefly review some properties of Choi matrices and Kraus operators. It is clear that if

Φ,Ψ are completely positive linear maps from Md to Mk, then Φ = Ψ if and only if PΦ = PΨ.

(Interestingly, if we identify Md(Mk) as Mkd, then PΦ = PΨ does not imply that Φ = Ψ for

arbitrary completely positive linear maps Φ,Ψ. For example if PΦ ∈M6, then Φ : Md →Mk

could have (d, k) = (6, 1), (3, 2), (2, 3) or (1, 6).) If, however, Φ : Md →Mk is a c.p.t.p. map,

then

Tr(PΦ) = ∑_{i=1}^d Tr(Φ(Eii)) = ∑_{i=1}^d Tr(Eii) = d,

and it follows that if Φ,Ψ are c.p.t.p. maps then Φ = Ψ if and only if PΦ = PΨ.
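These facts are easy to check numerically for a concrete map. The sketch below (Python with numpy; the dephasing-type Kraus operators are an arbitrary example, not taken from the text) assembles the Choi matrix PΦ = (Φ(Eij))_{i,j∈[d]} block by block and verifies that it is positive with trace d.

```python
import numpy as np

d = 2
q = 0.25
Z = np.diag([1., -1.])
kraus = [np.sqrt(1 - q) * np.eye(d), np.sqrt(q) * Z]     # a dephasing channel on M_2
print(np.allclose(sum(A.conj().T @ A for A in kraus), np.eye(d)))   # trace preserving

def phi(rho):
    return sum(A @ rho @ A.conj().T for A in kraus)

# Choi matrix P_Phi = (Phi(E_ij))_{i,j in [d]}, assembled block by block.
E = lambda i, j: np.outer(np.eye(d)[i], np.eye(d)[j])
P_phi = np.block([[phi(E(i, j)) for j in range(d)] for i in range(d)])

print(np.allclose(P_phi, P_phi.conj().T))                 # Hermitian
print(np.all(np.linalg.eigvalsh(P_phi) >= -1e-12))        # positive semi-definite
print(np.trace(P_phi).real)                               # equals d = 2
```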

It is important to note that the Kraus representation of a completely positive map Φ as

in (3.5) is not unique; indeed, there is ‘unitary freedom’ in the choice of Kraus operators, as

described in the following standard result.

Proposition 3.1.20. [32, Theorem 8.2]. Suppose that the completely positive maps Φ and

Ψ have Kraus representations

Φ(ρ) = ∑_{i=1}^k Ai ρ Ai^*  and  Ψ(σ) = ∑_{i=1}^l Bi σ Bi^*

respectively. By appending zero operators to the shorter list we can ensure that k = l. Then Φ = Ψ if and only if there exists a unitary matrix U = (uij)_{i,j∈[k]} such that Ai = ∑_{j=1}^k uij Bj.

The following standard result, showing it is possible to choose a set of mutually orthogonal


Kraus operators, is a simple corollary of Proposition 3.1.20. We supply a proof for the benefit

of the reader.

Corollary 3.1.21. (See [32, Exercise 8.10].) Given a completely positive map Φ : Md →Mk,

there exist Kraus operators F1, . . . , Fm ∈ Mk,d for Φ satisfying Tr(Fi Fj^*) = 0 if i ≠ j.

Proof. Suppose Φ has a Kraus representation Φ(ρ) = ∑_{i=1}^m Ei ρ Ei^* for ρ ∈ Md and with Ei ∈ Mk,d. Define the matrix W = (wpq)_{p,q∈[m]} ∈ Mm by wpq = Tr(Ep Eq^*). Then wpq = ⟨Ep, Eq⟩ = \overline{⟨Eq, Ep⟩} = \overline{wqp}, and so W is Hermitian. There thus exists a unitary U = (uij)_{i,j∈[m]} ∈ Mm such that UWU^* is diagonal. Let V = (vij) = UWU^* and set Fj = ∑_{l=1}^m ujl El. Proposition 3.1.20 shows that ∑_{i=1}^m Fi ρ Fi^* is a Kraus representation of Φ. We then have

Tr(Fp Fq^*) = Tr( ∑_{l,n} upl El \overline{uqn} En^* ) = ∑_{l,n} upl Tr(El En^*) \overline{uqn} = ∑_{l,n} upl wln \overline{uqn} = vpq;

noting that V is diagonal completes the proof.

Remark 3.1.22. Corollary 3.1.21 can also be seen as a consequence of the following canonical method as used in [35, Theorem 4.8] for obtaining a (mutually orthogonal) set of Kraus operators A1, . . . , An with n ≤ kd for a completely positive map Φ : Md → Mk from its Choi matrix PΦ. Since PΦ ∈ M+dk, there exists an orthonormal basis {u1, . . . , udk} ⊆ Cdk of eigenvectors of PΦ such that PΦ = ∑_{i=1}^n λi ui ui^* with n ≤ kd and eigenvalues λ1, . . . , λn > 0. Letting vi = √λi ui ∈ Ckd, we have PΦ = ∑_{i=1}^n vi vi^*. For each i = 1, . . . , n we write

vi = (vi^(1), . . . , vi^(d))^T, with vi^(j) ∈ Ck for j = 1, . . . , d,

and it is easy to see that PΦ = ( ∑_{l=1}^n vl^(i) (vl^(j))^* )_{i,j∈[d]}, giving that

Φ(ei ej^*) = ∑_{l=1}^n vl^(i) (vl^(j))^*. (3.7)

Setting Ai = (vi^(1) . . . vi^(d)) ∈ Mk,d and using (3.7) yields

∑_{l=1}^n Al ei ej^* Al^* = ∑_{l=1}^n vl^(i) (vl^(j))^* = Φ(ei ej^*).


By linearity Φ then has a Kraus representation Φ(ρ) = ∑_{l=1}^n Al ρ Al^* for all ρ ∈ Md. It is easy to see that Tr(Ai Aj^*) = ⟨vi, vj⟩, so the orthogonality of the set {A1, . . . , An} follows from that of {v1, . . . , vn}.
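The canonical method of Remark 3.1.22 translates directly into a short computation. The sketch below (Python with numpy; the example channel is again an arbitrary dephasing-type map, not one from the text) eigendecomposes the Choi matrix, reshapes the scaled eigenvectors into Kraus operators, and checks that they reproduce Φ and are mutually orthogonal.

```python
import numpy as np

d, k = 2, 2
q = 0.25
kraus_in = [np.sqrt(1 - q) * np.eye(2), np.sqrt(q) * np.diag([1., -1.])]

def phi(rho):
    return sum(A @ rho @ A.conj().T for A in kraus_in)

E = lambda i, j: np.outer(np.eye(d)[i], np.eye(d)[j])
P_phi = np.block([[phi(E(i, j)) for j in range(d)] for i in range(d)])

# Eigendecomposition P_Phi = sum_l lambda_l u_l u_l^*; keep strictly positive eigenvalues.
w, U = np.linalg.eigh(P_phi)
kraus_out = []
for lam, u in zip(w, U.T):
    if lam > 1e-12:
        v = np.sqrt(lam) * u                 # v_l = sqrt(lambda_l) u_l in C^{kd}
        kraus_out.append(v.reshape(d, k).T)  # column j of A_l is the j-th block v_l^{(j)}

# The recovered operators give the same channel ...
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
print(np.allclose(phi(rho), sum(A @ rho @ A.conj().T for A in kraus_out)))
# ... and are mutually orthogonal: Tr(A_i A_j^*) = <v_i, v_j> = 0 for i != j.
gram = np.array([[np.trace(A @ B.conj().T) for B in kraus_out] for A in kraus_out])
print(np.allclose(gram, np.diag(np.diag(gram))))
```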

By analogy with the classical case, we now make the following definition for a quantum

channel Φ.

Definition 3.1.23. ([13], [35].) We define α(Φ), the one-shot zero-error capacity of quantum

channel Φ : Md →Mk, to be the maximum n ∈ N such that there exist n orthonormal state

vectors v1, . . . , vn ∈ Cd whose corresponding density matrices yield the perfectly distinguish-

able outputs Φ(v1v1^*), . . . , Φ(vnvn^*).

3.1.4 Non-commutative graphs

The concept of the confusability graph of a classical channel has been generalised to the

quantum setting in [13]. Here we give an overview of the theory, beginning with the definition

of an operator system, first introduced in [2]. More details can be found in [36]. Here H will

denote a complex Hilbert space, and we write L(H) for the set of all linear transformations

H → H.

Definition 3.1.24. A subspace S ⊆ L(H) is called an operator system if S is unital and

self-adjoint; that is, if I ∈ S, and X ∈ S ⇒ X∗ ∈ S.

Given an orthonormal basis of a Hilbert space H of dimension d, we work in that basis

and make the canonical identification L(H) ≡Md. We will often write Md in place of L(H)

even if we have not specified a particular basis.

Remark 3.1.25. Straightforward results in Appendix B show that if S ⊆ Md and T ⊆ Mk

are operator systems, then S ⊗ T ⊆ Mdk is an operator system. Furthermore, if d = k then

S + T ⊆Md is an operator system.

Definition 3.1.26. ([13, equation (2)]) Let Φ : Md →Mk be a quantum channel with Kraus

operators E1, . . . , En ∈Mk,d. Then the subspace

SΦ = span{Ei^*Ej : i, j ∈ [n]} ⊆ Md (3.8)

is called the non-commutative graph of Φ.

Remark 3.1.27. (i) The term ‘non-commutative graph’ was introduced in [13]. In [55], Weaver

calls the same objects ‘quantum graphs’.


(ii) As shown in Proposition 3.1.20, a channel Φ has many different possible sets of Kraus

operators, but the same proposition immediately shows that subspace SΦ is independent of

this choice.

It is easy to see that if Φ is a quantum channel, then the non-commutative graph SΦ as

given in (3.8) is an operator system: just note that SΦ is unital because ∑_{i=1}^n Ei^*Ei = I and self-adjoint because (Ei^*Ej)^* = Ej^*Ei. In fact the following stronger proposition holds.

Proposition 3.1.28. [12, Lemma 2] Let S ⊆Md. The following are equivalent:

(i) S is an operator system;

(ii) there exists a quantum channel Φ : Md →Mk for some k ∈ N such that S = SΦ.

With this in mind, the terms operator system and non-commutative graph will be used

interchangeably in the sequel.

In [13, Remark, p.5] it is noted that distinct channels can have the same non-commutative

graph. (We later see how a classical channel embeds in the quantum setting, but for now

recall that different classical channels can have the same confusability graph, since the con-

fusability graph records only which pairs of inputs are confusable, but does not give any

actual probabilities.) To illustrate this, we now supply a simple example of a whole family of

quantum channels with the same non-commutative graph.

Example 3.1.29. For p ∈ (0, 1), let Φp : M2 → M2 be given by

Φp(ρ) = ∑_{i=1}^4 Ai^(p) ρ (Ai^(p))^*,  ρ ∈ M2, (3.9)

where

A1^(p) = √(1−p) e1e1^*,  A2^(p) = √p e1e2^*,  A3^(p) = √p e2e1^*,  A4^(p) = √(1−p) e2e2^*.

We note for all p ∈ (0, 1) that ∑_{i=1}^4 (Ai^(p))^* Ai^(p) = I, so Φp is a quantum channel with Kraus representation (3.9). Furthermore, for all p ∈ (0, 1) we have SΦp = span{(Ai^(p))^* Aj^(p) : i, j = 1, 2, 3, 4} = M2. (Note Φp can be thought of as the quantum generalisation of the binary symmetric channel with error probability p and with Kraus operators given by (3.10) on page 92.)
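A numerical check of this example (a sketch in Python with numpy): the operators below are those of (3.9), and the dimension of span{(Ai^(p))^* Aj^(p)} is computed as the rank of the matrix whose rows are the flattened products.

```python
import numpy as np

p = 0.3
e1, e2 = np.eye(2)
A = [np.sqrt(1 - p) * np.outer(e1, e1), np.sqrt(p) * np.outer(e1, e2),
     np.sqrt(p) * np.outer(e2, e1), np.sqrt(1 - p) * np.outer(e2, e2)]

# Phi_p is trace preserving: sum_i A_i^* A_i = I.
print(np.allclose(sum(Ai.conj().T @ Ai for Ai in A), np.eye(2)))

# S_{Phi_p} = span{A_i^* A_j} is all of M_2: the 16 products, flattened to
# vectors of length 4, span a space of dimension 4.
products = np.array([(Ai.conj().T @ Aj).flatten() for Ai in A for Aj in A])
print(np.linalg.matrix_rank(products))   # 4, so S_{Phi_p} = M_2 for every p in (0, 1)
```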

The following characterisation of distinguishability is discussed in [13, Section III]. For

convenience we supply a proof.


Proposition 3.1.30. [13] For quantum channel Φ : Md →Mk and orthonormal state vectors

u, v ∈ Cd, the following are equivalent:

(i) Φ(uu∗) and Φ(vv∗) are perfectly distinguishable;

(ii) uv∗ ∈ S⊥Φ ;

(iii) 〈u,Av〉 = 0 for all A ∈ SΦ.

Proof. Recall from Remark 3.1.13 (ii) that Φ(uu∗) and Φ(vv∗) are perfectly distinguishable

precisely when Tr (Φ(uu∗)Φ(vv∗)) = 0. Let Φ have Kraus operators E1, . . . , En. Now observe

that

Tr(Φ(uu^*)Φ(vv^*)) = ∑_{i,j∈[n]} Tr(Ei uu^* Ei^* Ej vv^* Ej^*) = ∑_{i,j∈[n]} Tr(v^* Ej^* Ei uu^* Ei^* Ej v)
                   = ∑_{i,j∈[n]} |⟨Ei u, Ej v⟩|^2 = ∑_{i,j∈[n]} |⟨uv^*, Ei^*Ej⟩|^2.

It is then clear that this vanishes if and only if ⟨uv^*, Ei^*Ej⟩ = 0 for all i, j ∈ [n], that is,

uv∗ ∈ S⊥Φ , which is trivially equivalent to the condition 〈u,Av〉 = 0 for all A ∈ SΦ.

Recall for the classical channel N : X → Y that a, b ∈ X are distinct and non-confusable

when a ≄ b in GN, or equivalently, when a and b are distinct and non-adjacent in GN. A comparison with Proposition 3.1.30

shows the sense in which the non-commutative graph of a quantum channel generalises the

notion of the confusability graph of a classical channel.

We consider more formally how the classical channel N : X → Y with X = {x1, . . . , xn} and Y = {y1, . . . , ym} can be embedded in the quantum framework. As in (3.1), we describe the channel by the matrix P = (p(yi|xj))_{i∈[m],j∈[n]} with β = Pα, where α ∈ Pn and β ∈ Pm. Let the canonical orthonormal bases of Cn and Cm be {e1, . . . , en} and {f1, . . . , fm} respectively. Using notation as in (2.1) on page 35, for v ∈ Pk we form the diagonal matrix φ(v) ∈ Mk. Then [13, Section II] points out that the classical channel N can be seen as the restriction to φ(Pn) of the c.p.t.p. map ΦN : Mn → Mm with Kraus operators

Vij = √(p(yi|xj)) fi ej^* for i ∈ [m], j ∈ [n], (3.10)

in the sense that for β = Pα we have φ(β) = ΦN (φ(α)).
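The embedding can be illustrated numerically. The sketch below (Python with numpy; the channel matrix P is a made-up example) builds the Kraus operators Vij of (3.10) and checks that ΦN is trace preserving and that ΦN (φ(α)) = φ(Pα).

```python
import numpy as np

P = np.array([[1.0, 0.5, 0.0],       # p(y_i | x_j): 3 inputs, 2 outputs
              [0.0, 0.5, 1.0]])
m, n = P.shape
f = np.eye(m)
e = np.eye(n)

# Kraus operators V_ij = sqrt(p(y_i|x_j)) f_i e_j^* of (3.10).
kraus = [np.sqrt(P[i, j]) * np.outer(f[i], e[j]) for i in range(m) for j in range(n)]
print(np.allclose(sum(V.conj().T @ V for V in kraus), np.eye(n)))   # trace preserving

def phi_N(rho):
    return sum(V @ rho @ V.conj().T for V in kraus)

alpha = np.array([0.2, 0.3, 0.5])
# The channel acts on diagonal states exactly as P acts on probability vectors.
print(np.allclose(phi_N(np.diag(alpha)), np.diag(P @ alpha)))        # True
```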

Letting GN be the confusability graph of classical channel N , Definitions 3.1.1 and 3.1.26


and (3.2) on page 82 then give

SΦN = span{Vij^* Vkl : i, k ∈ [m], j, l ∈ [n]} = span{ei ej^* : i, j ∈ [n], i ≃ j in GN}. (3.11)

It is then clear that distinct i, j ∈ X are not confusable, that is, i ≁ j in GN , precisely when ei ej^* ∈ SΦN^⊥,

a condition we compare to Proposition 3.1.30.

These observations also suggest the following way to associate an operator system SG

with a graph G.

Definition 3.1.31. [35, Section 7.2] Consider graph G with V (G) = [n]. We define SG, the

operator system associated to graph G, by

SG = span{ei ej^* : i, j ∈ [n], i ≃ j in G}.

The next corollary is immediate from (3.11).

Corollary 3.1.32. If a classical channel N : X → Y with confusability graph G is the

restriction to φ(P|X |) of c.p.t.p. map ΦN as described above, then SΦN = SG, and distinct

i, j ∈ X are confusable precisely when eie∗j ∈ SG.

Definition 3.1.23 defined α(Φ), the one-shot zero-error capacity of quantum channel Φ.

We now define the related notion of α(S) for an operator system S ⊆ Md, as introduced in

[13].

Definition 3.1.33. ([35, Definition 7.12], [13, equation 3].) Given an operator system S ⊆

Md, an orthonormal set {v1, . . . , vk} ⊆ Cd is S-independent if vi vj^* ∈ S⊥ for all i ≠ j. The

independence number of S is given by the maximum cardinality of an S-independent set and

is denoted by α(S).

If Φ is a quantum channel, Proposition 3.1.30 shows that the orthonormal set {v1, . . . , vk} is SΦ-independent precisely when Φ(v1v1^*), . . . , Φ(vkvk^*) are perfectly distinguishable. The

following result is immediate from Definition 3.1.23, and should be compared to the classical

result in Proposition 3.1.3.


Corollary 3.1.34. ([13, p.6], [35, Theorem 7.14]) If Φ is a quantum channel, then

α(SΦ) = α(Φ).

Moving from the one-shot case to multiple uses of channel Φ requires us to consider

tensor products of operator systems. For basic properties of the tensor product, the reader

is referred to Appendix B. The following lemma shows that the tensor product of operator

systems is the natural quantum generalisation of the classical notion of the strong product G ⊠ H of the graphs G and H. It is instructive to recall the proof.

Lemma 3.1.35. [35, Theorem 8.16] If G and H are graphs, then

S_{G⊠H} = S_G ⊗ S_H.

Proof. Let V(G) = [n] and V(H) = [m], and let the canonical orthonormal bases of C^n and C^m be {e_1, . . . , e_n} and {f_1, . . . , f_m} respectively. Then

S_G ⊗ S_H = span{e_i e_j^* : i ≃ j in G} ⊗ span{f_k f_l^* : k ≃ l in H}
= span{e_i e_j^* ⊗ f_k f_l^* : i ≃ j in G, k ≃ l in H}
= span{(e_i ⊗ f_k)(e_j ⊗ f_l)^* : (i, k) ≃ (j, l) in G ⊠ H} = S_{G⊠H},

as stated.
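A small numerical illustration of the identity S_{G⊠H} = S_G ⊗ S_H; the two graphs below (a path and a single edge) are assumed purely for the example.

```python
import itertools
import numpy as np

def closed_adj(edges, a, b):
    """a ≃ b: equal or adjacent."""
    return a == b or (a, b) in edges or (b, a) in edges

def matrix_unit(n, i, j):
    return np.outer(np.eye(n)[i], np.eye(n)[j])

# G = path 0-1-2 and H = single edge 0-1 (illustrative small graphs).
nG, G_edges = 3, {(0, 1), (1, 2)}
nH, H_edges = 2, {(0, 1)}

SG = [matrix_unit(nG, i, j) for i in range(nG) for j in range(nG) if closed_adj(G_edges, i, j)]
SH = [matrix_unit(nH, k, l) for k in range(nH) for l in range(nH) if closed_adj(H_edges, k, l)]
tensor_basis = [np.kron(A, B) for A in SG for B in SH]

# S_{G⊠H}: matrix units (e_i⊗f_k)(e_j⊗f_l)^* with i ≃ j in G and k ≃ l in H;
# under np.kron, e_i⊗f_k is the canonical basis vector with index i*nH + k.
prod_basis = [matrix_unit(nG * nH, i * nH + k, j * nH + l)
              for i, j in itertools.product(range(nG), repeat=2)
              for k, l in itertools.product(range(nH), repeat=2)
              if closed_adj(G_edges, i, j) and closed_adj(H_edges, k, l)]

rank = lambda mats: np.linalg.matrix_rank(np.array([M.ravel() for M in mats]))
print(rank(tensor_basis) == rank(prod_basis) == rank(tensor_basis + prod_basis))  # True
```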

We now show more formally how tensor products are to be used to define the Shannon

capacity of a quantum channel. If Φ : Mc → Mk and Ψ : Md → Ml are quantum channels

with sets of Kraus operators B1, . . . , Bp and A1, . . . Aq respectively, then we define the

tensor product map Φ ⊗ Ψ : (Mc ⊗Md) → (Mk ⊗Ml) as the unique linear map such that

(Φ⊗Ψ)(ρ⊗ σ) = Φ(ρ)⊗Ψ(σ) for ρ ∈Mc, σ ∈Md.

Lemma 3.1.36. If Φ, Ψ and Φ ⊗ Ψ are as given above, then Φ ⊗ Ψ is a quantum channel

with Kraus operators Bi⊗Aj for i ∈ [p], j ∈ [q] ([35, Section 3.6]), and it follows immediately

that SΦ⊗Ψ = SΦ ⊗ SΨ.
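As a quick numerical illustration of Lemma 3.1.36, one can check that the operators B_i ⊗ A_j implement Φ ⊗ Ψ on product states; the two qubit channels below (a dephasing and a bit-flip channel) are assumed for the example.

```python
import numpy as np

def apply(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
phi_kraus = [np.sqrt(0.7) * I2, np.sqrt(0.3) * Z]     # Phi: dephasing channel
psi_kraus = [np.sqrt(0.6) * I2, np.sqrt(0.4) * X]     # Psi: bit-flip channel

# Kraus operators of Phi ⊗ Psi, as in Lemma 3.1.36.
prod_kraus = [np.kron(B, A) for B in phi_kraus for A in psi_kraus]

rho = np.array([[0.8, 0.1], [0.1, 0.2]])
sigma = np.array([[0.5, 0.2], [0.2, 0.5]])
lhs = apply(prod_kraus, np.kron(rho, sigma))
rhs = np.kron(apply(phi_kraus, rho), apply(psi_kraus, sigma))
print(np.allclose(lhs, rhs))   # True
```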

We write S⊗k for the tensor product of k copies of S, and similarly we denote the channel

given by the tensor product of k copies of Φ by Φ⊗k. Suppose quantum channel Φ has

operator system S. Then k successive uses of Φ corresponds to a single use of the quantum

channel Φ⊗k, which has operator system S⊗k and one-shot zero-error capacity α(S⊗k).


Lemma 3.1.37. Independence number is super-multiplicative over tensor products (but not in general multiplicative); that is, for operator systems S, T we have α(S ⊗ T ) ≥ α(S)α(T ).

Proof. Suppose that {u_i : i = 1, . . . , m} is S-independent with α(S) = m and that {v_i : i = 1, . . . , n} is T -independent with α(T ) = n. That is, 〈u_i u_j^*, S〉 = 0 when S ∈ S for all i ≠ j, and 〈v_i v_j^*, T 〉 = 0 for all T ∈ T when i ≠ j. It follows that

〈(u_i ⊗ v_k)(u_j ⊗ v_l)^*, S ⊗ T 〉 = 〈u_i u_j^*, S〉〈v_k v_l^*, T 〉 = 0

for all S ∈ S and T ∈ T when (i, k) ≠ (j, l). We note also that

〈u_i ⊗ v_k, u_j ⊗ v_l〉 = 〈u_i, u_j〉〈v_k, v_l〉 = δ_{ij}δ_{kl}.    (3.12)

Thus {u_i ⊗ v_j : (i, j) ∈ [m] × [n]} is S ⊗ T -independent, giving α(S ⊗ T ) ≥ mn.

To see that independence number is not multiplicative in general, merely note from Proposition 4.1.11 that α(S_G) = α(G) for a graph G. Then recall from Lemma 3.1.35 that S_G ⊗ S_H = S_{G⊠H}, and use the fact that α(G ⊠ H) can exceed α(G)α(H); for example, α(C_5) = 2, but letting V(C_5) = [5], it is easy to see that {(1, 1), (2, 3), (3, 5), (4, 2), (5, 4)} is an independent set in C_5 ⊠ C_5 and that α(C_5 ⊠ C_5) = 5.
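The independence of that set in C_5 ⊠ C_5 is easy to verify by brute force (a minimal sketch; the vertex labelling 1, . . . , 5 around the cycle is as above):

```python
import itertools

def closed_adj_c5(a, b):
    """a ≃ b in C5: equal or adjacent on the 5-cycle with vertices 1,...,5."""
    return a == b or (a - b) % 5 in (1, 4)

S = [(1, 1), (2, 3), (3, 5), (4, 2), (5, 4)]

# Distinct (i,k), (j,l) are adjacent in C5 ⊠ C5 iff i ≃ j and k ≃ l;
# an independent set contains no such pair.
independent = all(not (closed_adj_c5(i, j) and closed_adj_c5(k, l))
                  for (i, k), (j, l) in itertools.combinations(S, 2))
print(independent)   # True, so alpha(C5 ⊠ C5) >= 5 > alpha(C5)**2 = 4
```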

The following definition is the quantum analogue of Corollary 3.1.7. The limit can be

shown to exist by the argument of Corollary 3.1.7.

Definition 3.1.38. [13, Section II] The Shannon capacity c(S) of operator system S is given

by

c(S) = lim_{n→∞} α(S^{⊗n})^{1/n}.

By Lemma 3.1.37, c(S) ≥ α(S). Upper bounds on c(S) are not trivial and will be

discussed later. Recall that the packing number of a classical channel N is equal to χ(GN ),

the chromatic number of its confusability graph. [35, Section 7.3] examines the corresponding

‘zero-error side information’ problem in the quantum setting for c.p.t.p. map Φ : Md →Mk.

We seek an orthonormal basis {v_1, . . . , v_d} ⊆ C^d, an n ∈ N and a function f : [d] → [n] such that the outputs

(Φ ⊗ I)((v_i ⊗ e_{f(i)})(v_i ⊗ e_{f(i)})^*) = Φ(v_i v_i^*) ⊗ e_{f(i)} e_{f(i)}^*

are perfectly distinguishable for i = 1, . . . , d, where I : M_n → M_n is the identity channel


and e1, . . . , en is the canonical orthonormal basis of Cn. The least n ∈ N such that this is

possible is denoted χ(Φ) and called the packing number of Φ.

The following definition of the chromatic number of an operator system is given in [35]

and generalises the notion of the chromatic number of a graph.

Definition 3.1.39. [35, Definition 7.26] If S ⊆ Md is an operator system, we define χ(S),

the chromatic number of S, to be the minimum n ∈ N such that there exists an orthonormal

basis of Cd which may be partitioned into n S-independent sets.

The next result is proved in [35] and is the quantum version of Proposition 3.1.8.

Proposition 3.1.40. [35, Theorem 7.27] If Φ is a quantum channel with associated non-

commutative graph SΦ, then χ(SΦ) = χ(Φ).

Remark 3.1.41. We noted that classical channel N : X → Y can be regarded as the restriction

to diagonal states of the c.p.t.p. map ΦN with Kraus operators given in (3.10) on page

92. Letting GN be the confusability graph of N , by Proposition 3.1.8, (4.11) on page 130,

Corollary 3.1.32 and Proposition 3.1.40 we obtain

χ(N) = χ(GN ) = χ(SGN ) = χ(SΦN ) = χ(ΦN ),

and we see that the classical side information problem is naturally embedded in the quantum

setting.

Remark 3.1.42. For a quantum channel Φ, Proposition 4.5.7 will show the existence of the limit lim_{n→∞} (log χ(S_Φ^{⊗n}))/n. By analogy with Definition 3.1.9, this will then lead us to the definition of the Witsenhausen rate of an operator system in Definition 4.5.8.

3.2 Convex corners from non-commutative graphs

3.2.1 The abelian, clique and full projection convex corners

Section 1.4 associated a number of Rd-convex corners with a given graph on d vertices. With a

given non-commutative graph S ⊆Md, we now associate the Md-convex corners ap(S), cp(S)

and fp(S), known respectively as the abelian, clique and full projection convex corners of S.

Recalling the definition of an S-independent set in Definition 3.1.33, we now define the

analogous concepts of S-clique and S-full sets. (For comparison, we restate the definition of

S-independence.)


Definition 3.2.1. Given a non-commutative graph S ⊆ M_d, an orthonormal set {v_1, . . . , v_k} ⊆ C^d is called

(i) S-independent if v_i v_j^* ∈ S^⊥ for all i ≠ j;

(ii) an S-clique if v_i v_j^* ∈ S for all i ≠ j; and

(iii) S-full if v_i v_j^* ∈ S for all i, j ∈ [k].

Remark 3.2.2. (i) It is trivial to see that if a set is S-full then it is an S-clique.

(ii) If G is a graph and i1, . . . , in ⊆ V (G) is an independent set in G, then ei1 , . . . , ein

is SG-independent, and if i1, . . . , in ⊆ V (G) is a clique in G, then ei1 , . . . , ein is both an

SG-clique and an SG-full set.

(iii) Note that the singleton set u is both S-independent and an S-clique for any unit vector

u ∈ Cd and non-commutative graph S ⊆ Md, just as x is both an independent set and

clique of graph G for any vertex x ∈ V (G). However, for d > 1 and unit vector u ∈ Cd,

the set u is not S-full for every non-commutative graph S ⊆ Md. As a simple example,

consider the operator system CId = spanId. Let d > 1 and u ∈ Cd be a unit vector. Since

rank(uu∗) = 1, but rank(Id) = d, it is clear that uu∗ /∈ CId, and we can conclude that there

are no singleton CId-full sets for d > 1; indeed, there are no non-empty CId-full sets for d > 1.

Definition 3.2.3. For a given non-commutative graph S ⊆ Md, a projection P ∈ Md will

be called

(i) S-abelian if there exists an S-independent set {v_i}_{i=1}^k such that P = ∑_{i=1}^k v_i v_i^*;

(ii) S-clique if there exists an S-clique {v_i}_{i=1}^k such that P = ∑_{i=1}^k v_i v_i^*; and

(iii) S-full if there exists an S-full set {v_i}_{i=1}^k such that P = ∑_{i=1}^k v_i v_i^*.

Remark 3.2.4. (i) The choice of nomenclature in Definition 3.2.3 (i) is justified by the result

that projection P is S-abelian precisely when PSP is contained in an abelian C∗-subalgebra

of Md. This was proved by Vern I. Paulsen in an unpublished note, but details can be found

in [6].

(ii) The condition that projection P ∈ M_d is S-full for S ⊆ M_d is equivalent to the condition, as stated in [6], that L(ran(P)) ⊕ 0_{P^⊥} ⊆ S, where 0_{P^⊥} denotes the zero operator on ran(P)^⊥. The equivalence is easy to see. Take P = ∑_{i=1}^k v_i v_i^*. First assume that {v_1, . . . , v_k} is S-full. Then for any T ∈ L(ran(P)) ⊕ 0_{P^⊥}, we have T = TP = ∑_{i=1}^k (Tv_i)v_i^* with Tv_i ∈ span{v_1, . . . , v_k} for all i ∈ [k], and so T ∈ S, using that v_i v_j^* ∈ S for all i, j ∈ [k]. Conversely, if L(ran(P)) ⊕ 0_{P^⊥} ⊆


S, noting that v_i v_j^* ∈ L(ran(P)) ⊕ 0_{P^⊥} for all i, j ∈ [k], we have that v_i v_j^* ∈ S for all i, j ∈ [k], and {v_1, . . . , v_k} is S-full.

The following proposition associates a number of convex corners with a non-commutative

graph S. We recall from Lemma 2.2.26 that, for a bounded subset G ⊆ M+d , it holds that

her(conv(G)) is an Md-convex corner which we denote by C(G).

Definition 3.2.5. For non-commutative graph S ⊆ Md we define the following Md-convex

corners, known as the abelian, clique and full projection convex corners of S respectively:

ap(S) = C({P : P is an S-abelian projection}),

cp(S) = C({P : P is an S-clique projection}),

fp(S) = C({P : P is an S-full projection}).

We give some simple properties of these convex corners.

Lemma 3.2.6. If S ⊆Md is a non-commutative graph, then fp(S) ⊆ cp(S).

Proof. Simply note from the definitions that every S-full projection is an S-clique projection.

Lemma 3.2.7. For any operator system S ⊆Md, it holds that

AId ⊆ ap(S) ⊆ BId , AId ⊆ cp(S) ⊆ BId and fp(S) ⊆ BId ,

where AId and BId are the Md-unit corner and Md-unit cube as defined in Section 2.4.3.

Proof. To show that A_{I_d} is a subset of ap(S) and cp(S), first note that every element T ∈ A_{I_d} is of the form ∑_{i=1}^d µ_i v_i v_i^*, with µ_i ≥ 0 for all i ∈ [d] and ∑_{i=1}^d µ_i ≤ 1, and where {v_1, . . . , v_d} is an orthonormal basis of C^d. We then use Remark 3.2.2 (iii), which shows that any rank one projection is both S-abelian and S-clique, and the result follows by Definition 3.2.5. Since P ≤ I_d for any projection P ∈ M_d^+, and these convex corners are generated by sets of projections, it is clear that they are contained in B_{I_d}.

Remark 3.2.8. We note by Remark 3.2.2 (iii) that it is possible that no non-zero projection is S-full; S = span{I_d} is an example of this. We set fp(S) = {0} in this case. However, if G is a graph then e_i e_i^* is an S_G-full projection for all i ∈ V(G) and (1/|V(G)|) I ∈ fp(S_G), giving that fp(S_G) is a standard convex corner.


Lemma 3.2.9. If operator systems S, T ⊆Md satisfy S ⊆ T then

ap(S) ⊇ ap(T ), cp(S) ⊆ cp(T ) and fp(S) ⊆ fp(T ).

Proof. For S ⊆ T , we have S⊥ ⊇ T ⊥, and the results follow immediately from Definitions

3.2.1 and 3.2.3.

We can now prove a first ‘quantum sandwich’ theorem concerning the convex corners we

have introduced. This will be seen to have important consequences later.

Theorem 3.2.10. Let S be a non-commutative graph. Then

ap(S) ⊆ cp(S)] ⊆ fp(S)].

Proof. The second inclusion follows from Lemmas 3.2.6 and 2.2.10. The first inclusion clearly

holds if every S-abelian projection lies in the Md-convex corner cp(S)]: we now show this

to be the case. Let {ξ_i}_{i=1}^k (resp. {η_p}_{p=1}^m) be an S-independent set (resp. an S-clique) and P (resp. Q) the projection onto its span. By Lemma 2.2.32, it suffices to show that Tr(PQ) ≤ 1. Since ξ_i ξ_j^* ∈ S^⊥ for i ≠ j, and η_p η_q^* ∈ S for p ≠ q, we have that

0 = Tr(ξ_i ξ_j^* η_q η_p^*) = 〈η_q, ξ_j〉〈ξ_i, η_p〉 whenever i ≠ j and p ≠ q.

Thus,

i ≠ j, p ≠ q ⟹ 〈ξ_i, η_p〉 = 0 or 〈ξ_j, η_q〉 = 0.    (3.13)

For each i ∈ [k], let

β(i) = {p ∈ [m] : 〈ξ_i, η_p〉 = 0},

and let

α = {i ∈ [k] : β(i) ≠ [m]}.

We write β(i)^c = [m] \ β(i). We note that

Tr(PQ) = ∑_{i=1}^k ∑_{p=1}^m Tr(ξ_i ξ_i^* η_p η_p^*) = ∑_{i=1}^k ∑_{p=1}^m |〈ξ_i, η_p〉|² = ∑_{i=1}^k ∑_{p∈β(i)^c} |〈ξ_i, η_p〉|²,

and distinguish three cases:


Case 1. α = ∅. Then β(i) = [m] for every i ∈ [k] and it follows that Tr(PQ) = 0 ≤ 1.

Case 2. |α| = 1. Say α = {i_0}. Here β(i) = [m] for all i ≠ i_0. Then

Tr(PQ) = ∑_{i=1}^k ∑_{p∈β(i)^c} |〈ξ_i, η_p〉|² = ∑_{p∈β(i_0)^c} |〈ξ_{i_0}, η_p〉|² ≤ 1,    (3.14)

because the family {η_p}_{p=1}^m is orthonormal and ‖ξ_{i_0}‖ = 1.

Case 3. |α| > 1. Consider i, j ∈ [k] and p, q ∈ [m] such that i ≠ j and p ≠ q. Then by (3.13), it holds either that p ∈ β(i) or q ∈ β(j). Then for i ≠ j, we can conclude

{(p, q) ∈ [m] × [m] : p ≠ q} ⊆ (β(i) × [m]) ∪ ([m] × β(j)).    (3.15)

Taking complements of both sides of (3.15) yields

β(i)^c × β(j)^c ⊆ {(p, p) : p ∈ [m]} whenever i ≠ j.    (3.16)

Suppose that |β(i)^c × β(j)^c| > 1 for some i, j with i ≠ j. Then by (3.16) there exist distinct p, p′ ∈ [m] such that p, p′ ∈ β(i)^c and p, p′ ∈ β(j)^c, but then (p, p′) ∈ β(i)^c × β(j)^c, contradicting (3.16). Thus, |β(i)^c × β(j)^c| ≤ 1 for all pairs (i, j) with i ≠ j. Since |α| > 1, it follows that |β(i)^c| = 1 for every i ∈ α. For each i ∈ α we write β(i)^c = {p_i}. Then (p_i, p_j) ∈ β(i)^c × β(j)^c for all i, j ∈ α with i ≠ j. In view of (3.16), it holds that p_i = p_j for all i, j ∈ α. Let p_0 be the common value of p_i for all i ∈ α. Then,

Tr(PQ) = ∑_{i=1}^k ∑_{p∈β(i)^c} |〈ξ_i, η_p〉|² = ∑_{i∈α} |〈ξ_i, η_{p_0}〉|² ≤ 1,

because the family {ξ_i}_{i=1}^k is orthonormal and ‖η_{p_0}‖ = 1. This completes the proof.

3.2.2 Embedding the classical in the quantum setting

Here the aim is to show that the Md-convex corners just introduced can be seen as quantum

generalisations of Rd-convex corners used in the classical setting. Recall that associated to

a graph G with d vertices are the Rd-convex corners VP(G) and FVP(G) = VP(G)[. We


will find it convenient to consider the associated diagonal convex corners vp(G) = φ(VP(G))

and fvp(G) = φ(FVP(G)) where φ is the canonical mapping Rd+ → D+d as in (2.1). Also

associated to graph G is the operator system SG ⊆Md as in Definition 3.1.31. In the results

below, we use notation introduced in Section 2.4.2. Recall that matrix M ∈ Md is doubly

stochastic if each row and column is a probability distribution.

Theorem 3.2.11. Let G be a graph with vertex set X = [d]. Then

(i) ∆(ap(SG)) = Dd ∩ ap(SG) = vp(G);

(ii) ∆(cp(SG)) = Dd ∩ cp(SG) = vp(G);

(iii) ∆(fp(SG)) = Dd ∩ fp(SG) = vp(G).

Proof. (i) Let S be an independent set in G. Then e_x e_y^* ∈ S_G^⊥ for distinct x, y ∈ S, and hence {e_x : x ∈ S} is an S_G-independent set and ∑_{x∈S} e_x e_x^* ∈ D_d is an S_G-abelian projection. Since D_d ∩ ap(S_G) is a convex set, this implies vp(G) ⊆ D_d ∩ ap(S_G) ⊆ ∆(ap(S_G)).

The proof of (i) is completed by showing that ∆(ap(S_G)) ⊆ vp(G). Let {v_1, . . . , v_m} be an S_G-independent set and let P = ∑_{i=1}^m v_i v_i^*. Write v_i = ∑_{x∈X} λ_x^{(i)} e_x and a_x^{(i)} = |λ_x^{(i)}|² for i ∈ [m] and x ∈ X. For i ≠ j we have v_i v_j^* = ∑_{x,y} λ_x^{(i)} λ̄_y^{(j)} e_x e_y^* ∈ S_G^⊥. Thus for i ≠ j,

a_x^{(i)} a_y^{(j)} ≠ 0 ⟹ x ≄ y in G.    (3.17)

Now

P = ∑_{i=1}^m ∑_{x,y∈X} λ_x^{(i)} λ̄_y^{(i)} e_x e_y^*,

and so

∆(P) = ∑_{x∈X} ∑_{i=1}^m a_x^{(i)} e_x e_x^*.    (3.18)

It is clear that m ≤ d since S-independent sets are orthonormal, and first we consider the case that m = d. Here P = I = ∆(P), and so (3.18) implies that

∑_{i=1}^m a_x^{(i)} = 1 for all x ∈ X.    (3.19)

Setting m_{i,x} = a_x^{(i)} ≥ 0 for i, x ∈ [d], we have that

∑_{x∈X} m_{i,x} = ∑_{x∈X} a_x^{(i)} = ‖v_i‖² = 1,    (3.20)


and that

∑_{i=1}^d m_{i,x} = ∑_{i=1}^d a_x^{(i)} = 1

by (3.19). Thus the matrix M = (m_{i,x}) ∈ M_d is doubly stochastic.

Now we consider the case that m < d. Here P ≤ I, and so ∆(P) ≤ I. Then for each x ∈ X, (3.18) gives that ∑_{i=1}^m a_x^{(i)} ≤ 1. For each x ∈ X, let r_x = 1 − ∑_{i=1}^m a_x^{(i)} and observe that r_x ≥ 0. For each x ∈ X set

m_{i,x} = a_x^{(i)} if 1 ≤ i ≤ m,    m_{i,x} = r_x/(d−m) if m+1 ≤ i ≤ d,

and let M = (m_{i,x}) ∈ M_d. Note first that if 1 ≤ i ≤ m then, as in (3.20),

∑_{x∈X} m_{i,x} = ∑_{x∈X} a_x^{(i)} = 1.

On the other hand, d − ∑_{x∈X} r_x = ∑_{i=1}^m ∑_{x∈X} a_x^{(i)} = m and hence, if m+1 ≤ i ≤ d we have

∑_{x∈X} m_{i,x} = ∑_{x∈X} r_x/(d−m) = 1.

Finally, if x ∈ X then

∑_{i=1}^d m_{i,x} = ∑_{i=1}^m a_x^{(i)} + r_x = 1,

and the matrix M = (m_{i,x}) ∈ M_d is doubly stochastic.

The Birkhoff–von Neumann theorem states that a doubly stochastic matrix can be expressed as a convex combination of permutation matrices [41]. Thus, for a given matrix M as formed above in either the case m = d or the case m < d, there exist γ_1, . . . , γ_l > 0 and permutation matrices P^{(k)} = (p_{i,x}^{(k)})_{i,x∈[d]} ∈ M_d for k = 1, . . . , l, such that ∑_{k=1}^l γ_k = 1 and

M = ∑_{k=1}^l γ_k P^{(k)}.    (3.21)

Set

Q_k = ∑_{x∈X} ∑_{i=1}^m p_{i,x}^{(k)} e_x e_x^*.    (3.22)


For x ∈ X and 1 ≤ i ≤ m, by (3.21) we have

a_x^{(i)} = m_{i,x} = ∑_{k=1}^l γ_k p_{i,x}^{(k)},    (3.23)

and so (3.18) implies

∆(P) = ∑_{k=1}^l γ_k Q_k.    (3.24)

Observe from (3.22) that Q_k ∈ D_d and has diagonal entries in {0, 1}. Suppose that 〈e_x, Q_k e_x〉 = 〈e_y, Q_k e_y〉 = 1 for distinct x, y ∈ X. Then there exist i, j ∈ [m] such that p_{i,x}^{(k)} = p_{j,y}^{(k)} = 1. Since P^{(k)} is a permutation matrix, i ≠ j. By (3.23), a_x^{(i)} ≠ 0 and a_y^{(j)} ≠ 0, and then by (3.17), x and y are non-adjacent vertices in G. Thus, each Q_k is a projection in vp(G), and (3.24) implies that ∆(P) ∈ vp(G). It follows by convexity that ∆(T) ∈ vp(G) whenever T ∈ conv{P : P is S_G-abelian}. Then since vp(G) is closed, ∆(T) ∈ vp(G) whenever T lies in the closure of conv{P : P is S_G-abelian}. Finally, note that if T ∈ ap(S_G), which is the hereditary cover of the closed convex hull of {P : P is S_G-abelian}, then T ≤ S for some S in that closed convex hull. This gives ∆(T) ≤ ∆(S) ∈ vp(G), and so ∆(T) ∈ vp(G) by hereditarity, completing the proof of (i).

(ii), (iii) Let K be an independent set in G, that is, a clique in G. Then e_x e_y^* ∈ S_G for all x, y ∈ K and so ∑_{x∈K} e_x e_x^* ∈ D_d is an S_G-full projection. Together with Lemma 3.2.6, this implies

vp(G) ⊆ D_d ∩ fp(S_G) ⊆ D_d ∩ cp(S_G) ⊆ ∆(cp(S_G))

and

vp(G) ⊆ D_d ∩ fp(S_G) ⊆ ∆(fp(S_G)) ⊆ ∆(cp(S_G)).

To complete the proof it will suffice to show that ∆(cp(S_G)) ⊆ vp(G). Let {v_1, . . . , v_m} be an S_G-clique and P = ∑_{i=1}^m v_i v_i^* be the corresponding S_G-clique projection. We have that v_i v_j^* ∈ S_G for i ≠ j. Writing v_i = ∑_{x∈X} λ_x^{(i)} e_x, as in the proof of (i), it holds analogously to (3.17) that if i ≠ j and λ_x^{(i)} λ̄_y^{(j)} ≠ 0, then x ≃ y in G. Following the method of proof of (i), we now obtain that ∆(P) ∈ vp(G), and consequently that ∆(cp(S_G)) ⊆ vp(G), and the proof is complete.
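The Birkhoff–von Neumann decomposition used in the proof above is easy to compute greedily; the following sketch (an illustration only, with an assumed 3 × 3 doubly stochastic matrix) relies on the fact that the support of a doubly stochastic matrix always contains a perfect matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decompose(M, tol=1e-12):
    """Greedily write a doubly stochastic matrix as sum_k gamma_k P^(k), as in (3.21)."""
    M = M.astype(float).copy()
    terms = []
    while M.max() > tol:
        # Pick a perfect matching inside the support of M by maximising the
        # number of strictly positive entries selected.
        rows, cols = linear_sum_assignment((M > tol).astype(float), maximize=True)
        gamma = M[rows, cols].min()
        P = np.zeros_like(M)
        P[rows, cols] = 1.0
        terms.append((gamma, P))
        M -= gamma * P          # at least one more entry of M becomes zero
    return terms

M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
terms = birkhoff_decompose(M)
print(round(sum(g for g, _ in terms), 10))                    # 1.0
print(np.allclose(sum(g * P for g, P in terms), M))           # True
```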

Corollary 3.2.12. Let G be a graph with vertex set X = [d]. Then

(i) ∆(ap(S_G)]) = D_d ∩ ap(S_G)] = vp(G)[;

(ii) ∆(cp(S_G)]) = D_d ∩ cp(S_G)] = fvp(G);


(iii) ∆(fp(S_G)]) = D_d ∩ fp(S_G)] = fvp(G).

Proof. This is immediate from Corollary 2.4.14 and Theorem 3.2.11, where we recall that fvp(G) = vp(G)[.

Corollary 3.2.13. Let graph G have associated operator system SG. Then

(i) her(vp(G)) ⊆ ap(SG) ⊆ (vp(G)[)];

(ii) her(vp(G)) ⊆ fp(SG) ⊆ cp(SG) ⊆ (fvp(G))];

(iii) her(vp(G)[) ⊆ ap(SG)] ⊆ (vp(G))];

(iv) her(fvp(G)) ⊆ cp(SG)] ⊆ fp(SG)] ⊆ (fvp(G)[)].

Proof. We can apply Theorem 2.4.19 to Theorem 3.2.11 and Corollary 3.2.12. We have also

used Lemma 3.2.6 and Theorem 1.2.15.

Remark 3.2.14. We will see shortly that the first inclusion in Corollary 3.2.13 (ii) and the

third inclusion in Corollary 3.2.13 (iv) are in fact equalities.

The previous results, in particular Theorem 3.2.11 and Corollary 3.2.12, suggest that

ap(S) should be regarded as a non-commutative generalisation of VP(G), and that cp(S)]

and fp(S)] generalise FVP(G). We note for a graph G that it is possible to define the Rd-

convex corner CP(G) = VP(G), the convex hull of the characteristic vectors of cliques in G,

and we can thus regard cp(S) and fp(S) as non-commutative generalisations of CP(G). Given

the increased complexity of the quantum setting, it should not surprise us that a classical

object may have more than one quantum generalisation. (We will see this phenomenon again

in the case of the Lovasz number, θ(G).) It is inviting to compare cp(S) and fp(S). Lemma

3.2.6 showed in the general case that fp(S) ⊆ cp(S); we now narrow the focus to operator

systems associated with graphs. Theorem 3.2.11 shows that convex corners ap(SG), cp(SG)

and fp(SG) have the same diagonal expectations, and the same diagonal elements. However,

these convex corners are not in general equal, as we now see.

Proposition 3.2.15. Let G be a graph on d vertices. Then ap(SG) = cp(SG) = fp(SG) if

and only if G = Kd.

Proof. Let V(G) = [d]. If G = K_d, it is clear that S_G = M_d, and {e_1, . . . , e_d} is an S_G-independent set, an S_G-clique and an S_G-full set. Thus I_d = ∑_{i=1}^d e_i e_i^* is an S_G-abelian


projection, an S_G-clique projection and an S_G-full projection, and

ap(S_{K_d}) = cp(S_{K_d}) = fp(S_{K_d}) = B_{I_d}.    (3.25)

If G ≠ K_d, then there exist i, j ∈ [d] such that i ≄ j in G, and we set unit vector v = (1/√2)(e_i + e_j), giving

vv^* = (1/2)(e_i e_i^* + e_j e_j^* + e_i e_j^* + e_j e_i^*).    (3.26)

Now consider an S_G-full set {v_1, . . . , v_k} with associated S_G-full projection given by P = ∑_{l=1}^k v_l v_l^*. For l = 1, . . . , k we set v_l = ∑_{r=1}^d α_r^{(l)} e_r with coefficients α_r^{(l)} ∈ C. Then

v_l v_m^* = ∑_{r,s∈[d]} α_r^{(l)} ᾱ_s^{(m)} e_r e_s^* ∈ S_G for all l, m ∈ [k],

but by assumption e_i e_j^* ∉ S_G. Thus for all l, m ∈ [k] we have α_i^{(l)} ᾱ_j^{(m)} = 0, and either α_i^{(l)} = 0 for all l ∈ [k], or α_j^{(m)} = 0 for all m ∈ [k]. Using that P = ∑_{l=1}^k v_l v_l^* where v_l v_l^* = ∑_{r,s∈[d]} α_r^{(l)} ᾱ_s^{(l)} e_r e_s^*, it then holds that 〈P, e_i e_j^*〉 = ∑_{l∈[k]} α_i^{(l)} ᾱ_j^{(l)} = 0 and that either 〈P, e_i e_i^*〉 = ∑_{l∈[k]} |α_i^{(l)}|² = 0 or 〈P, e_j e_j^*〉 = ∑_{l∈[k]} |α_j^{(l)}|² = 0. Since P ≤ I we have 〈e_i, Pe_i〉 ≤ 1 and 〈e_j, Pe_j〉 ≤ 1. Letting P_f(S_G) denote the set of all S_G-full projections, it is then clear for all A ∈ conv(P_f(S_G)) that

〈e_i, Ae_i〉 + 〈e_j, Ae_j〉 ≤ 1 and 〈e_i, Ae_j〉 = 〈e_j, Ae_i〉 = 0.

Given the form of vv^* in (3.26) and the fact that

(1/2) [1 1; 1 1] ≰ [a 0; 0 1−a] when a ∈ [0, 1],

it is easy to see that vv^* ≰ A for all A ∈ conv(P_f(S_G)), and we conclude vv^* ∉ fp(S_G). However, by Remark 3.2.2 (iii), vv^* ∈ ap(S_G) and vv^* ∈ cp(S_G), and the proof is complete.

Remark 3.2.16. We note that if G is not complete, it could still hold that ap(SG) = cp(SG);

showing when this equality holds is an open problem.

It was shown in Lemma 2.4.17 for a standard diagonal convex corner A that the strict inclusion her(A) ⊊ her((A[)]) = (A[)] holds. The following two lemmas show that when graph


G is neither empty nor complete, Corollary 3.2.13 (i) becomes

her(vp(G)) ⊊ ap(S_G) ⊊ (vp(G)[)].

Lemma 3.2.17. It holds that ap(SG) = her(vp(G)) if and only if G is empty.

Proof. Consider the empty graph K_d. By (3.25), ap(S_{K_d}) = B_{I_d}, and as I_d ∈ vp(K_d) we have vp(K_d) = {M ∈ M_d^+ ∩ D_d : M ≤ I_d}, giving that

her(vp(K_d)) = B_{I_d}.    (3.27)

Conversely, suppose that G is non-empty with i ∼ j in G. Let v be the unit vector (1/√2)(e_i + e_j). All rank one projections are S_G-abelian, so certainly vv^* ∈ ap(S_G). Suppose that vv^* ≤ Q ∈ vp(G). Let Q = (q_{ij}) = ∑_k µ_k P_k where µ_k > 0 and ∑_k µ_k = 1, and where for the independent set S_k ⊆ V(G) we set P_k = ∑_{i∈S_k} e_i e_i^*. Since Q ≥ vv^*, we have q_{ii} ≥ 1/2 and q_{jj} ≥ 1/2. As i ∼ j in G, no independent set S_k contains both i and j, so q_{ii} + q_{jj} ≤ 1, and we must have q_{ii} = q_{jj} = 1/2. Then e_i^*(Q − vv^*)e_i = e_j^*(Q − vv^*)e_j = 0 and, since Q − vv^* ≥ 0, Corollary 2.1.2 gives that e_i^*(Q − vv^*)e_j = 0. But Q is diagonal and hence e_i^*(Q − vv^*)e_j = −e_i^* vv^* e_j = −1/2. From this contradiction we conclude vv^* ∉ her(vp(G)) and hence her(vp(G)) ≠ ap(S_G).

Lemma 3.2.18. It holds that ap(SG) = (vp(G)[)] if and only if G is complete.

Proof. We have S_{K_d} = M_d and so the S_{K_d}-abelian projections are precisely the rank one projections. Thus ap(S_{K_d}) = A_{I_d}. The independent sets of K_d are ∅ and the singletons, so vp(K_d) = {M ∈ M_d^+ ∩ D_d : Tr M ≤ 1}, giving

(vp(K_d)[)] = {M ∈ M_d^+ : ∆(M) ∈ vp(K_d)} = A_{I_d},    (3.28)

where we have used Lemma 2.4.16.

Conversely, suppose that graph G on d vertices is not complete and that k ≄ l in G. Let A = e_k e_k^* + e_k e_l^* + e_l e_k^* + e_l e_l^* ≥ 0. Note that e_k^*(I − A)e_k = 0 but e_l^*(I − A)e_k = −1, and so by Corollary 2.1.2, I − A ≱ 0. Since ap(S_G) ⊆ {M ∈ M_d^+ : M ≤ I}, it follows that A ∉ ap(S_G). However, k ≄ l in G and so ∆(A) = e_k e_k^* + e_l e_l^* ∈ vp(G). Then A ∈ (vp(G)[)] by Lemma 2.4.16, and then ap(S_G) ≠ (vp(G)[)].

We now examine cp(SG) and fp(SG) in the same way.


Lemma 3.2.19. For a graph G we have:

(i) cp(SG) = (vp(G)[)] if and only if G is empty;

(ii) cp(SG) = her(vp(G)) if and only if G is complete;

(iii) fp(SG) = her(vp(G)) for every graph G.

Proof. (i) We claim that cp(S_{K_d}) = {M ∈ M_d^+ : Tr M ≤ 1}. Since every rank one projection lies in cp(S_{K_d}), it suffices to show that if rank(P) > 1 for projection P, then P ∉ cp(S_{K_d}). To establish this, suppose there exists an S_{K_d}-clique projection P with rank P ≥ 2. Then there exist orthonormal u = (u_i), v = (v_i) ∈ C^d such that uv^* ∈ S_{K_d} = D_d. Suppose u_i ≠ 0. Then, since 〈u, v〉 = 0, it is clear that v_j ≠ 0 for some j ≠ i. This, however, gives 〈e_i e_j^*, uv^*〉 ≠ 0, contradicting that uv^* ∈ D_d. Then, recalling (3.28), we see that cp(S_{K_d}) = (vp(K_d)[)]. Now, say G is non-empty and k ∼ l in G. As in Lemma 3.2.18, we let A = e_k e_k^* + e_k e_l^* + e_l e_k^* + e_l e_l^*, giving ∆(A) ∈ vp(G) and A ∈ (vp(G)[)] by Lemma 2.4.16. However by Lemma 3.2.7 we have A ∉ cp(S_G) because A ≰ I. This proves (i).

(ii) From (3.25) and (3.27), cp(S_{K_d}) = her(vp(K_d)). To complete the proof of (ii) consider non-complete G in which i ≄ j. Let v be the unit vector (1/√2)(e_i + e_j). As a rank one projection, vv^* ∈ cp(S_G), but, using the argument of Lemma 3.2.17, vv^* ∉ her(vp(G)).

(iii) By Corollary 3.2.13 (ii), her(vp(G)) ⊆ fp(S_G) and we now show the reverse inclusion. Let {v_1, . . . , v_r} be S_G-full and P = ∑_{i=1}^r v_i v_i^*. Set v_i = ∑_j λ_j^{(i)} e_j with λ_j^{(i)} ∈ C. Now v_i v_j^* ∈ S_G for all i, j ∈ [r], and so if λ_l^{(i)} λ̄_k^{(j)} ≠ 0 for some i, j ∈ [r], then e_l e_k^* ∈ S_G and l ≃ k in G. We conclude that the set S_Q ⊆ V(G) defined by S_Q = {j : λ_j^{(k)} ≠ 0 for some k} is a clique in G. Then, defining Q := ∑_{j∈S_Q} e_j e_j^*, we have Q ∈ vp(G). By the definition of S_Q we have v_1, . . . , v_r ∈ span{e_j : j ∈ S_Q}. Thus

ran(P) = span{v_i : i ∈ [r]} ⊆ span{e_j : j ∈ S_Q} = ran(Q),

and P ≤ Q ∈ vp(G) and so P ∈ her(vp(G)). Now vp(G) is closed and convex, and this fact and Lemma 2.2.26 then give that her(vp(G)) is an M_d-convex corner and so fp(S_G) ⊆ her(vp(G)), as required.

Corollary 3.2.20. For a graph G we have:

(i) ap(SG)] = vp(G)] if and only if G is empty;


(ii) ap(SG)] = her(vp(G)[) if and only if G is complete;

(iii) cp(SG)] = her(vp(G)[) if and only if G is empty;

(iv) cp(SG)] = vp(G)] if and only if G is complete;

(v) fp(SG)] = vp(G)] for every graph G.

Proof. These follow immediately by anti-blocking the results in Lemmas 3.2.17, 3.2.18 and

3.2.19, and using Lemma 2.2.32 and Corollary 2.3.15.

Remark 3.2.21. In [11] it is noted that graph G is perfect if and only if vp(G) = fvp(G);

for a proof see [9, Theorem 3.1]. Now, by Theorem 3.2.11 (i), vp(G) = Dd ∩ ap(SG), and

Corollary 3.2.12 (iii) gives that fvp(G) = Dd ∩ fp (SG)] . Thus G is perfect if and only if

Dd ∩ ap(SG) = Dd ∩ fp(SG)]. It is interesting to note, however, that this condition is not

equivalent to ap(SG) = fp(SG)]. Indeed, ap(SG) = fp(SG)] if and only if G is complete. To see

this, note first by the observation above that whenG is not perfect, we have ap(SG) 6= fp(SG)].

Then recall from Corollary 3.2.20 that fp(SG)] = vp(G)] = (fvp(G)[)], and so when G is

perfect we have fp(SG)] = (vp(G)[)]. However, Lemma 3.2.18 gives that ap(SG) = (vp(G)[)]

if and only if G is complete.

3.2.3 The Lovasz corner

We now wish to generalise TH(G), the theta corner of graph G as described in Definition

1.4.3, to the non-commutative setting. In this section we introduce the Md-convex corner

th(S) associated with non-commutative graph S ⊆Md, and show that it can be seen as such

a generalisation. We also give some basic properties.

We write th(G) for the diagonal convex corner φ(TH(G)). For graph G with vertex set

X = [d] we let P0(G) = φ(C(G)) where φ : Rd+ → D+d is given in (2.1) and C(G) is given in

(1.34). By Lemma 2.2.7 and Definition 1.4.3,

th(G) = P0(G)[. (3.29)

We recall the following standard definition.

Definition 3.2.22. [54, Equation 1.112] Given linear map Φ : Md →Mk we define the linear

map Φ∗ : Mk →Md, called the adjoint of Φ, by

〈T,Φ∗(S)〉 = 〈Φ(T ), S〉 for all T ∈Md and S ∈Mk.


It is easy to see that if Φ : M_d → M_k is a quantum channel with Kraus operators A_1, . . . , A_r ∈ M_{k,d}, then Φ^*(σ) = ∑_{i=1}^r A_i^* σ A_i for σ ∈ M_k ([54, (2.69)]). Furthermore, if Φ is positive, then Φ^* is positive, and if Φ is completely positive, then Φ^* is completely positive ([54, Proposition 2.18]). Note that if Φ is trace-preserving, it does not in general follow that Φ^* is trace-preserving. When Φ : M_d → M_k is a quantum channel with Kraus operators A_1, . . . , A_r ∈ M_{k,d}, we have Φ^*(I_k) = ∑_{i=1}^r A_i^* A_i = I_d and we say that Φ^* is unital. (This shows that when Φ : M_d → M_k is a quantum channel with d ≠ k, the adjoint channel Φ^* cannot be trace-preserving.)
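Both the adjoint relation and unitality are easy to verify numerically; the sketch below builds a random c.p.t.p. map from an isometry (an assumed construction for illustration only).

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 2, 3, 4

# A random isometry C^d -> C^{kr}; its k x d blocks are Kraus operators of a channel.
G = rng.normal(size=(k * r, d)) + 1j * rng.normal(size=(k * r, d))
V, _ = np.linalg.qr(G)                                   # V has orthonormal columns
kraus = [V[i * k:(i + 1) * k, :] for i in range(r)]      # A_i in M_{k,d}

Phi = lambda T: sum(A @ T @ A.conj().T for A in kraus)
Phi_adj = lambda S: sum(A.conj().T @ S @ A for A in kraus)
hs = lambda X, Y: np.trace(X.conj().T @ Y)               # Hilbert-Schmidt pairing

T = rng.normal(size=(d, d)); S = rng.normal(size=(k, k))
print(np.allclose(hs(Phi(T), S), hs(T, Phi_adj(S))))     # <Phi(T), S> = <T, Phi*(S)>
print(np.allclose(Phi_adj(np.eye(k)), np.eye(d)))        # Phi* is unital
```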

Let S ⊆ Md be an operator system. We make the following definitions where SΦ is as

defined in Definition 3.1.26:

C(S) = {Φ : M_d → M_k : Φ is a quantum channel with S_Φ ⊆ S, k ∈ N};    (3.30)

th(S) = {T ∈ M_d^+ : Φ(T) ≤ I for every Φ ∈ C(S)};    (3.31)

and

P(S) = {Φ^*(σ) : Φ ∈ C(S), σ ≥ 0, Tr(σ) ≤ 1}.    (3.32)

Remark 3.2.23. (i) We call th(S) the Lovasz corner of operator system S.

(ii) Note that for S ⊆Md, the set C(S) can contain quantum channels Md →Mk for infinitely

many k ∈ N.

(iii) In (3.32) for each choice of Φ ∈ C(S) with Φ : Md → Mk, it is obviously assumed that

we choose σ ∈M+k . This also applies in the proof of the lemma below, our ‘quantum version’

of (3.29).

Lemma 3.2.24. Let S ⊆ Md be an operator system. Then th(S) is a convex corner and

th(S) = P(S)].

Proof. For T ∈M+d , we have

T ∈ th(S) ⇔ Φ(T ) ≤ I for all Φ ∈ C(S)

⇔ 〈Φ(T ), σ〉 ≤ 1 for all Φ ∈ C(S) and all σ ≥ 0, Tr(σ) ≤ 1

⇔ 〈T,Φ∗(σ)〉 ≤ 1 for all Φ ∈ C(S) and all σ ≥ 0, Tr(σ) ≤ 1

⇔ T ∈ P(S)].


To see the second equivalence, note by Lemma B.0.2 (v) that 〈Φ(T ), σ〉 = Tr(Φ(T )σ) ≤

Tr(σ) ≤ 1 when Φ(T ) ≤ I and Tr(σ) ≤ 1. Conversely, if Φ(T ) 6≤ I then Φ(T ) has a unit

eigenvector v such that 〈Φ(T ), vv∗〉 > 1. Having established that th(S) = P(S)], Lemma

2.2.10 gives that th(S) is a convex corner.

For a general operator system S ⊆ Md, one of the apparent difficulties in working with

th(S) is that the set C(S) used in its definition contains quantum channels Φ : Md → Mk

with no upper bound on k. However, the following results lead us to Corollary 3.2.30, which

shows that only quantum channels Φ : Md →Md2 need to be considered in order to determine

th(S).

For an operator system S ⊆Md and for fixed k ∈ N, let

Ck(S) = Φ : Md →Mk : Φ ∈ C(S) , (3.33)

thk(S) = T ∈M+d : Φ(T ) ≤ Ik for every Φ ∈ Ck(S), (3.34)

and

Pk(S) = Φ∗(σ) : Φ ∈ Ck(S), σ ∈M+k , Tr(σ) ≤ 1. (3.35)

As in Lemma 3.2.24, one can see that

Pk(S)] = thk(S), (3.36)

and by Lemma 2.2.10 it follows that thk(S) is a convex corner; we call thk(S) the kth Lovasz

corner of S. It is clear that th(S) = ∩k∈N thk(S). Furthermore, the following lemma shows

that the sets thk(S) are nested.

Lemma 3.2.25. For operator system S ⊆Md and k ∈ N, we have

thk+1(S) ⊆ thk(S).

Proof. Consider Φ ∈ C_k(S) having Kraus representation

Φ(ρ) = ∑_{p=1}^q A_p ρ A_p^* for ρ ∈ M_d, with A_p ∈ M_{k,d} and ∑_{p=1}^q A_p^* A_p = I_d.

For ρ ∈ M_d^+, write Φ(ρ) = (ϕ(ρ)_{ij})_{i,j∈[k]} and let the mapping Φ′ : M_d → M_{k+1} be given by


Φ′(ρ) = (ϕ′(ρ)_{ij})_{i,j∈[k+1]}, where ϕ′(ρ)_{ij} = ϕ(ρ)_{ij} when i, j ∈ [k], and ϕ′(ρ)_{ij} = 0 otherwise. Letting the operator A′_p ∈ M_{k+1,d} be formed by giving A_p a (k+1)th row consisting entirely of zeros, it is easy to see that

Φ′(ρ) = ∑_{p=1}^q A′_p ρ A′^*_p.    (3.37)

It is also clear that A′^*_n A′_m = A_n^* A_m for all n, m ∈ [q], and thus ∑_{p=1}^q A′^*_p A′_p = I_d, and Φ′ is a c.p.t.p. map satisfying S_{Φ′} = S_Φ. Thus (3.37) is a Kraus representation of Φ′ ∈ C_{k+1}(S). Clearly Φ(T) ≤ I_k if and only if Φ′(T) ≤ I_{k+1}, and the inclusion th_{k+1}(S) ⊆ th_k(S) follows.

Lemma 3.2.26. Let S ⊆ M_d be a non-commutative graph, k ∈ N, and Φ ∈ C_k(S). Suppose that Φ = ∑_{i=1}^n t_i Φ_i where Φ_i : M_d → M_k is a quantum channel for each i = 1, . . . , n and t_i ∈ (0, 1) with ∑_{i=1}^n t_i = 1. It follows that Φ_i ∈ C_k(S) for all i = 1, . . . , n.

Proof. For the case that n = 2, let {A_i}_{i=1}^m and {B_p}_{p=1}^l be families of Kraus operators for Φ_1 and Φ_2 respectively. Then it is easy to verify that {√t_1 A_i, √t_2 B_p : i ∈ [m], p ∈ [l]} is a family of Kraus operators for Φ, and so A_i^* A_j ∈ S_Φ ⊆ S and B_p^* B_q ∈ S_Φ ⊆ S for all i, j ∈ [m] and all p, q ∈ [l]. Thus, S_{Φ_i} ⊆ S and Φ_i ∈ C_k(S) for i = 1, 2. A routine induction argument completes the proof for general n ∈ N.

Lemma 3.2.27. For an operator system S ⊆Md the set Ck(S) is compact for each k ∈ N.

Proof. Consider completely positive maps Ψ, Φ : M_d → M_k. The definition of the Choi matrix as given in Proposition 3.1.17 shows that P_{kΦ} = kP_Φ for k ∈ R_+ and that P_{Φ+Ψ} = P_Φ + P_Ψ, and so the map Φ → P_Φ is linear. It is also clear that the map J : C_k(S) → {P_Φ : Φ ∈ C_k(S)} defined by J(Φ) = P_Φ is bijective, and so by standard analysis the compactness of C_k(S) is equivalent to that of B, where B = {P_Φ : Φ ∈ C_k(S)} ([43, Theorem 4.14]). Given that M_d(M_k) is of finite dimension, by standard theory, see for example [31, Corollary 1.4.21], it suffices to show that B is both closed and bounded. For the boundedness of B recall for all Φ ∈ C_k(S) that P_Φ ≥ 0 and Tr(P_Φ) = d, and hence ‖P_Φ‖ ≤ d.

Consider a sequence (Ψ_n)_{n∈N} ⊆ C_k(S) satisfying P_{Ψ_n} → Q. To show that B is closed it is required to show that Q = P_Ψ for some Ψ ∈ C_k(S). Let Ψ_n have the canonical Kraus representation Ψ_n(ρ) = ∑_{i=1}^{kd} A_i^{(n)} ρ A_i^{(n)*}, constructed as in Remark 3.1.22. Because of


how the operators A_i^{(n)} are constructed from the eigenvalues and eigenvectors of P_{Ψ_n}, it is clear that {A_i^{(n)} : i ∈ [kd], n ∈ N} is bounded in Hilbert–Schmidt norm. Then the sequence ((A_1^{(n)}, . . . , A_{kd}^{(n)}))_{n∈N} has a convergent subsequence corresponding to a subsequence of channels (Ψ_{n_i})_{i∈N}. Thus for each j ∈ [kd] we have A_j ∈ M_{k,d} such that A_j^{(n_i)} → A_j as i → ∞. Since Ψ_{n_i} ∈ C_k(S) for all i ∈ N,

A_p^{(n_i)*} A_q^{(n_i)} ∈ S for all i ∈ N, p, q ∈ [kd],

and also

∑_{p=1}^{kd} A_p^{(n_i)*} A_p^{(n_i)} = I for all i ∈ N.

Then by continuity and because S is closed, we have

∑_{p=1}^{kd} A_p^* A_p = I, and A_p^* A_q ∈ S for all p, q ∈ [kd].

Finally, setting Ψ(ρ) = ∑_{j=1}^{kd} A_j ρ A_j^*, it is clear that P_{Ψ_n} → P_Ψ with Ψ ∈ C_k(S).

Since Md is an operator system, for k, d ∈ N we observe that Ck(Md) is the set of all

quantum channels from Md to Mk. By Lemma 3.2.27, Ck(Md) is compact. For c.p.t.p maps

Φ1,Φ2 : Md → Mk and for any t ∈ (0, 1), it is trivial to see that Φ = tΦ1 + (1 − t)Φ2 is a

c.p.t.p. map, and we conclude that Ck(Md) is convex.

Letting Ek be the set of all extreme points in Ck(Md), Theorem A.0.2 then gives that

Ck(Md) = conv(Ek). (3.38)

Remark 3.2.28. We note that Ck(S) is not convex for every operator system S. For example,

consider the ‘constant diagonal’ operator system S2 defined by

S_2 = span{I, e_1 e_2^*, e_2 e_1^*} ⊆ M_2.

(Not being equal to SG for any graph G, the operator system S2 is one of the simplest

to exhibit non-classical properties, and will be examined in more detail later.) Now let


Φ : M_2 → M_3 be the quantum channel with Kraus operators

A_1 = (1/√2) [0 0; 1 0; 0 1]  and  A_2 = (1/√2) [0 1; 0 0; 1 0],

and let Ψ : M_2 → M_3 be the quantum channel with Kraus operators

B_1 = (1/√2) [0 0; 0 1; 1 0]  and  B_2 = (1/√2) [0 1; 1 0; 0 0]

(each matrix is written row by row, with rows separated by semicolons).

It is easy to verify that Φ, Ψ ∈ C_3(S_2). By the argument in the proof of Lemma 3.2.26, for a ∈ (0, 1) the convex combination aΦ + (1 − a)Ψ has Kraus operators

{√a A_1, √a A_2, √(1−a) B_1, √(1−a) B_2}.

We then note that

(√a A_1)^* (√(1−a) B_2) = (√(a(1−a))/2) [1 0; 0 0] ∉ S_2,

and we conclude that aΦ + (1 − a)Ψ ∉ C_3(S_2), and C_3(S_2) is not convex.
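A direct numerical check of this non-convexity argument (membership in S_2 reduces to the two diagonal entries being equal, since S_2 = span{I, e_1e_2^*, e_2e_1^*}):

```python
import numpy as np

s = 1 / np.sqrt(2)
A1 = s * np.array([[0, 0], [1, 0], [0, 1]]); A2 = s * np.array([[0, 1], [0, 0], [1, 0]])
B1 = s * np.array([[0, 0], [0, 1], [1, 0]]); B2 = s * np.array([[0, 1], [1, 0], [0, 0]])

def in_S2(T, tol=1e-12):
    """T lies in span{I, e1e2*, e2e1*} iff its two diagonal entries coincide."""
    return abs(T[0, 0] - T[1, 1]) < tol

# Phi and Psi both lie in C3(S2): all products of their own Kraus operators are in S2.
print(all(in_S2(X.T @ Y) for X in (A1, A2) for Y in (A1, A2)))    # True
print(all(in_S2(X.T @ Y) for X in (B1, B2) for Y in (B1, B2)))    # True

# A cross term of Kraus operators of the convex combination escapes S2.
a = 0.5
print(in_S2((np.sqrt(a) * A1).T @ (np.sqrt(1 - a) * B2)))         # False
```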

Proposition 3.2.29. Let S ⊆ Md be an operator system and k ≥ d2. Let Ek denote the set

of all extreme points in Ck(Md). Then

th_{d²}(S) = th_k(S) = {T ∈ M_d^+ : Ψ(T) ≤ I for all Ψ ∈ C_k(S) ∩ E_k}.    (3.39)

Proof. First we show for k ∈ N that

th_k(S) = {T ∈ M_d^+ : Ψ(T) ≤ I for all Ψ ∈ C_k(S) ∩ E_k}.    (3.40)

To see this, denote the right hand side of (3.40) by R and note that it is trivial that th_k(S) ⊆ R. To show the reverse inclusion, use (3.38) to write Φ ∈ C_k(S)\E_k as Φ = ∑_{p=1}^l t_p Φ_p with Φ_p ∈ E_k, t_p ∈ (0, 1), and ∑_{p=1}^l t_p = 1. Lemma 3.2.26 gives that Φ_p ∈ C_k(S) for all p ∈ [l], and so Φ_p(T) ≤ I for all p ∈ [l] and for all T ∈ R. Then if T ∈ R we have Φ(T) = ∑_{p=1}^l t_p Φ_p(T) ≤ I and thus T ∈ th_k(S), as claimed.

Now consider the case k ≥ d². Lemma 3.2.25 gives th_k(S) ⊆ th_{d²}(S). To complete the proof of (3.39) it must be shown that th_k(S) ⊇ th_{d²}(S). By (3.40) it is sufficient to show


that if T ∈ th_{d²}(S), then Ψ(T) ≤ I for all Ψ ∈ C_k(S) ∩ E_k.

Now by [8, Theorem 5], for quantum channel Ψ ∈ E_k there exists a Kraus representation

Ψ(σ) = ∑_{i=1}^m A_i σ A_i^*,  A_i ∈ M_{k,d},    (3.41)

such that the set {A_i^* A_j : i, j ∈ [m]} ⊆ M_d is linearly independent. Then this set has at most dim(M_d) elements and so m ≤ d. Let P ∈ M_k be the projection onto the span of the ranges of the operators A_1, . . . , A_m. Then since the range of a matrix is the span of its columns and each A_i ∈ M_{k,d}, it holds that rank(P) ≤ md ≤ d². Write P = ∑_{i=1}^p v_i v_i^* ∈ M_k for some orthonormal set {v_1, . . . , v_p} ⊆ C^k with p ≤ d². It is clear that PA_i = A_i and A_i^* P = A_i^* for all i ∈ [m], and so PΨ(σ)P = Ψ(σ) for σ ∈ M_d. We now show how to form a related channel Ψ′ : M_d → M_{d²}. Let e_1, . . . , e_{d²} denote the canonical orthonormal basis of C^{d²} and let Q = ∑_{i=1}^p e_i v_i^* ∈ M_{d²,k}. We have Q^*Q = ∑_{i=1}^p v_i v_i^* = P and QQ^* = ∑_{i=1}^p e_i e_i^* ≤ I_{d²}. Now let the map Ψ′ : M_d → M_{d²} be given by

Ψ′(ρ) = QΨ(ρ)Q^* = ∑_{i=1}^m (QA_i)ρ(QA_i)^*,  ρ ∈ M_d.

Note that the operators QA_i ∈ M_{d²,d} satisfy

∑_{i=1}^m (QA_i)^*(QA_i) = ∑_{i=1}^m A_i^* P A_i = ∑_{i=1}^m A_i^* A_i = I_d,

and so Ψ′ is a c.p.t.p. map with Kraus operators QA_1, . . . , QA_m. We now show that Ψ′ is related to Ψ in the following ways.

(i) We observe that

S_{Ψ′} = span{(QA_i)^*(QA_j)} = span{A_i^* P A_j} = span{A_i^* A_j} = S_Ψ,

and so

Ψ ∈ C_k(S) ∩ E_k ⇒ Ψ′ ∈ C_{d²}(S).    (3.42)

(ii) We claim for T ∈ M_d^+ that

Ψ′(T) ≤ I_{d²} ⇒ Ψ(T) ≤ I_k.    (3.43)

To establish this, suppose for T ∈ M_d^+ that we have Ψ′(T) ≤ I_{d²}, in other words that


0 ≤ 〈v, Ψ′(T)v〉 ≤ 〈v, v〉 for all v ∈ C^{d²}. For any unit vector u ∈ C^k we then have

0 ≤ 〈u, Ψ(T)u〉 = 〈u, PΨ(T)Pu〉 = 〈u, Q^*QΨ(T)Q^*Qu〉 = 〈Qu, Ψ′(T)Qu〉 ≤ 〈Qu, Qu〉 = 〈u, Pu〉 ≤ 1,

and Ψ(T) ≤ I_k as required.

Then, if T ∈ th_{d²}(S) and Ψ ∈ C_k(S) ∩ E_k, (3.42) gives that Ψ′(T) ≤ I_{d²} and from (3.43) we obtain Ψ(T) ≤ I_k, as required to complete the proof.

Since th(S) =⋂k∈N thk(S), Proposition 3.2.29 and Lemma 3.2.25 immediately yield the

following corollary, showing, as promised, that only quantum channels Φ : Md → Md2 need

be considered to determine th(S) for operator system S ⊆Md.

Corollary 3.2.30. For operator system S ⊆Md it holds that th(S) = thd2(S).

The next results show how th(S) is indeed a generalisation of TH(G) by explaining how

the classical situation of a graph G with vertex set X = [d] embeds into the quantum setting.

We will call a family (Px)x∈X of projections in Mk for some k ∈ N a projective orthogonal

labelling (p.o.l.) of G acting on Mk if

x ≄ y in G ⇒ P_x P_y = 0.    (3.44)

We say ((P_x)_{x∈X}, ρ) is a handled projective orthogonal labelling (h.p.o.l.) of G acting on M_k when (P_x)_{x∈X} is a p.o.l. of G acting on M_k and ρ ∈ R_k, the set of states in M_k. We define the set P(G) ⊆ D_d by

P(G) = {φ((Tr(P_x ρ))_{x∈X}) : ((P_x)_{x∈X}, ρ) is a h.p.o.l. of G}.    (3.45)

Comparing to the theory in Section 1.4, we note that, if ((a_x)_{x∈X}, c) is a h.o.n.l. of G in R^k, then ((a_x a_x^*)_{x∈X}, cc^*) is a h.p.o.l. of G acting on M_k. Setting P_0(G) = φ(C(G)) as on page 108, and recalling (1.34) on page 31, we obtain

P_0(G) ⊆ P(G).    (3.46)

Lemma 3.2.31. Let graph G have vertex set X = [d]. Then P(G) ⊆ P(G)[.

Proof. Let ((Px)x∈X , ρ) be a h.p.o.l. of G acting on Mk and ((Qx)x∈X , σ) be a h.p.o.l. of G


acting on M_l. Then P_x ⊗ Q_x ∈ M_k ⊗ M_l satisfies (P_x ⊗ Q_x)² = (P_x ⊗ Q_x)^* = P_x ⊗ Q_x and is thus a projection for all x ∈ X. By (3.44),

〈P_x ⊗ Q_x, P_y ⊗ Q_y〉 = 〈P_x, P_y〉〈Q_x, Q_y〉 = 0 for all distinct x, y ∈ X,

where we have used the fact that for distinct x, y ∈ X either x ≁ y in G or x ≁ y in the complement of G. As the sum of mutually orthogonal projections in M_{kl}, it follows that ∑_{x∈X} P_x ⊗ Q_x ≤ I_{kl}. Then

∑_{x∈X} Tr(P_x ρ) Tr(Q_x σ) = ∑_{x∈X} Tr((P_x ⊗ Q_x)(ρ ⊗ σ)) = Tr((∑_{x∈X} P_x ⊗ Q_x)(ρ ⊗ σ)) ≤ Tr(ρ ⊗ σ) = 1,

showing that φ((Tr(Q_x σ))_{x∈X}) ∈ P(G)[, to complete the proof.

Suppose a graph G has vertex set [d]. We associate with an o.n.l. A = (a_x)_{x∈[d]} ⊆ R^k of G the mapping Φ_A : M_d → M_k given by

Φ_A(ρ) = ∑_{x∈[d]} (a_x e_x^*) ρ (e_x a_x^*) for ρ ∈ M_d.    (3.47)

Since ∑_{x∈[d]} (e_x a_x^*)(a_x e_x^*) = ∑_{x∈[d]} e_x e_x^* = I, Proposition 3.1.18 shows that Φ_A is a quantum channel. Now (e_x a_x^*)(a_y e_y^*) = 〈a_y, a_x〉 e_x e_y^*, which vanishes when x ≄ y, and hence S_{Φ_A} ⊆ S_G and Φ_A ∈ C(S_G).
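For a concrete instance, take G to be the path on three vertices and the orthonormal labelling below in R² (an assumed example); the sketch checks that Φ_A is trace preserving and that the cross products of its Kraus operators vanish for non-adjacent vertices, so that S_{Φ_A} ⊆ S_G.

```python
import numpy as np

# G: path 0-1-2; vertices 0 and 2 are non-adjacent, so a_0 must be orthogonal to a_2.
a = [np.array([1.0, 0.0]),
     np.array([1.0, 1.0]) / np.sqrt(2),
     np.array([0.0, 1.0])]
d = 3
e = np.eye(d)

# Kraus operators of Phi_A as in (3.47): K_x = a_x e_x^*.
kraus = [np.outer(a[x], e[x]) for x in range(d)]

print(np.allclose(sum(K.T @ K for K in kraus), np.eye(d)))                      # trace preserving
print(all(np.allclose(kraus[x].T @ kraus[y], 0) for x, y in [(0, 2), (2, 0)]))  # True
```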

Lemma 3.2.32. Let graph G have vertex set X = [d]. Then P0(G) ⊆ P(SG).

Proof. First note that an arbitrary element of P_0(G) can be expressed as ∑_{x∈X} |〈a_x, c〉|² e_x e_x^* for some h.o.n.l. ((a_x)_{x∈X}, c) of G. Now let Φ_A ∈ C(S_G) be the channel defined in (3.47) for the o.n.l. A = (a_x)_{x∈X} ⊆ R^k. Note that for σ ∈ M_k we have Φ_A^*(σ) = ∑_{x∈X} (e_x a_x^*) σ (a_x e_x^*), so for the unit vector c ∈ R^k, we have that

Φ_A^*(cc^*) = ∑_{x∈X} |〈a_x, c〉|² e_x e_x^* ∈ P(S_G),

as required.


Lemma 3.2.33. Let graph G have vertex set X = [d]. Then

P(G)[ ⊆ Dd ∩ th(SG).

Proof. For all T ∈ P(G)[ it is required to show that Φ(T) ≤ I for all Φ ∈ C(S_G). For Φ ∈ C_k(S_G) we can write

Φ(T) = ∑_{p=1}^m A_p T A_p^*,  T ∈ M_d,

where A_1, . . . , A_m ∈ M_{k,d} satisfy ∑_{p=1}^m A_p^* A_p = I. Set a_{p,x} = A_p e_x ∈ C^k for all p ∈ [m] and x ∈ X, and for each x ∈ X set Z_x = ∑_{p=1}^m a_{p,x} a_{p,x}^* ∈ M_k^+. Let P_x ∈ M_k be the projection onto span(a_{p,x} : p ∈ [m]) and observe that

Z_x = P_x Z_x P_x for all x ∈ X.    (3.48)

For all p, q ∈ [m], we have A_p^* A_q ∈ S_Φ ⊆ S_G. Now suppose that x, y ∈ X satisfy x ≄ y in G so that e_x e_y^* ∈ S_G^⊥. Then for all p, q ∈ [m],

〈a_{q,y}, a_{p,x}〉 = 〈A_q e_y, A_p e_x〉 = 〈A_p^* A_q, e_x e_y^*〉 = 0.

It follows that P_x P_y = 0, and the family (P_x)_{x∈X} is a p.o.l. of G. On the other hand,

‖Z_x‖ ≤ ∑_{p=1}^m ‖a_{p,x} a_{p,x}^*‖ = ∑_{p=1}^m 〈a_{p,x}, a_{p,x}〉 = ∑_{p=1}^m 〈A_p^* A_p e_x, e_x〉 = 〈e_x, e_x〉 = 1,    (3.49)

using that ‖vv^*‖ = 〈v, v〉 for v ∈ C^k. Relations (3.48) and (3.49) imply that

Z_x ≤ P_x,  x ∈ X.    (3.50)

Now consider a general element T = ∑_{x∈X} t_x e_x e_x^* ∈ P(G)[ with t_x ∈ R_+. It holds that

‖∑_{x∈X} t_x P_x‖ = max{〈v, (∑_{x∈X} t_x P_x)v〉 : v ∈ C^k, ‖v‖ = 1} = max{Tr((∑_{x∈X} t_x P_x)ρ) : ρ ∈ R_k} = max{∑_{x∈X} t_x Tr(P_x ρ) : ρ ∈ R_k} ≤ 1,

where the inequality follows from the anti-blocking condition, using that ((P_x)_{x∈X}, ρ) is a


h.p.o.l. of G. Since ∑_{x∈X} t_x P_x ≥ 0 we have that ∑_{x∈X} t_x P_x ≤ I_k. Inequalities (3.50) now imply that ∑_{x∈X} t_x Z_x ≤ I, and so

Φ(T) = ∑_{p=1}^m A_p (∑_{x∈X} t_x e_x e_x^*) A_p^* = ∑_{x∈X} t_x ∑_{p=1}^m (A_p e_x)(A_p e_x)^* = ∑_{x∈X} t_x ∑_{p=1}^m a_{p,x} a_{p,x}^* = ∑_{x∈X} t_x Z_x ≤ I_k.

Therefore T ∈ th(S_G), and the proof is complete.

The next theorem makes clear the sense in which th(S) is a non-commutative version of

TH(G).

Theorem 3.2.34. Let graph G have vertex set X = [d]. Then

th(G) = Dd ∩ th(SG) = ∆(th(SG)).

Proof. By Definition 1.4.3, Lemma 2.2.7, (1.36), (3.46), (1.14), and Lemma 3.2.31,

P0(G)[ = φ(C(G)[) = th(G) = th(G)[ = P0(G)[[ ⊆ P(G)[[ ⊆ P(G)[[[.

Since P(G)[ is a diagonal convex corner, Lemma 2.2.7 gives that P(G)[[[ = P(G)[. Hence

P0(G)[ = th(G) ⊆ P(G)[ ⊆ P0(G)[, where the second inclusion follows from (1.14) and

(3.46). We can conclude that th(G) = P(G)[ and hence, by Lemma 3.2.33,

th(G) ⊆ Dd ∩ th(SG). (3.51)

Clearly D_d ∩ th(S_G) ⊆ ∆(th(S_G)), so the proof is completed by showing that ∆(T) ∈ th(G) when T ∈ th(S_G). Suppose that Φ : M_d → M_k is a quantum channel satisfying S_Φ ⊆ S_G and having Kraus representation

Φ(T) = ∑_{p=1}^m A_p T A_p^*,  T ∈ M_d.

For x ∈ X and p ∈ [m] set A_{p,x} = A_p(e_x e_x^*). Now for p, q ∈ [m] we have A_p^* A_q ∈ S_Φ ⊆ S_G, so we can write A_p^* A_q = ∑_{i,j∈[d]} α_{ij} e_i e_j^* with α_{ij} = 0 when i ≄ j in G. For all x, y ∈ [d] this gives

A_{p,x}^* A_{q,y} = (e_x e_x^*) A_p^* A_q (e_y e_y^*) = α_{xy} e_x e_y^* ∈ S_G.    (3.52)


Using that ∑_{p=1}^m A_p^* A_p = I yields

∑_{p=1}^m ∑_{x∈X} A_{p,x}^* A_{p,x} = ∑_{p=1}^m ∑_{x∈X} (e_x e_x^*) A_p^* A_p (e_x e_x^*) = ∑_{x∈X} e_x e_x^* = I.

Thus, the map Ψ : M_d → M_k, given for S ∈ M_d by

Ψ(S) = ∑_{p=1}^m ∑_{x∈X} A_{p,x} S A_{p,x}^*,

is a quantum channel which by (3.52) satisfies

S_Ψ = span{A_{p,x}^* A_{q,y} : p, q ∈ [m], x, y ∈ [d]} ⊆ S_G;

thus we have Ψ ∈ C(S_G). For T ∈ th(S_G) it then holds that

Φ(∆(T)) = ∑_{p=1}^m A_p (∑_{x∈X} (e_x e_x^*) T (e_x e_x^*)) A_p^* = Ψ(T) ≤ I,

and it follows that ∆(T) ∈ th(S_G). Now by Lemmas 3.2.24 and 3.2.32, th(S_G) = P(S_G)] ⊆ P_0(G)]. Then ∆(T) ∈ D_d ∩ P_0(G)] = P_0(G)[ = th(G), where we used Definition 2.2.6 and (3.29) on page 108 to complete the proof.

Lemma 3.2.35. If operator systems S, T ⊆Md satisfy S ⊆ T then

th(S) ⊇ th(T ).

Proof. For S ⊆ T , we have C(S) ⊆ C(T ), and the result follows from (3.31) on page 109.

It is easy to see that the identity channel Id : Md → Md given by Id(ρ) = ρ for all

ρ ∈Md has a Kraus representation Id(ρ) = IdρI∗d , and so has the associated operator system

SId = spanId. It is immediate for any non-commutative graph S ⊆Md that

SId ⊆ S and Id ∈ C(S). (3.53)

Lemma 3.2.36. For any operator system S ⊆Md,

AId ⊆ th(S) ⊆ BId .

Proof. If T ∈ th(S), then by (3.31) and (3.53), Id(T ) = T ≤ I. This proves the second


inclusion. For the first inclusion, consider U ∈M+d satisfying TrU ≤ 1. Since every quantum

channel is c.p.t.p., we have Tr(Φ(U)) ≤ 1 and Φ(U) ≥ 0 for all Φ ∈ C(S), and it thus holds

that Φ(U) ≤ I for all Φ ∈ C(S), giving U ∈ th(S) by (3.31).

3.2.4 A quantum sandwich theorem

Given that we have now formed Md-convex corners which generalise VP(G), FVP(G) and

TH(G) to the quantum setting, it is natural to ask if there is a quantum version of the well-

known ‘classical’ sandwich theorem given in Theorem 1.4.5. The following theorem answers

that question affirmatively.

Theorem 3.2.37. If S ⊆Md is an operator system, then

ap(S) ⊆ th(S) ⊆ fp(S)].

Proof. We begin by proving the first inclusion. Let P be an S-abelian projection, and suppose that {ξ_1, . . . , ξ_k} ⊆ C^d is an S-independent set such that P = ∑_{i=1}^k ξ_i ξ_i^*. Consider Φ ∈ C(S) with Kraus operators A_1, . . . , A_m, and let i, j ∈ [k] with i ≠ j. Then

Tr(Φ(ξ_i ξ_i^*)Φ(ξ_j ξ_j^*)) = ∑_{p,q=1}^m Tr(A_p(ξ_i ξ_i^*)A_p^* A_q(ξ_j ξ_j^*)A_q^*) = ∑_{p,q=1}^m Tr((A_p ξ_i)(A_p ξ_i)^*(A_q ξ_j)(A_q ξ_j)^*)

= ∑_{p,q=1}^m |〈A_q ξ_j, A_p ξ_i〉|² = ∑_{p,q=1}^m |〈A_p^* A_q, ξ_i ξ_j^*〉|² = 0,    (3.54)

where we have used that A_p^* A_q ∈ S_Φ ⊆ S for all p, q ∈ [m], while ξ_i ξ_j^* ∈ S^⊥. Since Φ(ξ_i ξ_i^*) ≥ 0, for each i ∈ [k] we can write Φ(ξ_i ξ_i^*) = ∑_{l=1}^{n_i} λ_l^{(i)} v_l^{(i)} v_l^{(i)*}, where v_1^{(i)}, . . . , v_{n_i}^{(i)} are unit eigenvectors of Φ(ξ_i ξ_i^*) with the strictly positive eigenvalues λ_1^{(i)}, . . . , λ_{n_i}^{(i)} > 0 respectively. For i ≠ j, we apply (3.54) to obtain

Tr(Φ(ξ_i ξ_i^*)Φ(ξ_j ξ_j^*)) = ∑_{l∈[n_i], r∈[n_j]} λ_l^{(i)} λ_r^{(j)} |〈v_l^{(i)}, v_r^{(j)}〉|² = 0,


and it then holds for i ≠ j that 〈v_l^{(i)}, v_m^{(j)}〉 = 0 for all l ∈ [n_i] and m ∈ [n_j]. It follows that

‖Φ(P)‖ = ‖∑_{i=1}^k Φ(ξ_i ξ_i^*)‖ = max_{i=1,...,k} ‖Φ(ξ_i ξ_i^*)‖.    (3.55)

Since Tr(ξ_i ξ_i^*) = 1 and Φ is trace preserving, Tr(Φ(ξ_i ξ_i^*)) = 1, and so ‖Φ(ξ_i ξ_i^*)‖ ≤ 1, for all i ∈ [k]. Then by (3.55), ‖Φ(P)‖ ≤ 1, and since Φ ∈ C(S) was arbitrary, we conclude that P ∈ th(S).

Since ap(S) is generated by the S-abelian projections and th(S) is an Md-convex corner,

by Lemma 2.2.29 it holds that ap(S) ⊆ th(S), and the first inclusion is proved.

To prove the second inclusion, suppose that Q is an S-full projection, and let {η_1, . . . , η_k} ⊆ C^d be an S-full set such that Q = ∑_{j=1}^k η_j η_j^* ∈ M_d. Let η ∈ span(η_1, . . . , η_k) be a unit vector and write Q^⊥ = I − Q, giving Qη = η, Q^⊥η = 0 and η^*Q^⊥ = 0. Now

∑_{j=1}^k (ηη_j^*)^*(ηη_j^*) + (Q^⊥)^*Q^⊥ = ∑_{j=1}^k η_j η^* η η_j^* + Q^⊥ = ∑_{j=1}^k η_j η_j^* + Q^⊥ = I,

and so by Proposition 3.1.18, the mapping Φ : M_d → M_d given by

Φ(T) = Q^⊥ T Q^⊥ + ∑_{j=1}^k (ηη_j^*) T (η_j η^*)

is a quantum channel with Kraus operators ηη_1^*, . . . , ηη_k^*, Q^⊥ ∈ M_d. It is easy to see that

(η_i η^*)(ηη_j^*) = η_i η_j^* ∈ S for all i, j ∈ [k],

Q^⊥ ηη_i^* = η_i η^* Q^⊥ = 0 ∈ S for all i ∈ [k],

(Q^⊥)^*Q^⊥ = Q^⊥ = I − ∑_{j=1}^k η_j η_j^* ∈ S.

It follows that S_Φ ⊆ S and Φ ∈ C(S).

We then have

Φ^*(ηη^*) = Q^⊥ ηη^* Q^⊥ + ∑_{j=1}^k η_j η^*(ηη^*)ηη_j^* = ∑_{j=1}^k η_j η_j^* = Q,

and (3.32) on page 109 shows that Q ∈ P(S). Since Q was an arbitrary S-full projection, it is clear that P_f(S) ⊆ P(S) where P_f(S) is the set of all S-full projections. By Lemmas 3.2.24 and 2.2.10 it then holds that th(S) = P(S)] ⊆ P_f(S)]. By Definition 3.2.5 we have


fp(S) = C(Pf(S)), and Lemma 2.2.32 gives fp(S)] = Pf(S)], which is sufficient to complete

the proof.

Remark 3.2.38. Given that cp(S)] and fp(S)] are both non-commutative versions of fvp(G),

it is natural to ask if th(S) ⊆ fp(S)] can be replaced by the stronger inclusion th(S) ⊆ cp(S)]

in Theorem 3.2.37. The answer is negative, as will be shown by Lemma 4.6.21.

To complete this chapter we obtain results for the Lovasz corner th(SG) where G is a

graph, analogous to those found in Section 3.2.2 for ap(SG), cp(SG) and fp(SG).

Proposition 3.2.39. Let G be a graph with vertex set X = [d]. Then

∆(th(S_G)]) = D_d ∩ th(S_G)] = th(G)[.

Proof. Simply apply Corollary 2.4.14 to Theorem 3.2.34.

Corollary 3.2.40. Let graph G have associated operator system SG. Then

(i) her(th(G)) ⊆ th(SG) ⊆ (th(G)[)];

(ii) her(th(G)[) ⊆ th(SG)] ⊆ (th(G))].

Proof. These follow from Theorem 1.2.15, Theorem 2.4.19, Proposition 3.2.39 and Theorem

3.2.34.

Lemma 3.2.41. We have th(SG) = (th(G)[)] if and only if G is complete.

Proof. Recall from the proof of Lemma 3.2.18 that ap(S_{K_d}) = A_{I_d} and from (3.25) that fp(S_{K_d}) = B_{I_d}, giving fp(S_{K_d})] = A_{I_d}. Lemma 3.2.36 and Theorem 3.2.37 then yield th(S_{K_d}) = A_{I_d}. By Theorem 3.2.34,

th(K_d) = D_d ∩ th(S_{K_d}) = {M ∈ M_d^+ ∩ D_d : Tr M ≤ 1}.

It follows from Lemma 2.4.16 that

(th(K_d)[)] = {M ∈ M_d^+ : ∆(M) ∈ th(K_d)} = {M ∈ M_d^+ : Tr M ≤ 1} = th(S_{K_d}).

For the converse, assume that G is not complete with vertices k ≄ l in G. We follow the method of Lemma 3.2.18. Let A = e_k e_k^* + e_k e_l^* + e_l e_k^* + e_l e_l^* ≥ 0. Recall that I − A ≱ 0 and, since


th(S_G) ⊆ B_{I_d} by Lemma 3.2.36, it follows that A ∉ th(S_G). However we recall from Lemma 3.2.18 that A ∈ (vp(G)[)]. Theorem 1.4.5 gives vp(G) ⊆ th(G); thus vp(G)[ ⊇ th(G)[, and (vp(G)[)] ⊆ (th(G)[)]. Then A ∈ (th(G)[)] and th(S_G) ≠ (th(G)[)].

Lemma 3.2.42. It holds that th(SG) = her(th(G)) if and only if G is empty.

Proof. In (3.25) we have ap(SK̄d) = BId, and then Theorem 3.2.37 and Lemma 3.2.36 give that th(SK̄d) = BId. It follows from Theorem 3.2.34 that th(K̄d) = Dd ∩ th(SK̄d) = {M ∈ M+d ∩ Dd : M ≤ I}, giving her(th(K̄d)) = BId = th(SK̄d).

Conversely, suppose G is non-empty with i ∼ j in G. Setting v = (1/√2)(ei + ej) we have Tr(vv∗) = 1 and vv∗ ∈ th(SG) by Lemma 3.2.36. Choosing a h.o.n.l. ((a(l))l∈V (G), c) for G with a(i) = a(j) = c and ⟨c, a(l)⟩ = 0 when l ∉ {i, j} gives eie∗i + eje∗j ∈ P0(G). Suppose towards a contradiction that vv∗ ∈ her(th(G)), that is,

vv∗ = (1/2)(eie∗i + eie∗j + eje∗i + eje∗j) ≤ Q

for some Q ∈ th(G) ⊆ Dd. This requires ⟨ei, Qei⟩ > 1/2 and ⟨ej , Qej⟩ > 1/2. (To see that the inequalities are strict, observe that for Q ∈ Dd we have ⟨ei, (Q − vv∗)ej⟩ = −1/2. But if ⟨ei, Qei⟩ = 1/2, we have e∗i (Q − vv∗)ei = 0, and since Q ≥ vv∗, Corollary 2.1.2 requires ⟨ei, (Q − vv∗)ej⟩ = 0. A similar argument applies for j.) We then have ⟨Q, eie∗i + eje∗j⟩ > 1, and so Q ∉ P0(G)[ = th(G), the required contradiction. We conclude vv∗ ∉ her(th(G)).

Corollary 3.2.43. If G is a graph then

(i) th(SG)] = her(th(G)[) if and only if G is complete, and

(ii) th(SG)] = th(G)] if and only if G is empty.

Proof. These follow by taking anti-blockers in Lemmas 3.2.41 and 3.2.42, using Lemma 2.2.32 and Corollary 2.3.15.


Chapter 4

Parameters for non-commutative graphs

Continuing to explore the analogy between graphs and non-commutative graphs, we now show

how the convex corners introduced in the previous chapter lead naturally to definitions of a

number of new parameters for non-commutative graphs, including the fractional chromatic

number and the Lovasz number. We introduce another quantum generalisation of the Lovasz

number which we show to be an upper bound on the Shannon capacity of a quantum channel.

In this chapter we will also discuss the concept of non-commutative graph entropy, a quantum

analogue of graph entropy, and define a generalisation of the Witsenhausen rate to the non-

commutative setting. We conclude by illustrating the theory with some concrete examples of

operator systems.

4.1 Parameters for non-commutative graphs from convex corners

In the classical setting we have seen how many important parameters of a graph G with d

vertices can be defined in terms of Rd-convex corners associated with G; for instance from

Lemmas 1.3.9 and 1.4.2 and Definition 1.4.1 and equations (1.18), (1.19), (1.26) and (1.35)



we have:

α(G) = γ(VP(G)),

ω(G) = γ(VP(G)) = γ(CP(G)),

ωf(G) = χf(G) = γ(VP(G)[),

ωf(G) = χf(G) = γ(FVP(G)),

θ(G) = γ(TH(G)),

where CP(G) is as defined on page 104.
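To make this classical template concrete before passing to the quantum setting, the following Python sketch (a hypothetical illustration, not part of the thesis) computes α(G) by brute force and θ(G) for the 5-cycle. The cvxpy library and the particular semidefinite programme used for θ(G) are assumptions of the sketch; the SDP below is one standard formulation of the Lovász number, not the expression γ(TH(G)) used here.

    import itertools
    import networkx as nx
    import cvxpy as cp

    G = nx.cycle_graph(5)            # the 5-cycle: alpha = 2, theta = sqrt(5)
    n = G.number_of_nodes()

    # alpha(G) = gamma(VP(G)): brute-force maximum independent set.
    def independent(S):
        return all(not G.has_edge(u, v) for u, v in itertools.combinations(S, 2))

    alpha = max(len(S) for r in range(n + 1)
                for S in itertools.combinations(G.nodes, r) if independent(S))

    # theta(G) via one standard SDP formulation of the Lovasz number:
    #   max <J, X>  s.t.  Tr X = 1,  X_ij = 0 for every edge ij,  X positive semidefinite.
    X = cp.Variable((n, n), symmetric=True)
    cons = [X >> 0, cp.trace(X) == 1] + [X[i, j] == 0 for i, j in G.edges]
    theta = cp.Problem(cp.Maximize(cp.sum(X)), cons).solve()

    print(alpha, theta)              # 2 and approximately 2.236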

We begin this section by using Md-convex corners to give quantum analogues of the

above results. We include a brief discussion of homomorphisms between operator systems as

introduced in [51], and also of weighted non-commutative graph parameters.

4.1.1 Defining non-commutative graph parameters

Recall that if G ⊆ M+d , then C(G) = her(conv(G)) is the convex corner generated by G. For

convex corner A we use the parameters γ(A) and M(A) as defined in Definition 2.3.17. Many

of the Md-convex corners we wish to consider are generated by families of projections. The

next two lemmas concern this situation.

Lemma 4.1.1. If P ⊆Md is a set of projections, then

γ(C(P)) = max{rankP : P ∈ P}.

Proof. Let P0 ∈ P satisfy rankP0 = max{rankP : P ∈ P}. Clearly TrP0 = rankP0, and so rankP0 ≤ γ(C(P)). Now if A,B ∈ M+d and A ≥ B, then TrA ≥ TrB, meaning that γ(C(P)) = max{TrA : A ∈ conv(P)}. But TrP = rankP ≤ rankP0 for all P ∈ P, and so

TrA ≤ rankP0 for all A ∈ conv(P). This gives γ(C(P)) ≤ rankP0 by continuity, and the

proof is complete.

Lemma 4.1.2. If P ⊆ Md is a set of projections, then

M(C(P)) = inf{∑_{i=1}^k λi : k ∈ N, P1, . . . , Pk ∈ P, λi > 0, ∑_{i=1}^k λiPi ≥ I}.

Proof. Denote the right hand side of the claimed identity by R. Since P ⊆ C(P), it is clear that M(C(P)) ≤ R. For the reverse inequality, note that if ε > 0, there exists k ∈ N such that we can find λ1, . . . , λk > 0 and A1, . . . , Ak ∈ C(P) satisfying ∑_{i=1}^k λiAi ≥ I and ∑_{i=1}^k λi ≤ M(C(P)) + ε. For each i ∈ [k] there exist Bi in the closure of conv(P) satisfying Ai ≤ Bi and a sequence (B(j)i)j∈N with B(j)i ∈ conv(P) for all j ∈ N satisfying B(j)i → Bi as j → ∞. For all δ > 0 and for each i ∈ [k] there exists ni ∈ N such that B(ni)i + δI ≥ Bi ≥ Ai, and we have ∑_{i=1}^k λiB(ni)i ≥ (1 − rδ)I, where r = ∑_{i=1}^k λi. When δ is chosen so that 1 − rδ > 0, we have

∑_{i=1}^k (λi/(1 − rδ)) B(ni)i ≥ I. (4.1)

Since B(ni)i ∈ conv(P), for each i ∈ [k] we can write B(ni)i = ∑_{l=1}^{mi} µ(i)l Pi,l with each Pi,l ∈ P and µ(i)l ∈ R+ satisfying ∑_{l=1}^{mi} µ(i)l = 1. From (4.1) we can then see that

R ≤ ∑_{i=1}^k ∑_{l=1}^{mi} (λiµ(i)l)/(1 − rδ) = ∑_{i=1}^k λi/(1 − rδ) ≤ (M(C(P)) + ε)/(1 − rδ).

By fixing ε > 0, and hence r, and choosing arbitrarily small δ, it follows that R ≤ M(C(P)) + ε. Then, since ε can be chosen to be arbitrarily small, R ≤ M(C(P)), as required.
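When the family P is finite, the infimum in Lemma 4.1.2 is attained by a finite covering and can be checked by hand for small examples. The sketch below (the three rank-one projections in M2 and the weights are illustrative choices, not taken from the text) verifies such a covering numerically and uses the trace argument for the matching lower bound.

    import numpy as np

    def proj(v):
        v = np.array(v, dtype=float)
        v = v / np.linalg.norm(v)
        return np.outer(v, v)

    # Three rank-one projections in M_2, chosen only for illustration.
    P = [proj([1, 0]), proj([0, 1]), proj([1, 1])]

    # The weights lambda = (1, 1, 0) give sum_i lambda_i P_i = I, so M(C(P)) <= 2.
    lam = [1.0, 1.0, 0.0]
    A = sum(l * Pi for l, Pi in zip(lam, P))
    assert np.all(np.linalg.eigvalsh(A - np.eye(2)) >= -1e-12)

    # Taking traces in sum_i lambda_i P_i >= I forces sum_i lambda_i >= Tr I = 2
    # (each Tr P_i = 1), so in fact M(C(P)) = 2 for this family.
    print(sum(lam))                  # 2.0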

The definitions of α(S) in Definition 3.1.33 and ap(S) in Definition 3.2.5 mean that

Lemma 4.1.1 has the following immediate corollary.

Corollary 4.1.3. If S ⊆Md is a non-commutative graph, then α(S) = γ(ap(S)).

The next definition continues this theme by introducing a number of new parameters

for non-commutative graphs based on their associated convex corners, in an analogous way

to the classical case. (We note that the notion of the clique number of an operator system

already exists in the literature [21].) We think of these definitions as generalisations of the

classical parameters listed at the beginning of this section; recall that we regard ap(S) (resp.

th(S)) as a quantum version of VP(G) (resp. TH(G)), and cp(S) (resp. cp(S)]) and fp(S)

(resp. fp(S)]) as quantum versions of CP(G) (resp. FVP(G)).

Definition 4.1.4. Let S ⊆Md be a non-commutative graph. We define

(i) ω(S) = γ (cp(S)) – the clique number of S;

(ii) ω(S) = γ (fp(S)) – the full number of S;

(iii) ωf(S) = γ(ap(S)]) – the fractional clique number of S;

(iv) θ(S) = γ(th(S)) – the Lovasz number of S; and


(v) θk(S) = γ(thk(S)) – the kth Lovasz number of S.

Remark 4.1.5. (i) Since ap(S), cp(S) and fp(S) are generated by families of projections,

it is immediate from Lemma 4.1.1 that α(S), ω(S), ω(S) are non-negative integers. Since

AId ⊆ ap(S), cp(S), it further holds that α(S), ω(S) ≥ 1. As noted in Remark 3.2.8, however,

it is possible that fp(S) = {0}, and in this case ω(S) = 0.

(ii) By Lemma 4.1.1, it is clear that ω(S) (resp. ω(S)) is the maximum cardinality of an

S-clique (resp. an S-full set).

Letting Pa(S) denote the set of all S-abelian projections, it is useful to note that Definition 3.1.39 of the chromatic number of operator system S can be stated equivalently as

χ(S) = min{k ∈ N : P1, . . . , Pk ∈ Pa(S), ∑_{i=1}^k Pi = I}, (4.2)

showing that χ(S) can be thought of as a 'covering number' for S using S-abelian projections.

Note trivially that if S ⊆ Md is an operator system, then every rank one projection is an

S-abelian projection, and it thus holds that χ(S) ≤ d. Using S-clique or S-full projections

instead of S-abelian projections in (4.2), we now define two more parameters for a non-

commutative graph. (Remark 4.2.1 will show these correspond to parameters introduced in

[21].) We let Pc(S) and Pf(S) denote the sets of S-clique projections and S-full projections

respectively.

Definition 4.1.6. For operator system S, the clique covering number Ω(S) and the full covering number Ω(S) are given by

Ω(S) = min{k ∈ N : P1, . . . , Pk ∈ Pc(S), ∑_{i=1}^k Pi = I}, (4.3)

Ω(S) = min{k ∈ N : P1, . . . , Pk ∈ Pf(S), ∑_{i=1}^k Pi = I}. (4.4)

Remark 4.1.7. Let S ⊆Md be an operator system.

(i) Since every rank one projection is an S-clique projection, we have 1 ≤ Ω(S) ≤ d. However,

it may be that there is no k ∈ N such that the condition on the right of (4.4) is satisfied, in

which case we set Ω(S) =∞. Thus, in general it holds that 1 ≤ Ω(S) ≤ ∞.

(ii) It is clear from the respective definitions that Ω(S) = 1 ⇐⇒ ω(S) = d and that

Ω(S) = 1 ⇐⇒ ω(S) = d.

Relaxing the conditions in equations (4.2), (4.3) and (4.4) just as in the definition of the

fractional chromatic number of a graph in (1.16), yields ‘fractional’ versions of χ(S),Ω(S)


and Ω(S).

Definition 4.1.8. For non-commutative graph S ⊆Md, we define

(i) χf(S), the fractional chromatic number of S, by

χf(S) = inf{∑_{i=1}^k λi : k ∈ N, λi > 0, P1, . . . , Pk ∈ Pa(S), ∑_{i=1}^k λiPi ≥ I}; (4.5)

(ii) Ωf(S), the fractional clique covering number of S, by

Ωf(S) = inf{∑_{i=1}^k λi : k ∈ N, λi > 0, P1, . . . , Pk ∈ Pc(S), ∑_{i=1}^k λiPi ≥ I}; (4.6)

and

(iii) Ωf(S), the fractional full covering number of S, by

Ωf(S) = inf{∑_{i=1}^k λi : k ∈ N, λi > 0, P1, . . . , Pk ∈ Pf(S), ∑_{i=1}^k λiPi ≥ I}. (4.7)

If fp(S) has empty interior relative to M+d , then the condition on the right of (4.7) cannot

be satisfied, and we set Ωf(S) =∞.
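For the operator system SG of a graph, Proposition 4.1.11 below shows that (4.5) collapses to the classical fractional chromatic number, which for a small graph can be computed directly as a linear programme over its independent sets. A minimal sketch under those assumptions (the 5-cycle and the cvxpy library are choices made here; the LP is the classical analogue of (4.5), with the S_G-abelian projections replaced by indicator vectors of independent sets):

    import itertools
    import networkx as nx
    import numpy as np
    import cvxpy as cp

    G = nx.cycle_graph(5)
    nodes = list(G.nodes)

    def independent(S):
        return all(not G.has_edge(u, v) for u, v in itertools.combinations(S, 2))

    # Indicator vectors of the independent sets of G, the classical counterparts
    # of the S_G-abelian projections appearing in (4.5).
    ind_sets = [S for r in range(1, len(nodes) + 1)
                for S in itertools.combinations(nodes, r) if independent(S)]
    A = np.array([[1.0 if v in S else 0.0 for S in ind_sets] for v in nodes])

    # chi_f(G) = min sum(lam)  s.t.  A @ lam >= 1,  lam >= 0.
    lam = cp.Variable(len(ind_sets), nonneg=True)
    chi_f = cp.Problem(cp.Minimize(cp.sum(lam)), [A @ lam >= 1]).solve()
    print(chi_f)                     # 2.5 for the 5-cycle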

(Note that in [6], Ωf(S) was called the complementary fractional clique number and

denoted by κ(S), and that Ωf(S) was called the complementary fractional full number and

denoted by ϕ(S).) It is immediate from the definitions above that if S ⊆ Md is a non-

commutative graph, then

1 ≤ χf(S) ≤ χ(S) ≤ d, 1 ≤ Ωf(S) ≤ Ω(S) ≤ d, and 1 ≤ Ωf(S) ≤ Ω(S). (4.8)

(To see that each parameter is lower bounded by 1, just note that each projection P satisfies P ≤ I, and so if ∑_{i=1}^k λi < 1 we have ∑_{i=1}^k λiPi < I.) Since Pf(S) ⊆ Pc(S), it is also clear that

Ω(S) ≤ Ω(S) and Ωf(S) ≤ Ωf(S).

It is easy to see that these ‘fractional’ parameters for an operator system S can be related

to convex corners associated to S.

Theorem 4.1.9. It holds that

χf(S) = γ(ap(S)]), Ωf(S) = γ(cp(S)]), Ωf(S) = γ(fp(S)]).


Proof. By Lemma 4.1.2 it is clear that

χf(S) = M(ap(S)), Ωf(S) = M(cp(S)), Ωf(S) = M(fp(S));

we then apply Proposition 2.3.18.

As each of the parameters in Definition 4.1.8 is of the form γ(A) for some Md-convex

corner A, Corollary 2.4.32 gives each an entropic significance. In particular,

log χf(S) = max_{ρ∈Rd} Hap(S)(ρ), (4.9)
log Ωf(S) = max_{ρ∈Rd} Hcp(S)(ρ),
log Ωf(S) = max_{ρ∈Rd} Hfp(S)(ρ).

Recall from (1.26) the equality of the fractional chromatic number and fractional clique

number of a graph. It is now clear that the corresponding result for operator systems holds.

Corollary 4.1.10. If S ⊆Md is a non-commutative graph, then

ωf(S) = χf(S). (4.10)

Proof. This is immediate from Definition 4.1.4 and Theorem 4.1.9.

We now show how the classical case is embedded in the quantum setting by verifying that

each non-commutative graph parameter of the operator system SG is equal to the correspond-

ing classical graph parameter of the graph G. (The result in the case of independence number

was proved by a combinatorial method in [35]; [21] gives a proof for both independence and

clique numbers.) Note that χf(Ḡ) is commonly written χ̄f(G) and called the fractional clique covering number of G.

Proposition 4.1.11. If graph G has associated operator system SG, then

α(SG) = α(G),

ω(SG) = ω(SG) = ω(G),

ωf(SG) = χf(SG) = ωf(G) = χf(G),

θ(SG) = θ(G),

Ωf(SG) = Ωf(SG) = ωf(G) = χf(G).


Proof. Apply Lemma 2.4.15 to Theorem 3.2.11, Corollary 3.2.12 and Theorem 3.2.34 together

with Lemmas 1.3.9 and 1.4.2, Definition 2.2.8, Theorem 4.1.9 and equations (1.18), (1.19),

(1.26), (1.35) and (4.10).

In [35] it is proved that χ(SG) = χ(G) for a graph G. The method used there extends to

yield the analogous results in the case of the clique and full covering numbers.

Proposition 4.1.12. If graph G has associated operator system SG, then

χ(SG) = χ(G), and (4.11)

Ω(SG) = Ω(SG) = χ(G). (4.12)

Proof. (We follow [35, Theorem 7.27].) Let V (G) = [d]. If {i1, . . . , ik} is an independent set (resp. clique) in graph G, then {ei1 , . . . , eik} is an SG-independent set (resp. an SG-clique and SG-full set) and P = ∑_{j=1}^k eije∗ij is an SG-abelian projection (resp. an SG-clique projection and an SG-full projection) by Remark 3.2.2 (i) and Definition 3.2.3. It easily follows that χ(SG) ≤ χ(G) and that

Ω(SG) ≤ Ω(SG) ≤ χ(G). (4.13)

We now work towards the reverse inequalities. Suppose that χ(SG) = k. Then by (4.2), there exist SG-abelian projections P1, . . . , Pk satisfying ∑_{i=1}^k Pi = I. By (B.1) we have ran(Pi) ⊥ ran(Pj) for distinct i, j ∈ [k]. There is then an orthonormal basis v1, . . . , vd of Cd and a partition S1 ∪ . . . ∪ Sk of [d] such that Pi = ∑_{j∈Si} vjv∗j for each i = 1, . . . , k, where {vj : j ∈ Si} is an SG-independent set for each i ∈ [k].

Denoting the canonical basis of Cd by e1, . . . , ed, it is a standard combinatorial result (see [35, Lemma 7.28] and [21, Lemma 13]) that there exists a permutation σ on [d] such that ⟨eσ(i), vi⟩ ≠ 0 for all i ∈ [d], and so for j, k ∈ [d] it holds that

⟨vjv∗k, eσ(j)e∗σ(k)⟩ = ⟨eσ(k), vk⟩⟨vj , eσ(j)⟩ ≠ 0. (4.14)

Since {vp : p ∈ Si} is an SG-independent set, vpv∗q ∈ S⊥G for distinct p, q ∈ Si, and so by (4.14), eσ(p)e∗σ(q) ∉ SG. It then holds that σ(p) ≁ σ(q) in G for distinct p, q ∈ Si, and we conclude that {σ(j) : j ∈ Si} is an independent set in G for each i ∈ [k]. Thus we have k independent sets which partition V (G), giving χ(G) ≤ χ(SG) as required.

We use an analogous argument to show that χ(G) ≤ Ω(SG), which with (4.13) is sufficient to prove (4.12). Suppose that Ω(SG) = l. Then by (4.3), there exist SG-clique projections Q1, . . . , Ql satisfying ∑_{i=1}^l Qi = I. As above, there is then an orthonormal basis u1, . . . , ud of Cd and a partition T1 ∪ . . . ∪ Tl of [d] such that Qi = ∑_{j∈Ti} uju∗j for each i = 1, . . . , l, where {uj : j ∈ Ti} is an SG-clique for each i ∈ [l]. Now there is a permutation τ on [d] such that

⟨uju∗k, eτ(j)e∗τ(k)⟩ ≠ 0 for j, k ∈ [d].

Since {up : p ∈ Ti} is an SG-clique, upu∗q ∈ SG for distinct p, q ∈ Ti, giving in that case that eτ(p)e∗τ(q) ∉ S⊥G. It then holds that τ(p) ∼ τ(q) in G for distinct p, q ∈ Ti, and thus {τ(j) : j ∈ Ti} is a clique in G for each i ∈ [l]. We then have l cliques which partition V (G), and χ(G) ≤ Ω(SG) as required.

4.1.2 Properties of non-commutative graph parameters

Here we consider some properties of non-commutative graph parameters, beginning with the

following monotonicity results.

Lemma 4.1.13. Let S, T ⊆Md be operator systems satisfying S ⊆ T .

(i) If ζ ∈ {α, θ, Ωf , Ωf , Ω, Ω}, then ζ(S) ≥ ζ(T ).

(ii) If ζ ∈ {ω, ω, ωf , χ}, then ζ(S) ≤ ζ(T ).

Proof. The result for α is immediate from Definition 3.1.33. Noting that if S ⊆ T , then

Pa(T ) ⊆ Pa(S), Pc(S) ⊆ Pc(T ) and Pf(S) ⊆ Pf(T ), the results for Ω, Ω,Ωf and Ωf follow

from Definitions 4.1.8 and 4.1.6 and (4.2). The remaining cases are immediate from Lemmas

3.2.9, 3.2.35 and 2.2.10 (ii).

The various non-commutative graph parameters we have discussed satisfy a number of

further important inequalities, which we now state and prove. Note that Theorem 4.1.14 (i)

is the quantum version of the ‘classical sandwich’ result (1.37).

Theorem 4.1.14. If S ⊆ Md is a non-commutative graph, then the following inequalities

apply.

(i) 1 ≤ α(S) ≤ θ(S) ≤ Ωf(S),

(ii) θ(S) ≤ d,

(iii) α(S) ≤ Ωf(S) ≤ Ωf(S),


(iv) 0 ≤ ω(S) ≤ ω(S) ≤ ωf(S) = χf(S) ≤ d.

Proof. (i) Use Lemma 3.2.7 and Theorem 3.2.37 together with Theorem 4.1.9, Corollary 4.1.3

and (2.20).

(ii) This is immediate from Lemma 3.2.36.

(iii) Use Theorem 3.2.10.

(iv) Taking anti-blockers in Theorem 3.2.10 and using Lemmas 2.2.33 and 3.2.7, we obtain fp(S) ⊆ cp(S) ⊆ ap(S)] ⊆ BId, and the assertion follows by Corollary 4.1.10.

Our goal is to extend the analogy between graphs and non-commutative graphs, and to

that end it is useful to note how some well-known results for graphs can be generalised. For

instance, if a graph G has d vertices, it is clear that

1 ≤ ω(G) ≤ ωf(G) = χf(G) ≤ χ(G) ≤ d.

Using inequalities from Theorem 4.1.14 as well as (4.8), the generalisation

1 ≤ ω(S) ≤ ωf(S) = χf(S) ≤ χ(S) ≤ d

holds for any non-commutative graph S ⊆Md.

Since for a graph G we can partition V (G) into χ(G) independent sets each of cardinality

at most α(G), it is clear that the well-known result α(G)χ(G) ≥ d holds, with ‘complemen-

tary’ version ω(G)χ(G) ≥ d. It is trivial to generalise for non-commutative graphs. (Note

that corresponding results for so-called ‘operator anti-systems’ are considered in [21, Section

3.1]; operator anti-systems are discussed in Section 4.2.)

Lemma 4.1.15. If S ⊆Md is a non-commutative graph, then

α(S)χ(S) ≥ d and ω(S)Ω(S) ≥ d.

Furthermore, if ω(S) ≥ 1, then ω(S)Ω(S) ≥ d.

Proof. Suppose that χ(S) = k and that S-abelian projections P1, . . . , Pk ∈ Md satisfy ∑_{i=1}^k Pi = Id. We then have that ∑_{i=1}^k rankPi = d. For all P ∈ Pa(S), Lemma 4.1.1 gives that rankP ≤ α(S), and so kα(S) ≥ d, and the first assertion holds. The other results follow in the same way.


The following inequality is immediate from Definition 4.1.4, Lemma 3.2.25 and Corollary

3.2.30.

Corollary 4.1.16. For operator system S ⊆Md and k = 1, 2, . . .,

θd2(S) = θ(S) ≤ θk+1(S) ≤ θk(S).

For operator system S ⊆Md, Lemmas 3.2.7 and 3.2.36 show that ap(S), cp(S) and th(S)

are sandwiched between AId and BId ; by Lemma 2.2.10 it is easily seen that so also are

their anti-blockers. Lemmas 2.4.36 and 2.4.37 then yield some simple equivalences in the

extreme cases of α(S), ω(S), ωf(S), Ωf(S), θ(S) ∈ {1, d}. The same cannot be said for fp(S)

and fp(S)]. The next lemmas concern ω(S) and Ωf(S).

Lemma 4.1.17. The following equivalences hold for operator system S ⊆Md:

Ωf(S) = 1 ⇐⇒ ω(S) = d ⇐⇒ S = Md.

Proof. Lemma 3.2.7 gives that fp(S) ⊆ BId , and hence AId ⊆ fp(S)]. Thus, if Ωf(S) =

γ(fp(S)]) = 1, then fp(S)] = AId , yielding fp(S) = BId and ω(S) = d. It then holds

that I ∈ fp(S) and I is an S-full projection. In that case there exists an orthonormal

basis {v1, . . . , vd} of Cd such that viv∗j ∈ S for all i, j ∈ [d], giving that S = span{viv∗j : i, j ∈ [d]} = Md. The proof is completed by noting that if S = Md, then fp(S) = BId and

Ωf(S) = γ(fp(S)]) = γ(AId) = 1.

Proposition 4.1.18. (i) If operator system S ⊆ Md satisfies ω(S) = 0, then Ωf(S) = ∞.

(ii) The following equivalences hold for operator system S:

Ωf(S) = ∞ ⇐⇒ fp(S)] is unbounded ⇐⇒ fp(S) has empty interior relative to M+d.

Proof. (i) The condition ω(S) = 0 holds if and only if fp(S) = {0}, or equivalently fp(S)] = M+d, which yields Ωf(S) = γ(fp(S)]) = ∞.

(ii) Since Ωf(S) = γ(fp(S)]), this is immediate from Lemma 2.1.3 and Proposition 2.2.16.

Remark 4.1.19. Operator systems S satisfying ω(S) = 0 are precisely those for which no unit vector v satisfies vv∗ ∈ S; a simple example which we have already noted is span{Id} for d > 1. Note that the converse of Proposition 4.1.18 (i) does not hold. As a counterexample, consider the non-commutative graph K ⊆ Md for d ≥ 3 given by K = span(Id, e1e∗1). It is not hard to see that the only K-full projection is e1e∗1, and it then follows that fp(K) = {M ∈ M+d : M ≤ e1e∗1} and ω(K) = 1.

By Lemma 2.2.33 we have fp(K)] = {M ∈ M+d : Tr(Me1e∗1) ≤ 1}, and so ke2e∗2 ∈ fp(K)] for all k ∈ R+, giving that Ωf(K) = ∞.

4.1.3 Non-commutative graph homomorphisms

It is well-known that many graph parameters can be defined in terms of graph homomor-

phisms. (See [51], for example.) In [51], Stahlke defines the concept of a homomorphism

between non-commutative graphs, and shows how this concept can be used, for instance, to

give an equivalent definition of independence number [51, Theorem 13, Definition 11]. Here

we use the theory of homomorphisms between non-commutative graphs to prove a stability

property of the Lovasz number. We begin with the following definition.

Definition 4.1.20. [51, Definition 7] Let S ⊆ Md and T ⊆ Mk be non-commutative graphs. A homomorphism from S into T is a quantum channel Γ : Md → Mk that has a Kraus representation Γ(S) = ∑_{i=1}^m AiSA∗i, such that

A∗iTAj ∈ S for all T ∈ T and i, j ∈ [m]. (4.15)

If there exists a homomorphism from S into T , we write S → T.

Since I ∈ T , if Γ is a homomorphism from S into T , then by (4.15) it holds that A∗iAj ∈ S

for all i, j ∈ [m], and we have SΓ ⊆ S and Γ ∈ C(S).
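Once bases for S and T and Kraus operators for Γ are fixed, condition (4.15) is a finite family of subspace-membership checks and can be verified numerically by projecting each A∗iTAj onto the span of a basis of S. The sketch below is a hypothetical numpy illustration of that pattern; the particular systems, basis matrices and function names are assumptions made here, not constructions from the text. The toy check at the end is the situation of Remark 4.1.22 below, where a single identity Kraus operator gives a homomorphism from T into S exactly when S ⊆ T.

    import numpy as np

    def in_span(X, basis, tol=1e-9):
        # Least-squares membership test: is the matrix X in span(basis)?
        B = np.column_stack([M.reshape(-1) for M in basis])
        c, *_ = np.linalg.lstsq(B, X.reshape(-1), rcond=None)
        return np.linalg.norm(B @ c - X.reshape(-1)) < tol

    def satisfies_415(kraus, target_basis, source_basis, tol=1e-9):
        # Condition (4.15) for a homomorphism from the source system into the
        # target system: A_i^* T A_j lies in the source system for every basis
        # element T of the target system.  (That the Kraus operators form a
        # channel, i.e. sum_i A_i^* A_i = I, must be checked separately.)
        return all(in_span(A.conj().T @ T @ B, source_basis, tol)
                   for T in target_basis for A in kraus for B in kraus)

    # Toy check in M_2: with the single Kraus operator I_2, condition (4.15) for a
    # homomorphism from T into S reduces to the inclusion S inside T.
    I2 = np.eye(2)
    E11, E12, E21, E22 = (np.outer(np.eye(2)[i], np.eye(2)[j])
                          for i in range(2) for j in range(2))
    S_basis = [I2, E11]                      # S = span{I, E11}
    M2_basis = [E11, E12, E21, E22]          # T = M_2
    print(satisfies_415([I2], target_basis=S_basis, source_basis=M2_basis))   # True
    print(satisfies_415([I2], target_basis=M2_basis, source_basis=S_basis))   # False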

Proposition 4.1.21. If S ⊆Md and T ⊆Mk are non-commutative graphs satisfying S → T ,

then θ(S) ≤ θ(T ).

Proof. Let Γ : Md → Mk be a homomorphism from S into T with the Kraus representation Γ(S) = ∑_{i=1}^m AiSA∗i. Suppose Ψ ∈ C(T ) has Kraus representation Ψ(T ) = ∑_{i=1}^n BiTB∗i. It is clear that Ψ ◦ Γ is a quantum channel with set of Kraus operators {BiAj : i ∈ [n], j ∈ [m]}. Then

SΨ◦Γ = span{A∗jB∗iBkAl : i, k ∈ [n], j, l ∈ [m]} ⊆ S,

using that B∗iBk ∈ T for all i, k ∈ [n], and then applying (4.15). We can conclude that Ψ ◦ Γ ∈ C(S). Letting S ∈ th(S) be such that Tr(S) = θ(S), we have that Γ(S) ∈ M+k and, since Ψ ◦ Γ ∈ C(S), it holds that Ψ(Γ(S)) = (Ψ ◦ Γ)(S) ≤ I. Thus Γ(S) ∈ th(T ), and θ(T ) ≥ Tr(Γ(S)) = TrS = θ(S).


Remark 4.1.22. For S, T ⊆ Md with S ⊆ T , the channel I : Md → Md with single Kraus

operator Id is clearly a homomorphism from T into S, and Proposition 4.1.21 gives that

θ(T ) ≤ θ(S). Thus in the case of the Lovasz number, Proposition 4.1.21 is a generalisation

of Lemma 4.1.13. (For obvious reasons, the channel I is known as the identity channel, and

will be discussed in Section 4.6.)

Lemma 4.1.23. Let S ⊆ Md be an operator system and m ∈ N. Then S → Mm(S) and

Mm(S)→ S.

Proof. For r ∈ [m], let Vr ∈ Mm,1(Md) ∼= Mmd,d be the operator given by Vr = (Idδrj)mj=1. (That is, Vr = (0 . . . 0 Id 0 . . . 0)t, with the Id entry in the rth position.) It is easy to verify that

∑_{r=1}^m V∗rVr = mId.

Then Λ : Md → Mmd given by

Λ(S) = (1/m) ∑_{r=1}^m VrSV∗r for S ∈ Md

is a quantum channel with Kraus operators (1/√m)V1, . . . , (1/√m)Vm. (In fact, Λ is given by Λ(S) = (1/m) Im ⊗ S for S ∈ Md.) Letting M ∈ Mm(S) be given by M = (Sij)i,j∈[m] with Sij ∈ S for all i, j ∈ [m], we have

(1/m) V∗rMVs = (1/m) Srs ∈ S,

and so Λ is a homomorphism from S into Mm(S), and S → Mm(S).

Let Γ : Mmd → Md be given by

Γ(M) = ∑_{r=1}^m V∗rMVr for all M ∈ Mm(Md).

Using that ∑_{r=1}^m VrV∗r = Imd, it is clear that Γ is a quantum channel with Kraus operators V∗1, . . . , V∗m. (In fact, for M = (Sij)i,j∈[m] ∈ Mm(S) we have Γ(M) = ∑_{i∈[m]} Sii = mΛ∗(M).) Furthermore,

VrSV∗s = (Sδirδjs)i,j∈[m] ∈ Mm(S) for all S ∈ S and r, s ∈ [m].

Then Γ is a homomorphism from Mm(S) into S, and we have Mm(S) → S, as required.
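The two channels in the proof can be checked mechanically in coordinates. A small numpy sketch (the values of d, m and the matrix S below are arbitrary illustrative choices) verifies the channel condition for the Kraus operators of Λ, the identity Λ(S) = (1/m) Im ⊗ S, and the block-diagonal sum computed by Γ.

    import numpy as np

    d, m = 2, 3
    S = np.random.default_rng(0).standard_normal((d, d))   # any element of M_d

    # V_r in M_{md,d}: the d x d identity placed in the r-th block row.
    V = [np.kron(np.eye(m)[:, [r]], np.eye(d)) for r in range(m)]

    # Channel condition for the Kraus operators V_r / sqrt(m) of Lambda.
    assert np.allclose(sum(Vr.T @ Vr for Vr in V) / m, np.eye(d))

    # Lambda(S) = (1/m) sum_r V_r S V_r^* equals (1/m) I_m tensor S.
    assert np.allclose(sum(Vr @ S @ Vr.T for Vr in V) / m, np.kron(np.eye(m), S) / m)

    # Gamma(M) = sum_r V_r^* M V_r sums the diagonal blocks of M.
    M = np.kron(np.ones((m, m)), S)                         # every block equal to S
    assert np.allclose(sum(Vr.T @ M @ Vr for Vr in V), m * S)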

It is then clear that the Lovasz number has the following stability condition.


Corollary 4.1.24. If S ⊆ Md is an operator system, then θ(S) = θ(Mm(S)) for every

m ∈ N.

Proof. This is immediate from Proposition 4.1.21 and Lemma 4.1.23.

We now show that independence number is also a homomorphism monotone. As in

Remark 4.1.22, we note that this generalises Lemma 4.1.13 in the case of independence

number. The proof we give is adapted from [51, Theorem 13]; see [6, p.27] for an alternative

argument.

Proposition 4.1.25. If S ⊆Md and T ⊆Mk are non-commutative graphs satisfying S → T ,

then α(S) ≤ α(T ).

Proof. Let Γ be a homomorphism from S into T with the Kraus representation Γ(S) = ∑_{i=1}^m AiSA∗i. Let {u1, . . . , un} ⊆ Cd be an S-independent set. For each i ∈ [n] choose k(i) ∈ [m] such that Ak(i)ui ≠ 0. (Since ∑_{j=1}^m A∗jAj = Id, we note that this is possible for all i ∈ [n].) For each i ∈ [n] we let vi = ‖Ak(i)ui‖−1Ak(i)ui ∈ Ck, so that ‖vi‖ = 1 for all i ∈ [n].

Now note for T ∈ T and i ≠ j that

‖Ak(i)ui‖‖Ak(j)uj‖⟨viv∗j , T⟩ = ⟨Ak(i)uiu∗jA∗k(j), T⟩ = ⟨uiu∗j , A∗k(i)TAk(j)⟩ = 0, (4.16)

where we use that uiu∗j ∈ S⊥ since {u1, . . . , un} is S-independent, and that A∗k(i)TAk(j) ∈ S by (4.15). Then (4.16) shows that viv∗j ∈ T⊥ for i ≠ j and, using that Ik ∈ T , we have ⟨vi, vj⟩ = 0 when i ≠ j. It thus holds that {v1, . . . , vn} is a T -independent set and α(T ) ≥ α(S), as required.

Corollary 4.1.26. Let S ⊆Md be an operator system and m ∈ N. Then

α(Mm(S)) = α(S).

Proof. This is immediate from Lemma 4.1.23 and Proposition 4.1.25.

4.1.4 Weighted parameters

We conclude this section with some brief comments on ‘weighted’ versions of graph and non-

commutative graph parameters. Given a graph G on d vertices with ‘weighting’ w ∈ Rd+ on


its vertices, [22, (4.7), (4.1)] define the quantities α(G,w) and θ(G,w), known respectively as

the weighted independence number and weighted Lovasz number of (G,w), by the expressions

α(G,w) = max{⟨v, w⟩ : v ∈ VP(G)},
θ(G,w) = max{⟨v, w⟩ : v ∈ TH(G)}. (4.17)

(See also [17].) It is clear these parameters generalise the independence number and Lovasz

number in the sense that

α(G) = α(G,1), θ(G) = θ(G,1).
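Since w ≥ 0 and VP(G) is generated by the indicator vectors of independent sets, the maximum in the first expression of (4.17) is attained at such an indicator, so α(G,w) is simply the largest total weight of an independent set. A brute-force sketch (the graph and the weighting below are arbitrary illustrative choices, not taken from the text):

    import itertools
    import networkx as nx

    G = nx.cycle_graph(5)
    w = {0: 3.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 1.0}    # an arbitrary weighting

    def independent(S):
        return all(not G.has_edge(u, v) for u, v in itertools.combinations(S, 2))

    # alpha(G, w): the maximum of <., w> over VP(G) is attained at the indicator
    # vector of an independent set, so it is the largest weight of such a set.
    alpha_w = max(sum(w[v] for v in S)
                  for r in range(len(w) + 1)
                  for S in itertools.combinations(G.nodes, r) if independent(S))
    print(alpha_w)                                   # 5.0 (e.g. vertices 0 and 2)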

For an operator system S ⊆ Md and ρ ∈ M+d , it would seem natural to define weighted

versions of α(S) and θ(S) by

α(S, ρ) = max{⟨A, ρ⟩ : A ∈ ap(S)}, θ(S, ρ) = max{⟨A, ρ⟩ : A ∈ th(S)}. (4.18)

Immediately we have

α(S) = α(S, I), θ(S) = θ(S, I).

(This approach can be applied to any non-commutative graph parameter β satisfying β(S) =

γ(A(S)) where A(S) is an Md-convex corner associated with S; simply define the ‘weighted

version’ by β(S, ρ) = max〈A, ρ〉 : A ∈ A(S), and it holds that β(S) = β(S, I).) We do not

develop this concept any further here, but leave an exploration of these ideas to future work.

We also note that in [22] and [18] the reader will find many expressions for the weighted Lovasz

number θ(G,w) equivalent to that given in (4.17). Further work could also examine their

non-commutative generalisations; we note that these may not all necessarily be equivalent to

(4.18), just as we will see in Section 4.4 that θ(G) has a number of distinct non-commutative

generalisations.

4.2 Operator anti-systems

Rather than working with operator systems, some authors, notably in [51] and [21], have

considered their orthogonal complements. A subspace T ⊆ Md will be called an operator

anti-system if there exists an operator system S ⊆ Md such that T = S⊥. (Such subspaces

are called trace-free non-commutative graphs in [51].) It was pointed out in [21, Proposition

8] that a subspace T ⊆Md is an operator anti-system precisely when it is traceless and self-


adjoint, that is, for all T ∈ T , it holds that TrT = 0 and T ∗ ∈ T . With a graph G on vertex

set V (G) = [d], [51, Equation (7)] and [21, Definition 6] associate the operator anti-system

TG = span{eie∗j : i ∼ j in G},

where {e1, . . . , ed} is the canonical basis of Cd. As noted by [51, p.2], trying to embed the notion of graph complements in our quantum setting is complicated by the fact that for a (loopless) graph G, the statements i ∼ j in G and i ≁ j in Ḡ are not equivalent, unless it is further stated that i and j are distinct. One way to proceed is to use both operator

systems and operator anti-systems as explained in [21]. Indeed [21, Remark 7] argues that

the orthogonal complement is analogous to the graph complement because of the obvious

result ([21, Proposition 9])

TḠ = (SG)⊥, (4.19)

where SG is the operator system associated with G, as in Definition 3.1.31.
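The orthogonality underlying (4.19) is easy to check in coordinates: under the trace inner product, SG is spanned by the matrix units indexed by the loops and edges of G, and its orthogonal complement is spanned by the matrix units over the distinct non-adjacent pairs. A small numpy sketch (the path graph P3 and the helper E below are illustrative choices made here, not constructions from the text):

    import numpy as np
    import networkx as nx

    G = nx.path_graph(3)             # the path 0-1-2, chosen only for illustration
    d = G.number_of_nodes()
    E = lambda i, j: np.outer(np.eye(d)[i], np.eye(d)[j])

    # S_G = span{E_ij : i = j or i ~ j}; the remaining matrix units, indexed by the
    # distinct non-adjacent pairs, span its trace-orthogonal complement.
    S_basis = [E(i, i) for i in range(d)] + [E(i, j) for i in range(d)
               for j in range(d) if i != j and G.has_edge(i, j)]
    C_basis = [E(i, j) for i in range(d) for j in range(d)
               if i != j and not G.has_edge(i, j)]

    gram = np.array([[np.trace(A.conj().T @ B) for B in C_basis] for A in S_basis])
    assert np.allclose(gram, 0)                     # the two spans are orthogonal
    assert len(S_basis) + len(C_basis) == d * d     # dimensions add up to d^2
    print(len(S_basis), len(C_basis))               # 7 and 2 for the path P_3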

We briefly review some definitions made in [21, Section 3.1]. For operator anti-system T ⊆ Md, an orthonormal set {v1, . . . , vk} ⊆ Cd is called T -independent when viv∗j ∈ T⊥ for all i ≠ j. By Definition 3.2.1, it is immediate that {v1, . . . , vk} is T -independent for operator anti-system T if and only if {v1, . . . , vk} is a T⊥-clique, and thus the maximum cardinality of a T -independent set is equal to ω(T⊥). When {v1, . . . , vk} is a T -independent set, we will call ∑_{j=1}^k vjv∗j a T -independent projection. Then [21, Definition 12] defines χ(T ), the chromatic number of operator anti-system T , by

χ(T ) = min{k ∈ N : P1, . . . , Pk ∈ Pi(T ), ∑_{i=1}^k Pi = I},

where Pi(T ) denotes the set of T -independent projections. As every rank one projection is T -independent for every operator anti-system T , it holds that χ(T ) ≤ d.

Similarly, an orthonormal set {v1, . . . , vk} is called T -strongly independent when viv∗j ∈ T⊥ for all i, j. By Definition 3.2.1, {v1, . . . , vk} is T -strongly independent for operator anti-system T if and only if {v1, . . . , vk} is T⊥-full, and thus the maximum cardinality of a T -strongly independent set is equal to ω(T⊥). When {v1, . . . , vk} is a T -strongly independent set, we call the projection ∑_{j=1}^k vjv∗j a T -strongly independent projection. Then χs(T ), the strong chromatic number of operator anti-system T , is given in [21, Definition 20] by

χs(T ) = min{k ∈ N : P1, . . . , Pk ∈ Ps(T ), ∑_{i=1}^k Pi = I},


where Ps(T ) denotes the set of T -strongly independent projections. If there exists no set of

T -strongly independent projections whose sum is I, then we set χs(T ) =∞.

Remark 4.2.1. By Definition 4.1.6 and the observations above, it is immediate for operator

anti-system T that

χ(T ) = Ω(T ⊥) and χs(T ) = Ω(T ⊥), (4.20)

and hence Ωf(S) can be regarded as the fractional version of χs(S⊥), and Ωf(S) as the

fractional version of χ(S⊥). It is also immediate from (4.19), (4.20) and Proposition

4.1.12 that

χ(TG) = χ(S⊥G ) = Ω(SG) = χ(G),

χs(TG) = χs(S⊥G ) = Ω(SG) = χ(G), (4.21)

and so χ(TG) and χs(TG) reduce to their expected values when T = TG for a graph G. (This

was also shown in [21, Corollary 28 and Theorem 14].)

In Lemma 3.1.35 we noted that the tensor product of operator systems is the natural

quantum generalisation of the strong product of graphs. In [51] the concept of the co-normal

product of graphs as given in Definition 1.3.1 is generalised for operator anti-systems as

follows.

Definition 4.2.2. [51, Definition 22] The co-normal product of operator anti-systems T1 ⊆

Md1 and T2 ⊆Md2 is defined by

T1 ∗ T2 = T1 ⊗Md2 +Md1 ⊗ T2.

In the definition above, let Ti = S⊥i for operator systems Si, i = 1, 2. From (B.4) it can

then be seen that T1 ∗T2 = (S1⊗S2)⊥, and since S1⊗S2 is an operator system, it follows that

T1 ∗ T2 is an operator anti-system. That Definition 4.2.2 is a generalisation of the co-normal

graph product is justified by the following lemma from [51]. We include the proof because

the same method will be employed later.

Lemma 4.2.3. [51, Definition 22] If G and H are graphs, then TG ∗ TH = TG∗H .

Proof. Set V (G) = [n] and V (H) = [m] and let the canonical bases of Cn and Cm be {e1, . . . , en} and {f1, . . . , fm} respectively. Then

TG ∗ TH = span{eie∗j : i ∼ j in G} ⊗ span{fkf∗l : k, l ∈ [m]} + span{eie∗j : i, j ∈ [n]} ⊗ span{fkf∗l : k ∼ l in H}
= span{eie∗j ⊗ fkf∗l : i ∼ j in G or k ∼ l in H}
= span{(ei ⊗ fk)(ej ⊗ fl)∗ : (i, k) ∼ (j, l) in G ∗ H} = TG∗H ,

as required.

For completeness we list two other graph products which, though not used in the sequel,

also embed naturally into the quantum setting; the first is well known, but the second appears

to be new. The proofs of Lemmas 4.2.5 and 4.2.7 are trivial adaptations of those of Lemmas

4.2.3 and 3.1.35.

Definition 4.2.4. The tensor product G ⊗H of graphs G and H is the graph with vertex

set V (G)× V (H) in which (i, j) ∼ (k, l) if and only if i ∼ k in G and j ∼ l in H.

Note that the tensor product is sometimes, for example in [21], called the categorical

product. Comparing to Definition 3.1.4, it is clear that G ⊗ H is a spanning subgraph of

G ⊠ H.

Lemma 4.2.5. [21, Proposition 45] If graphs G and H have associated operator anti-systems

TG and TH , then TG ⊗ TH = TG⊗H .

(Corollary 4.2.11 will show that if Ti, i = 1, 2 are operator anti-systems, then T1 ⊗ T2 is

an operator anti-system.)

Definition 4.2.6. We define the co-tensor product G~H of graphs G and H by V (G~H) =

V (G) × V (H) where (i, j) ≃ (k, l) in G~H if and only if i ≃ k in G or j ≃ l in H. We also

define the co-tensor product of operator systems S1 ⊆Md1 and S2 ⊆Md2 by

S1 ~ S2 = S1 ⊗Md2 +Md1 ⊗ S2.

Note that Remark 3.1.25 shows that S1 ~ S2 is an operator system.

Lemma 4.2.7. If G and H are graphs, then SG ~ SH = SG~H .


Bearing in mind that the strong product is also known as the normal product, the dual-

ity under complements shown by the following lemma explains the choice of nomenclature.

(Lemma 4.2.8 (i) is, of course, well known.)

Lemma 4.2.8. If G and H are graphs, then (i) Ḡ ∗ H̄ is the complement of G ⊠ H, and (ii) Ḡ ~ H̄ is the complement of G ⊗ H.

Proof. These are immediate consequences of the definitions.

The following result is the quantum analogue of Lemma 4.2.8.

Lemma 4.2.9. Let Si ⊆Mdi be operator systems and Ti ⊆Mdi be operator anti-systems for

i = 1, 2. Then it holds that:

(T1 ∗ T2)⊥ = T1⊥ ⊗ T2⊥, and (S1 ~ S2)⊥ = S1⊥ ⊗ S2⊥.

Proof. By (B.3) and (B.4),

(T1 ∗ T2)⊥ = (T1 ⊗ Md2 + Md1 ⊗ T2)⊥ = (T1 ⊗ Md2)⊥ ∩ (Md1 ⊗ T2)⊥ = (T1⊥ ⊗ Md2) ∩ (Md1 ⊗ T2⊥) = T1⊥ ⊗ T2⊥.

The second assertion follows similarly.

Corollary 4.2.10. Let G and H be graphs. Then

(TḠ∗H̄)⊥ = SG⊠H , (SḠ~H̄)⊥ = TG⊗H .

Proof. We have (TḠ∗H̄)⊥ = (TḠ ∗ TH̄)⊥ = TḠ⊥ ⊗ TH̄⊥ = SG ⊗ SH = SG⊠H , using (4.19) and Lemmas 4.2.3, 4.2.9 and 3.1.35. The other result is proved in the same way.

Corollary 4.2.11. If Ti are operator anti-systems for i = 1, 2, then T1 ⊗ T2 is an operator

anti-system.

Proof. Let Ti = S⊥i for operator systems Si. By Lemma 4.2.9 we have T1 ⊗ T2 = (S1 ~ S2)⊥

where we note that S1 ~ S2 is an operator system.

4.3 Non-commutative graph entropy

Recall in the classical case how we defined the entropy HA(p) of probability distribution

p ∈ Pd over a general Rd-convex corner A. Given a graph G on d vertices, we considered


the special case A = VP(G) to obtain the important concept of graph entropy given by

H(G, p) = HVP(G)(p). In the quantum setting Definition 2.4.26 defined the entropy of a state

ρ ∈ Rd over a general Md-convex corner A. Since for a non-commutative graph S, we have

regarded ap(S) as a quantum version of VP(G), it is natural to make the following definition.

Definition 4.3.1. The non-commutative graph entropy H(S, ρ) of operator system S ⊆Md

with respect to a state ρ ∈ Rd is the quantity

H(S, ρ) = Hap(S)(ρ) = min{−Tr(ρ logA) : A ∈ ap(S)}.

The aim of this section is to summarise some basic properties of non-commutative graph

entropy; in many cases they follow immediately from the theory developed in Chapter 2 and

are obvious analogues of the properties of graph entropy. The next result verifies that the

classical concept of graph entropy embeds naturally into this more general non-commutative

setting.

Proposition 4.3.2. Let G be a graph with V (G) = [d] and p ∈ Pd be a probability distribution on V (G). Then, setting ρ = ∑_{i=1}^d pieie∗i gives H(G, p) = H(SG, ρ).

Proof. We apply Lemma 2.4.35 to Theorem 3.2.11.
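For S = SG and a diagonal state, Proposition 4.3.2 thus reduces the minimisation in Definition 4.3.1 to the classical programme H(G, p) = min{−∑i pi log vi : v ∈ VP(G)}, which for a small graph can be solved directly by parametrising VP(G) through its independent sets. A sketch under those assumptions (the 5-cycle, the uniform distribution and the cvxpy library are choices made here; natural logarithms are used, so the value is reported in nats):

    import itertools
    import networkx as nx
    import numpy as np
    import cvxpy as cp

    G = nx.cycle_graph(5)
    n = G.number_of_nodes()
    p = np.ones(n) / n                               # uniform distribution on V(G)

    def independent(S):
        return all(not G.has_edge(u, v) for u, v in itertools.combinations(S, 2))

    ind_sets = [S for r in range(1, n + 1)
                for S in itertools.combinations(G.nodes, r) if independent(S)]
    A = np.array([[1.0 if v in S else 0.0 for S in ind_sets] for v in G.nodes])

    # H(G, p) = min{-sum_i p_i log v_i : v in VP(G)}; the objective decreases in v,
    # so it suffices to range over convex combinations of independent-set indicators.
    lam = cp.Variable(len(ind_sets), nonneg=True)
    v = A @ lam
    H = cp.Problem(cp.Minimize(-(p @ cp.log(v))), [cp.sum(lam) == 1]).solve()
    print(H)      # about log(5/2) = 0.916 (natural logarithm) for this symmetric example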

Next we note that non-commutative graph entropy is non-negative and bounded above

by the von Neumann entropy.

Proposition 4.3.3. For an operator system S ⊆ Md and ρ ∈ Rd we have 0 ≤ H(S, ρ) ≤

H(ρ).

Proof. This follows from (2.38) on page 71 and Lemma 3.2.7.

Non-commutative graph entropy is easily seen to satisfy the following monotonicity con-

dition.

Lemma 4.3.4. If S1 ⊆ S2, then H(S1, ρ) ≤ H(S2, ρ).

Proof. If S1 ⊆ S2, then ap(S1) ⊇ ap(S2) by Lemma 3.2.9, and the result follows from

Definition 4.3.1.


Let G be a graph. We note that H(G, p) = min_{v∈VP(G)} ∑_{i=1}^d −pi log vi = 0 if and only if there exists v ∈ VP(G) such that pi > 0 ⇒ vi = 1. This is equivalent to the condition that {i ∈ V (G) : pi > 0} is an independent set. Also note for a graph G on d vertices that H(G, p) = 0 for all probability distributions p ∈ Pd if and only if G = K̄d, the empty graph on d vertices. (The 'if' statement follows from (1.22) on page 19; for 'only if', note that when G is non-empty and p > 0 we have H(G, p) > 0, because in that case 1 ∉ VP(G).) We now address the equivalent problems in the non-commutative setting.

Proposition 4.3.5. Suppose S ⊆ Md is an operator system. Then H(S, ρ) = 0 if and only if there exists an orthonormal basis {v1, . . . , vd} ⊆ Cd such that P = ∑_{i∈T} viv∗i ∈ ap(S), where T = {i ∈ [d] : ⟨ρvi, vi⟩ > 0}.

Proof. We have H(S, ρ) = 0 if and only if there exists A ∈ ap(S) such that −Tr(ρ logA) = 0. For A ∈ ap(S), we set A = ∑_{i=1}^d aiviv∗i for some orthonormal basis {v1, . . . , vd} and ai ∈ R+ to give −Tr(ρ logA) = ∑_{i=1}^d log(1/ai)⟨ρvi, vi⟩, as in (2.27) on page 60. Thus −Tr(ρ logA) = 0 for A ∈ ap(S) if and only if

ai = 1 whenever ⟨ρvi, vi⟩ > 0. (4.22)

The 'if' part of the lemma is now immediate. For 'only if' observe that if A ∈ ap(S) satisfies (4.22), then the projection P = ∑_{i∈T} viv∗i ≤ A and so P ∈ ap(S).

If operator system S ⊆ Md satisfies S ⊆ DV for some orthonormal basis V = {v1, . . . , vd}

of Cd, then we say S is diagonal in basis V .

Corollary 4.3.6. When S ⊆ Md is an operator system, we have that H(S, ρ) = 0 for all

ρ ∈ Rd if and only if S is diagonal in some orthonormal basis.

Proof. By choosing ρ > 0 we have ⟨ρv, v⟩ > 0 for all non-zero vectors v ∈ Cd. Then if H(S, ρ) = 0, Proposition 4.3.5 gives I ∈ ap(S), and I is an abelian projection. Hence for some orthonormal basis V = {v1, . . . , vd} of Cd it holds that viv∗j ∈ S⊥ for all i ≠ j. That is, for all M ∈ S we have ⟨viv∗j , M⟩ = ⟨vi, Mvj⟩ = 0 when i ≠ j, and we can conclude that S is diagonal in basis V. Conversely, if S is diagonal in orthonormal basis V = {v1, . . . , vd} and M ∈ S, then ⟨viv∗j , M⟩ = ⟨vi, Mvj⟩ = 0 for all i ≠ j, and {v1, . . . , vd} is S-independent. Then ∑_{i=1}^d viv∗i = I ∈ ap(S), and Proposition 4.3.5 gives H(S, ρ) = 0 for all ρ ∈ Rd.

The following lemma can be compared to the classical result in (1.21) on page 19 that


H(Kd, p) = H(p) for all probability distributions p ∈ Pd, where Kd is the complete graph on

d vertices.

Lemma 4.3.7. For all ρ ∈ Rd we have H(Md, ρ) = H(ρ).

Proof. As M⊥d = {0}, the only Md-independent sets are the singletons {vi} and ∅. (Note ∅

by convention is independent, as for graphs. We take the associated abelian projection as

P∅ = 0.) The set of Md-abelian projections then consists precisely of the rank-1 projections,

which each have unit trace, and 0. It follows that ap(Md) = AId , and the result follows by

(2.36).

Remark 4.3.8. If G is a graph, then H(G, p) = H(p) for all probability distributions p if and only if G is complete: (1.21) gives the 'if' condition; for the reverse implication simply observe for non-complete G with distinct and non-adjacent vertices i and j that {i, j} is independent in G and, letting p0 be the probability distribution

p0(i) = p0(j) = 1/2, p0(k) = 0 for k ∉ {i, j},

we have

H(G, p0) = 0 < H(p0) = log 2.

In the non-commutative case, Proposition 4.3.12 will give a number of statements which are equivalent to the condition

H(S, ρ) = H(ρ) for all ρ ∈ Rd, (4.23)

and it is interesting to note that we can find operator systems S ⊊ Md which satisfy (4.23).

and it is interesting note that we can find operator systems S (Md which satisfy (4.23).

The following proposition, the quantum analogue of Corollary 1.3.11, is immediate from

(4.9).

Theorem 4.3.9. For operator system S ⊆ Md it holds that

max_{ρ∈Rd} H(S, ρ) = log χf(S).

The following propositions, summarising our results for two particular ‘extreme’ cases,

can be compared to Lemmas 2.4.36 and 2.4.37, which examined the same problem for entropy

over a general Md-convex corner.


Proposition 4.3.10. The following are equivalent for a non-commutative graph S ⊆Md:

(i) S is commutative;

(ii) S is diagonal in some orthonormal basis;

(iii) H(S, ρ) = 0 for all states ρ ∈ Rd;

(iv) χf(S) = 1;

(v) I ∈ ap(S) = BId;

(vi) α(S) = d;

(vii) χ(S) = 1.

Proof. (i) ⇒ (ii). First we show that if S is commutative, then every M ∈ S is normal.

To see this, let M ∈ S, and note then that M∗ ∈ S, since S is self-adjoint. It follows that

M commutes with M∗ and hence is normal. Then, by the properties of normal matrices as

summarised in Appendix B, the elements of S are simultaneously unitarily diagonalisable.

(ii) ⇒ (i). If S is diagonal in some orthonormal basis, it is obviously commutative.

(ii) ⇐⇒ (iii). This is simply Corollary 4.3.6.

(iii) ⇐⇒ (iv) ⇐⇒ (v)⇐⇒ (vi). Apply Lemma 2.4.36, recalling that H(S, ρ) = Hap(S)(ρ)

and using that α(S) = γ(ap(S)) by Corollary 4.1.3, and χf(S) = γ(ap(S)]) by Theorem 4.1.9.

(Note that (iii) ⇐⇒ (iv) also follows from Theorem 4.3.9 and Proposition 4.3.3.)

(v) ⇐⇒ (vii). This is clear from (4.2).

Remark 4.3.11. An example of an operator system which has the properties listed in Propo-

sition 4.3.10 is S = span(Id, A1, . . . , Ak) where A1, . . . , Ak ∈Mhd commute pairwise.

Proposition 4.3.12. The following are equivalent for non-commutative graph S ⊆Md:

(i) H(S, ρ) = H(ρ) for all states ρ ∈ Rd;

(ii) χf(S) = d;

(iii) χ(S) = d;

(iv) ap(S) = AId;

(v) α(S) = 1.


Proof. (ii) ⇒ (iii). This is immediate from (4.8).

(iii) ⇒ (iv). All rank 1 projections are trivially S-abelian. We claim that if χ(S) = d, then

no S-abelian projection P satisfies rankP ≥ 2. Indeed, if there exists an S-abelian projection

P with rank(P ) ≥ 2, then I can certainly be expressed as the sum of P and at most (d− 2)

rank 1 projections, giving χ(S) ≤ d− 1. It is then clear that ap(S) = AId .

(i) ⇐⇒ (ii) ⇐⇒ (iv) ⇐⇒ (v). We apply Lemma 2.4.37.

Remark 4.3.13. Examples of operator systems satisfying Proposition 4.3.12 include Sd, which

will be defined in Definition 4.6.15, and Md.

4.4 Another quantum generalisation of θ(G)

Motivated by the expression θ(G) = γ(TH(G)) for the Lovasz number of a graph G, Def-

inition 4.1.4 defined the Lovasz number of a non-commutative graph S to be given by

θ(S) = γ(th(S)). Furthermore, Proposition 4.1.11 shows that the non-commutative defi-

nition generalises the classical in the sense that θ(SG) = θ(G). In [28], Lovasz gives several

equivalent expressions for θ(G); interestingly we find that their natural non-commutative

versions are not in general equal. One expression for θ(G) in [28] leads to the result

θ(G) = max{‖I + T‖ : T ∈ S⊥G , I + T ≥ 0}, (4.24)

given in [35, Theorem 6.10]. Motivated by (4.24), for a non-commutative graph S ⊆Md, [13,

Section 4] defines the quantity ϑ(S) by setting

ϑ(S) = max{‖I + T‖ : T ∈ Md, I + T ≥ 0, T ∈ S⊥}.

Also in [13, Definition 5], ϑ(S), a ‘norm-completion’ of ϑ(S), is defined by letting

ϑ(S) = sup_{n∈N} ϑ(Mn(S)).

It was shown in [13] that if G is a graph, then ϑ(SG) = ϑ(SG) = θ(G), and hence ϑ(S) and

ϑ(S) are both, like θ(S), valid generalisations of the classical parameter θ(G). However, θ(S),

ϑ(S) and ϑ(S) are not in general equal; this is discussed further in Remark 4.5.5.

In this section, motivated by an expression for θ(G) given in [28], we define θ(S), yet

another non-commutative version of θ(G) in the sense that θ(SG) = θ(G) for a graph G.


Later we will show that θ(S) is an upper bound on c(S), the Shannon capacity of S. We will

also see that θ(S) is not in general equal to either ϑ(S) or ϑ(S).

We denote the smallest eigenvalue of A ∈ M+d by λmin(A). For invertible A ∈ M+d it is clear that ‖A−1‖ = λmin(A)−1.

Definition 4.4.1. For non-commutative graph S we define θ(S), the second Lovasz number of S, by

θ(S) = inf{‖Φ∗(σ)−1‖ : σ ≥ 0, Trσ ≤ 1, Φ ∈ C(S), Φ∗(σ) invertible}.

Remark 4.4.2. (i) For S ⊆ Md, we recall that C(S) contains channels Φ : Md → Mk for

many different k ∈ N. In the above expression it is of course assumed that when we consider

Φ ∈ Ck(S), we choose σ ∈M+k . The same applies in the proof of Theorem 4.4.3(i).

(ii) It is clearly sufficient just to consider σ ≥ 0 satisfying Trσ = 1.

(iii) We briefly justify the above definition. First, using (3.32) note that

θ(S) = inf{λmin(A)−1 : A ∈ P(S)}. (4.25)

Now for a graph G, [28, p. 2 and Theorem 4] gives

θ(G) = min{ 1 / min{|⟨c, a(i)⟩|2 : i ∈ V (G)} : ((a(i))i∈V (G), c) is a h.o.n.l. of G }. (4.26)

In the notation of Section 3.2.3 and using the first observation in the proof of Lemma 3.2.32,

this becomes

θ(G) = min{λmin(A)−1 : A ∈ P0(G)}. (4.27)

Given (3.29) on page 108 and Lemma 3.2.24, we think of P(S) as the non-commutative

version of P0(G). The motivation for Definition 4.4.1 is then clear from a comparison of

(4.25) and (4.27).

The next theorem gives useful, and strikingly similar, characterisations of θ(S) and θ(S).

(Note that in the expressions below C(S) is not compact, and so it does not follow from

Theorem A.0.8 that the supremum and the infimum can be interchanged. See also Remark

4.5.5.)

Theorem 4.4.3. If S ⊆ Md is an operator system, then

(i) θ(S)−1 = sup{ inf{‖Φ(ρ)‖ : ρ ∈ Rd} : Φ ∈ C(S) };

(ii) θ(S)−1 = inf{ sup{‖Φ(ρ)‖ : Φ ∈ C(S)} : ρ ∈ Rd }.

Proof. (i) We have

θ(S)−1 = ( inf{λmin(Φ∗(σ))−1 : σ a state with λmin(Φ∗(σ)) > 0, Φ ∈ C(S)} )−1
= sup{λmin(Φ∗(σ)) : σ a state with λmin(Φ∗(σ)) > 0, Φ ∈ C(S)}
= sup{λmin(Φ∗(σ)) : σ a state, Φ ∈ C(S)}
= sup{ sup{ inf{⟨Φ∗(σ)ξ, ξ⟩ : ξ ∈ Cd, ‖ξ‖ = 1} : σ a state } : Φ ∈ C(S) }
= sup{ sup{ inf{⟨Φ∗(σ), ρ⟩ : ρ ∈ Rd} : σ a state } : Φ ∈ C(S) }
= sup{ sup{ inf{⟨σ, Φ(ρ)⟩ : ρ ∈ Rd} : σ a state } : Φ ∈ C(S) }
= sup{ inf{ sup{⟨σ, Φ(ρ)⟩ : σ a state} : ρ ∈ Rd } : Φ ∈ C(S) }
= sup{ inf{‖Φ(ρ)‖ : ρ ∈ Rd} : Φ ∈ C(S) }.

In the penultimate step the supremum and infimum were interchanged using Theorem A.0.8. It is easy to see the conditions of this 'minimax' theorem are satisfied. Indeed, for a given Φ : Md → Mk we see that the infimum and supremum are over the compact and convex sets Rd and Rk respectively, and that the function (ρ, σ) → ⟨σ, Φ(ρ)⟩ is continuous and linear in both ρ and σ.

(ii) Since each non-zero T ∈ M+d can be written as T = (TrT )ρ with ρ = (TrT )−1T ∈ Rd, we have

θ(S)−1 = ( max{TrT : T ∈ th(S)} )−1
= ( sup{ sup{λ ∈ R : λρ ∈ th(S)} : ρ ∈ Rd } )−1
= ( sup{ sup{λ ∈ R : ‖Φ(λρ)‖ ≤ 1 for all Φ ∈ C(S)} : ρ ∈ Rd } )−1
= ( sup{ sup{λ ∈ R : λ ≤ ‖Φ(ρ)‖−1 for all Φ ∈ C(S)} : ρ ∈ Rd } )−1
= ( sup{ inf{‖Φ(ρ)‖−1 : Φ ∈ C(S)} : ρ ∈ Rd } )−1
= inf{ sup{‖Φ(ρ)‖ : Φ ∈ C(S)} : ρ ∈ Rd },

as required.

Theorem 4.4.4. Let S ⊆Md be an operator system. Then

d inf{‖Φ(Id)‖−1 : Φ ∈ C(S)} ≤ θ(S) ≤ θ(S) ≤ d.


Proof. For the first inequality take ρ = (1/d)Id in Theorem 4.4.3 (ii) to give

θ(S)−1 ≤ sup{‖Φ(Id/d)‖ : Φ ∈ C(S)}.

The second inequality is immediate from Theorems 4.4.3 and A.0.7. The last inequality follows from Theorem 4.4.3 (i) by noting that the identity channel belongs to C(S). It then follows that θ(S)−1 ≥ inf{‖ρ‖ : ρ ∈ Rd} = 1/d.

The next proposition verifies that θ(S) can indeed be regarded as a generalisation of the

classical graph parameter θ(G).

Proposition 4.4.5. Let graph G have vertex set X = [d]. Then θ(SG) = θ(G).

Proof. By Proposition 4.1.11 and Theorem 4.4.4,

θ(G) = θ(SG) ≤ θ(SG). (4.28)

Let ((ax)x∈X , c) be a h.o.n.l. of G with ax, c ∈ Rk, and let ΦA ∈ Ck(SG) be the quantum channel defined in (3.47). Then, as in Lemma 3.2.32,

Φ∗A(cc∗) = ∑_{x∈X} |⟨ax, c⟩|2 exe∗x ∈ P(SG),

giving

λmin(Φ∗A(cc∗)) = min_{x∈X} |⟨ax, c⟩|2.

Thus by (4.25) and (4.26) we have

θ(SG) ≤ min{ 1 / min_{x∈X} |⟨ax, c⟩|2 : ((ax)x∈X , c) is a h.o.n.l. of G } = θ(G).

Together with (4.28), this completes the proof.

Calculation of θ(S) and θ(S) for a given non-commutative graph S will in general be

difficult, but we do have the following two propositions.

Proposition 4.4.6. For operator system S ⊆Md, the following are equivalent:

(i) θ(S) = 1;

(ii) S = Md;

(iii) θ(S) = 1.


Proof. (i) ⇒ (ii). Corollary 3.2.30 and the method used to prove Theorem 4.4.3 (ii) give that

θ(S)−1 = θd2(S)−1 = inf{ sup{‖Φ(ρ)‖ : Φ ∈ Cd2(S)} : ρ ∈ Rd }.

Note that ‖Φ(ρ)‖ ≤ Tr(Φ(ρ)) = 1 for all states ρ ∈ Rd and quantum channels Φ ∈ Cd2(S), and so sup{‖Φ(ρ)‖ : Φ ∈ Cd2(S)} ≤ 1 for all ρ ∈ Rd. Thus, when θ(S) = 1, it must hold that sup{‖Φ(ρ)‖ : Φ ∈ Cd2(S)} = 1 for all ρ ∈ Rd. Setting ρ = (1/d)I yields

sup{‖Φ((1/d)I)‖ : Φ ∈ Cd2(S)} = 1.

There then is a sequence (Φk)k∈N of channels in Cd2(S) such that ‖Φk((1/d)I)‖ → 1 as k → ∞. By Lemma 3.2.27, Cd2(S) is compact, and thus the sequence (Φk)k∈N has a subsequence converging to some Φ0 ∈ Cd2(S) satisfying ‖Φ0((1/d)I)‖ = 1 by continuity of the norm. Since Tr(Φ0((1/d)I)) = Tr((1/d)I) = 1, it follows that Φ0((1/d)I) = vv∗ for some unit vector v ∈ Cd2. Let {f1, . . . , fd} be an orthonormal basis of Cd. Then Φ0((1/d)I) = (1/d) ∑_{i=1}^d Φ0(fif∗i ) = vv∗, where Φ0(fif∗i ) ∈ Rd2 for all i = 1, . . . , d. By Proposition 2.4.2, the pure state vv∗ ∈ Rd2 cannot be written as a convex combination of distinct states, and thus it must hold that Φ0(fif∗i ) = vv∗ for each i = 1, . . . , d; indeed, since every unit vector u ∈ Cd is an element of some orthonormal basis, we have Φ0(uu∗) = vv∗ for all unit vectors u ∈ Cd. By linearity Φ0 is then the trivial channel given by

Φ0(ρ) = vv∗ for all ρ ∈ Rd. (4.29)

It is easy to verify that {ve∗1, . . . , ve∗d} is a set of Kraus operators for Φ0, where {e1, . . . , ed} is the canonical basis of Cd. We then have SΦ0 = span{eiv∗ve∗j : i, j ∈ [d]} = Md, and since Φ0 ∈ Cd2(S), we have S = Md.

(ii) ⇒ (iii). For any operator system S ⊆ Md, Theorems 4.4.4 and 4.1.14 give that

1 ≤ θ(S) ≤ θ(S). (4.30)

Recalling Theorem 4.4.3, we then have

θ(S)−1 = sup{ inf{‖Φ(ρ)‖ : ρ ∈ Rd} : Φ ∈ C(S) } ≤ 1. (4.31)

If S = Md, then the channel Φ0 defined in (4.29) satisfies Φ0 ∈ C(S), and so

θ(Md)−1 ≥ inf{‖Φ0(ρ)‖ : ρ ∈ Rd} = ‖vv∗‖ = 1,


and we conclude θ(Md) = 1.

(iii) ⇒ (i). This is immediate from (4.30).

Having considered the case θ(S) = θ(S) = 1, we now examine the case that θ(S) =

θ(S) = d.

Proposition 4.4.7. For an operator system S ⊆ Md, it holds that θ(S) = θ(S) = d if and

only if Φ(Id) ≤ Id2 for all Φ ∈ Cd2(S).

Proof. Theorem 4.4.4 gives θ(S) ≤ θ(S) ≤ d, and so θ(S) = θ(S) = d if and only if θ(S) = d.

By Corollary 4.1.16 the latter condition is equivalent to θd2(S) = d, which by Lemma 3.2.36

and Corollary 3.2.30 happens if and only if Id ∈ thd2(S). The result is then immediate from

(3.34).

Remark 4.4.8. We know of no operator system S satisfying θ(S) > θ(S), but nor do we have

a general proof of equality: whether θ(S) = θ(S) for all non-commutative graphs S remains an important open question.

Corollary 3.2.30 showed for operator system S ⊆ Md that θ(S) can be computed using

the channels in Cd2(S). This raises another open question.

Question 4.4.9. Given d ∈ N, does there exist k ∈ N (depending on d), such that for every

non-commutative graph S ⊆ Md the parameter θ(S) can be computed using the channels in

Ck(S)?

4.5 Capacity bounds, the Witsenhausen rate and other limits

In the classical case, (3.3) shows that θ(G) is an upper bound on c(G), the Shannon capacity of G. This section begins by discussing upper bounds on c(S), the Shannon capacity of an operator system S, and establishes that θ(S) and Ωf(S) are two such upper bounds. By

demonstrating that some of the parameters associated with operator systems and operator

anti-systems satisfy either sub-multiplicativity or super-multiplicativity conditions, we show

the existence of a number of limits, one of which will be seen to be a quantum generalisation

of the Witsenhausen rate.

We first recall that any upper bound on independence number which is sub-multiplicative

over tensor products is an upper bound on c(S).


Lemma 4.5.1. Suppose that the real parameter β(S) satisfies β(S) ≥ α(S) for every operator

system S, and suppose further that the sub-multiplicativity condition β(S ⊗ T ) ≤ β(S)β(T )

holds for all operator systems S, T . Then c(S) ≤ β(S).

Proof. We have

c(S) = limn→∞ n√α(S⊗n) ≤ lim infn→∞ n√β(S⊗n) ≤ β(S),

as claimed.

In [13] it is shown that ϑ(S) satisfies the conditions of Lemma 4.5.1, and is therefore an

upper bound on Shannon capacity. We now show that the same can be said for θ(S).

Proposition 4.5.2. Let S1 ⊆Md1 and S2 ⊆Md2 be operator systems. Then

θ(S1 ⊗ S2) ≤ θ(S1)θ(S2).

Proof. For arbitrarily small ε > 0 and for each i ∈ {1, 2}, by Definition 4.4.1 we can choose quantum channels Φi : Mdi → Mki with SΦi ⊆ Si and σi ∈ M+ki satisfying Tr(σi) ≤ 1 such that each Φ∗i(σi) is invertible and satisfies ‖Φ∗i(σi)−1‖ ≤ θ(Si) + ε.

By Lemma 3.1.36, Φ1 ⊗ Φ2 : Md1 ⊗ Md2 → Mk1 ⊗ Mk2 is a quantum channel satisfying SΦ1⊗Φ2 ⊆ S1 ⊗ S2, and so Φ1 ⊗ Φ2 ∈ C(S1 ⊗ S2). Properties of the tensor product as in Definition B.0.3 give that (Φ1 ⊗ Φ2)∗(σ1 ⊗ σ2) = Φ∗1(σ1) ⊗ Φ∗2(σ2) is invertible with

((Φ1 ⊗ Φ2)∗(σ1 ⊗ σ2))−1 = Φ∗1(σ1)−1 ⊗ Φ∗2(σ2)−1,

and that σ1 ⊗ σ2 ∈ Mk1 ⊗ Mk2 satisfies Tr(σ1 ⊗ σ2) = Tr σ1 Tr σ2 ≤ 1. From Definition 4.4.1 we then have

θ(S1 ⊗ S2) ≤ ‖((Φ1 ⊗ Φ2)∗(σ1 ⊗ σ2))−1‖ = ‖Φ∗1(σ1)−1 ⊗ Φ∗2(σ2)−1‖ = ‖Φ∗1(σ1)−1‖ ‖Φ∗2(σ2)−1‖ ≤ (θ(S1) + ε)(θ(S2) + ε).

The conclusion follows by letting ε → 0.

The next corollary is immediate and establishes that θ(S) is an upper bound on c(S).

Remark 4.5.4 shows further that it can be an arbitrarily more efficient upper bound than


ϑ(S).

Corollary 4.5.3. Let S be a non-commutative graph. Then

α(S) ≤ c(S) ≤ θ(S).

Proof. Theorems 4.1.14 and 4.4.4 give that θ(S) ≥ θ(S) ≥ α(S), whence the upper bound on

c(S) is immediate from Lemma 4.5.1 and Proposition 4.5.2. The lower bound on c(S) follows

from the super-multiplicativity of independence number as given in Lemma 3.1.37.

Whether θ is multiplicative over tensor products remains an open question. Recalling

from Section 4.4 the definitions and properties of ϑ and ϑ as introduced in [13], we make the

following remarks. We note that [13, Corollary 10] gives that ϑ is multiplicative, whereas [13,

Lemma 4] gives that ϑ is multiplicative.

Remark 4.5.4. Both ϑ(S) and θ(S) are upper bounds on c(S). These two parameters, however, are not in general equal; indeed, Proposition 4.6.1 gives that θ(CId) = d, whereas [13, equations (6) and (7)] give ϑ(CId) = d2, showing that the ratio θ(S)/ϑ(S) can be arbitrarily small. It is not known, however, if θ(S) ≤ ϑ(S) in general.

Remark 4.5.5. It is useful to compare further the various quantum generalisations of θ(G) which we have discussed, to see if any may in fact be identical. Given that θ(S) ≤ θ(S), it is clear from Remark 4.5.4 that θ(S) and ϑ(S) are not in general equal. Since θ is sub-multiplicative in the sense of Proposition 4.5.2, and it is shown in [13, p. 9-10] that ϑ is not, it is clear that θ(S) ≠ ϑ(S) in general. We recall from Corollary 4.1.24 that θ(Mm(S)) = θ(S) for m ∈ N, but that ϑ lacks this stability (see [13, p. 10]), and thus θ(S) ≠ ϑ(S) in general. On the other hand, as noted in Remark 4.4.8, we cannot rule out the possibility that θ(S) = θ(S) for all operator systems S.

Proposition 4.5.6. The non-commutative graph parameters χ and χf are sub-multiplicative

over tensor products; that is, for non-commutative graphs S ⊆Mc and T ⊆Md, we have

χ(S ⊗ T ) ≤ χ(S)χ(T )

and

χf(S ⊗ T ) ≤ χf(S)χf(T ).

Proof. In the proof of Lemma 3.1.37 it has been shown that if {ui : i = 1, . . . , m} is S-independent and {vi : i = 1, . . . , n} is T-independent, then {ui ⊗ vj : (i, j) ∈ [m] × [n]} is


S ⊗ T-independent. In this case let P be the S-abelian projection P = ∑_{i=1}^m uiu∗i and let Q be the T-abelian projection Q = ∑_{i=1}^n viv∗i. We note that

P ⊗ Q = ∑_{i∈[m],j∈[n]} (uiu∗i) ⊗ (vjv∗j) = ∑_{i∈[m],j∈[n]} (ui ⊗ vj)(ui ⊗ vj)∗,    (4.32)

and so P ⊗ Q is an S ⊗ T-abelian projection.

We begin with the case of the fractional chromatic number. By the definition of χf(S) in (4.5), for arbitrary δ > 0 we can find positive weightings λ1, . . . , λk and S-abelian projections P1, . . . , Pk such that

∑_{i=1}^k λi ≤ χf(S) + δ with ∑_{i=1}^k λiPi ≥ Ic.

Similarly, for arbitrary ε > 0 we can find positive weightings µ1, . . . , µl and T-abelian projections Q1, . . . , Ql such that

∑_{i=1}^l µi ≤ χf(T) + ε with ∑_{i=1}^l µiQi ≥ Id.

Now observe that

∑_{i∈[k],j∈[l]} λiµj Pi ⊗ Qj = (∑_{i=1}^k λiPi) ⊗ (∑_{j=1}^l µjQj) ≥ Ic ⊗ Id = Icd,

where each Pi ⊗ Qj is an S ⊗ T-abelian projection, giving that

χf(S ⊗ T) ≤ ∑_{i∈[k],j∈[l]} λiµj ≤ (χf(S) + δ)(χf(T) + ε),

and the claim follows on letting δ and ε tend to zero.

The corresponding result for chromatic numbers is found in the same way by requiring λi, µj ∈ {0, 1} and setting ε = δ = 0.

We now return to the quantum setting of the ‘side information’ problem considered in Section 3.1.4 by establishing the existence of the limit

limn→∞ (1/n) log χ(SΦ⊗n)

for a quantum channel Φ.

Proposition 4.5.7. For an operator system S, the limit limn→∞ (1/n) log χ(S⊗n) exists and is equal to infn∈N (1/n) log χ(S⊗n).

Proof. Let an = logχ(S⊗n) and consider the sequence (an)n∈N. By Proposition 4.5.6

an+m = logχ(S⊗(n+m)) ≤ logχ(S⊗n) + logχ(S⊗m) = an + am;

the sequence (an)n∈N is therefore sub-additive, and the claim follows by Lemma 3.1.6.

Definition 4.5.8. For an operator system S, we define the Witsenhausen rate of S by letting

R(S) = limn→∞ (1/n) log χ(S⊗n).

Recalling the definition of the Witsenhausen rate R(G) of a graph G from Definition 3.1.9,

we have the following result.

Proposition 4.5.9. For a graph G it holds that R(SG) = R(G).

Proof. This is immediate from (4.11) on page 130 and Lemma 3.1.35.

A similar method to that used in Proposition 4.5.6 establishes that both the full covering

number and its fractional version are also sub-multiplicative over tensor products. We note

that this answers [6, Question 7.5].

Proposition 4.5.10. Let S and T be non-commutative graphs. It holds that

Ω(S ⊗ T ) ≤ Ω(S)Ω(T )

and

Ωf(S ⊗ T ) ≤ Ωf(S)Ωf(T ).

Proof. Suppose the sets {u1, . . . , um} and {v1, . . . , vn} are S-full and T-full respectively. As in (3.12), it is clear that the set {ui ⊗ vj : i ∈ [m], j ∈ [n]} is orthonormal. Furthermore, by Definition 3.2.1, for all i, k ∈ [m] and j, l ∈ [n] we have

(ui ⊗ vj)(uk ⊗ vl)∗ = uiu∗k ⊗ vjv∗l ∈ S ⊗ T,    (4.33)

and so the set {ui ⊗ vj : i ∈ [m], j ∈ [n]} is S ⊗ T-full. Let P be the S-full projection given by P = ∑_{i=1}^m uiu∗i, and Q be the T-full projection given by Q = ∑_{i=1}^n viv∗i. As in (4.32) we


have P ⊗ Q = ∑_{i∈[m],j∈[n]} (ui ⊗ vj)(ui ⊗ vj)∗, and P ⊗ Q is an S ⊗ T-full projection. The sub-multiplicativity of the parameters Ω and Ωf now follows from Definition 4.1.8 and the argument of Proposition 4.5.6.

Remark 4.5.11. We note that corresponding results for the clique covering number and its fractional version do not follow by the same argument. It is clear from Definition 3.2.1 that when working with cliques, (4.33) holds if i ≠ k and j ≠ l, but not necessarily for all (i, j) ≠ (k, l).

Corollary 4.5.12. Let S be a non-commutative graph. Then c(S) ≤ Ωf(S).

Proof. Proposition 4.5.10 and Theorem 4.1.14 (i) show that the fractional full covering num-

ber satisfies the conditions of Lemma 4.5.1 to be an upper bound on Shannon capacity.

Since (4.20) on page 139 gives that χs(T ) = Ω(T ⊥) for an operator anti-system T ,

Proposition 4.5.10 allows us to establish the sub-multiplicativity of the strong chromatic

number, this time over co-normal products. (For the reason noted in Remark 4.5.11, an

equivalent result for the chromatic number of an operator anti-system, which satisfies χ(T ) =

Ω(T ⊥), does not follow.)

Corollary 4.5.13. For operator anti-systems T1 and T2

χs(T1 ∗ T2) ≤ χs(T1)χs(T2).

Proof. Using (4.20), Lemma 4.2.9 and Proposition 4.5.10, we have

χs(T1 ∗ T2) = Ω((T1 ∗ T2)⊥) = Ω(T ⊥1 ⊗ T ⊥2 ) ≤ Ω(T ⊥1 )Ω(T ⊥2 ) = χs(T1)χs(T2),

as required.

In the next proposition we let T n denote the nth co-normal power of the operator anti-

system T , just as we have used Gn to denote the nth co-normal power of the graph G.

Proposition 4.5.14. For an operator anti-system T, the limit limn→∞ (1/n) log χs(T n) exists and is equal to infn∈N (1/n) log χs(T n). Similarly, for an operator system S, the limit limn→∞ (1/n) log Ω(S⊗n) exists and is given by the expression infn∈N (1/n) log Ω(S⊗n). If S = T⊥, then these two limits are equal.


Proof. We use the method of Proposition 4.5.7 and the sub-multiplicativity properties of χs

and Ω. The final assertion is immediate by (4.20) and Lemma 4.2.9.

To consider the significance of these limits, first recall from [46, Corollary 3.4.3] that for a graph G it holds that

limn→∞ n√χ(Gn) = χf(G),    (4.34)

with ‘complementary’ version

limn→∞ n√χ(Gn) = χf(G),

a result first due to Posner and McEliece in [30]. When T = TG for a graph G, Lemma 4.2.3 and (4.21) give that χs((TG)^n) = χs(TGn) = χ(Gn). Then by (4.34), Proposition 4.1.11 and (4.19) on page 138,

limn→∞ (1/n) log χs((TG)^n) = limn→∞ (1/n) log χ(Gn) = log χf(G) = log Ωf(SG) = log Ωf(TG⊥).

Remembering that Ωf(T ⊥) is the fractional version of χs(T ), comparison with (4.34) leads

us to ask the following open question.

Question 4.5.15. Does it hold that

limn→∞ n√χs(T n) = Ωf(T⊥)

for any operator anti-system T ? (It is, of course, equivalent to ask if the ‘complementary’ result

limn→∞ n√Ω(S⊗n) = Ωf(S)

holds for any operator system S.)

We now wish to consider full and clique numbers of tensor products of operator systems.

We recall that both these parameters are quantum generalisations of the clique number of a

graph, which has the following well-known multiplicativity property. (The proof is immediate

from Corollary 1.3.4 and Lemma 4.2.8.)

Lemma 4.5.16. ([15, Chapter 7, Exercise 13].) If F and G are graphs, then ω(F ⊠ G) = ω(F)ω(G).

We now consider the non-commutative case.


Proposition 4.5.17. For operator systems S, T it holds that

ω(S ⊗ T ) ≥ ω(S)ω(T ).

Proof. Let ω(S) = p and ω(T ) = q. By Lemma 4.1.1 we can choose an S-full set {u1, . . . , up} and a T-full set {v1, . . . , vq}. As shown in Proposition 4.5.10, the set {ui ⊗ vj : i ∈ [p], j ∈ [q]} is S ⊗ T-full, and ω(S ⊗ T ) ≥ pq.

It is not known if ω is multiplicative.

Corollary 4.5.18. The limit limn→∞ n√ω(S⊗n) exists for any operator system S. If G is a graph, then n√ω(SG⊗n) = ω(G) for all n ∈ N.

Proof. To prove the existence of the limit, apply the method used to prove Corollary 3.1.7. In the case that G is a graph, Lemma 3.1.35 and Proposition 4.1.11 give that ω(SG⊗n) = ω(SGn) = ω(Gn). Lemma 4.5.16 completes the proof.

We leave the value of the limit in Corollary 4.5.18 for an operator system not of the form

SG for a graph G as an open question.

Remark 4.5.19. For the reason discussed in Remark 4.5.11, the method used to prove Propo-

sition 4.5.17 cannot be used to prove super-multiplicativity of clique numbers.

The next proposition examines the clique number of tensor products, where the behaviour

is more subtle.

Proposition 4.5.20. Let S and T be operator systems.

(i) It holds that ω(S ⊗ T ) ≥ min{ω(S), ω(T )}.

(ii) If ω(T ) ≥ 1, then ω(S ⊗ T ) ≥ ω(S).

Proof. (i) Without loss of generality, let ω(S) = p ≤ q = ω(T ), and choose an S-clique {u1, . . . , up} and a T-clique {v1, . . . , vq}. Now consider the set B = {ui ⊗ vi : i ∈ [p]}. The orthonormality of B is immediate. For i ≠ j note that

(ui ⊗ vi)(uj ⊗ vj)∗ = (uiu∗j) ⊗ (viv∗j) ∈ S ⊗ T,

where we used that uiu∗j ∈ S and viv∗j ∈ T for i ≠ j. This suffices to show that B is an S ⊗ T-clique, and the result follows.


(ii) Since ω(T ) ≥ 1, we can find a T-full projection vv∗. Again let {u1, . . . , up} be an S-clique, where ω(S) = p. The set C = {ui ⊗ v : i ∈ [p]} is trivially orthonormal, and furthermore (ui ⊗ v)(uj ⊗ v)∗ = uiu∗j ⊗ vv∗ ∈ S ⊗ T for i ≠ j. It follows that C is an S ⊗ T-clique, and ω(S ⊗ T ) ≥ ω(S).

Bearing in mind the multiplicativity of clique number over strong graph products given by Lemma 4.5.16, one might intuitively expect a stronger result than Proposition 4.5.20 (i) to hold. Questions about the clique numbers of tensor products of operator systems remain, but there seems to be no trivial way to strengthen Proposition 4.5.20 (i) in general. Indeed, Example 4.6.20 will establish that there exist operator systems S and T satisfying ω(S ⊗ T ) < ω(T ). Further work could usefully consider the behaviour of the sequence (ω(S⊗n))n∈N and examine both lim infn∈N n√ω(S⊗n) and lim supn∈N n√ω(S⊗n).

4.6 Some examples

If G is a graph with associated operator system SG, then we have already seen that the various

non-commutative graph parameters of SG assume values equal to the corresponding graph

parameters of G. (Indeed, we regard this as a necessary property of a valid generalisation

to the non-commutative setting.) For instance, Proposition 4.1.11 gave that α(SG) = α(G),

ω(SG) = ω(G), θ(SG) = θ(G) and ωf(SG) = ωf(G). We note the cases where G is the

complete graph Kd, giving SG = Md, and where G is the empty graph Kd, giving SG = Dd.

In this section we want to give examples of operator systems not associated to any graph G,

and consider their corresponding quantum channels.

The identity channel I : Md → Md given by I(ρ) = ρ for all ρ ∈ Md may be carried out with the single Kraus operator Id and thus has associated operator system CId := {αId : α ∈ C}. Below we evaluate some non-commutative graph parameters corresponding to this case. Although this operator system is somewhat trivial, parameters associated with fp(CId) are seen to display an interesting change of behaviour as d increases from 1 to 2.

Proposition 4.6.1. For d ∈ N we have

α(CId) = θ(CId) = θ(CId) = Ωf(CId) = Ω(CId) = d,

ω(CId) = ωf(CId) = χf(CId) = χ(CId) = 1,


Ωf(CId) = Ω(CId) = 1 if d = 1, and Ωf(CId) = Ω(CId) = ∞ if d ≥ 2;

ω(CId) = 1 if d = 1, and ω(CId) = 0 if d ≥ 2;

H(CId, ρ) = 0 for all ρ ∈ Rd.

Proof. For orthonormal u, v ∈ Cd we have 〈u, v〉 = 〈uv∗, Id〉 = 0, and so uv∗ ∈ CId⊥. It follows that a projection in Md is a CId-clique projection if and only if it has rank 1, giving ω(CId) = 1 and Ω(CId) = d. The operator system CId is clearly commutative and so Proposition 4.3.10 gives that α(CId) = d and χ(CId) = χf(CId) = 1. From inequalities in Theorems 4.1.14 and 4.4.4 and (4.8) we then obtain θ(CId) = θ(CId) = Ωf(CId) = Ω(CId) = d. Now note that if d = 1 we have e1e∗1 ∈ fp(CI1) and fp(CI1) = [0, 1] = fp(CI1)]. This gives ω(CI1) = Ω(CI1) = Ωf(CI1) = 1. However, if d ≥ 2, no unit vector v satisfies vv∗ ∈ CId. Then fp(CId) = {0} and fp(CId)] = M+d, giving ω(CId) = 0, Ω(CId) = ∞ and Ωf(CId) = ∞ by (4.8) and Theorem 4.1.9. That H(CId, ρ) = 0 for all ρ ∈ Rd follows from Proposition 4.3.10.

Corollary 4.6.2. For the operator system CId with d ∈ N, it holds that c(CId) = d and

R(CId) = 0.

Proof. To obtain the Shannon capacity, first note that α ((CId)⊗n) = α(CIdn) = dn, giving

c(CId) = d. Similarly, we have χ ((CId)⊗n) = χ(CIdn) = 1, and so R(CId) = 0.

In the proof of Proposition 4.4.6 we had a channel Φ0 : Md →Md2 which was shown to be

‘trivial’ in the sense that it mapped all input states onto the same pure state vv∗. For some

fixed state σ ∈ Rk we now define the trivial channel Φσ : Md →Mk by Φσ(M) = σTrM for

all M ∈Md. It is easy to see that Φσ is trace preserving and satisfies Φσ(ρ) = σ for all states

ρ ∈ Rd.

Proposition 4.6.3. The trivial channel Φσ defined above is a c.p.t.p. map with associated

operator system SΦσ = Md.

Proof. Let σ = ∑_{i=1}^k µiηiη∗i for µi ∈ R+ and an orthonormal basis {η1, . . . , ηk} ⊆ Ck. Observe


for M ∈ Md that

∑_{i∈[k],j∈[d]} (√µi ηie∗j)M(√µi ejη∗i) = ∑_{i∈[k]} (√µi ηi)(∑_{j∈[d]} e∗jMej)(√µi η∗i) = σ TrM = Φσ(M),

where e1, . . . , ed is the canonical basis of Cd. We also have that

∑_{i∈[k],j∈[d]} (√µi ηie∗j)∗(√µi ηie∗j) = ∑_{i∈[k]} µi ∑_{j∈[d]} eje∗j = Id,

using that ∑_{i∈[k]} µi = Trσ = 1. It follows by Proposition 3.1.18 that Φσ is a c.p.t.p. map with Kraus operators {√µi ηie∗j : i ∈ [k], j ∈ [d]}, and the operator system SΦσ associated with Φσ is given by

SΦσ = span{√(µiµm) ejη∗iηme∗l : i, m ∈ [k], j, l ∈ [d]} = span{eje∗l : j, l ∈ [d]} = Md,

using that µi > 0 for some i ∈ [k].
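As a sanity check, the following short numerical sketch (ours, not from the thesis; the dimensions, the weights µ and the random eigenbasis are arbitrary choices) verifies both displayed identities for a concrete σ: the Kraus family {√µi ηie∗j} sums to the identity and implements M ↦ σ TrM.

```python
import numpy as np

d, k = 3, 2
rng = np.random.default_rng(1)

# A state sigma on C^k written in an eigenbasis: sigma = sum_i mu_i eta_i eta_i^*.
mu = np.array([0.7, 0.3])
eta = np.linalg.qr(rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k)))[0]
sigma = sum(mu[i] * np.outer(eta[:, i], eta[:, i].conj()) for i in range(k))

e = np.eye(d)
kraus = [np.sqrt(mu[i]) * np.outer(eta[:, i], e[j]) for i in range(k) for j in range(d)]

# Completeness relation: the Kraus operators define a trace-preserving map.
assert np.allclose(sum(K.conj().T @ K for K in kraus), np.eye(d))

# The channel acts as M -> sigma Tr(M).
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
assert np.allclose(sum(K @ M @ K.conj().T for K in kraus), sigma * np.trace(M))
```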

Remark 4.6.4. Having shown that the trivial channel Φσ defined above has associated operator system Md = SKd, it is immediate that all its parameters are given by the corresponding graph parameters of the complete graph Kd.

We now introduce an operator system which, though less trivial, is still relatively simple

to analyse.

Definition 4.6.5. We define operator system Td ⊆ Md by Td = CId + CJd where Id is the

d× d identity matrix and Jd is the d× d all-ones matrix.

That is, M ∈ Td if and only if M ∈ Md is of the form

M = (λ − µ)Id + µJd,

that is, M has every diagonal entry equal to λ and every off-diagonal entry equal to µ, with λ, µ ∈ C. (Of course, T1 = CI1.) Note that Td = span{Id, Jd} and that Id and Jd commute, so Td is an example of the operator systems considered in Remark 4.3.11.


Proposition 4.6.6. For all d ∈ N,

α(Td) = θ(Td) = θ(Td) = Ωf(Td) = Ω(Td) = d;

ω(Td) = ωf(Td) = χf(Td) = χ(Td) = 1;

H(Td, ρ) = 0 for all ρ ∈ Rd.

Proof. As Td is commutative, we can apply Proposition 4.3.10. Theorems 4.4.4 and 4.1.14

along with (4.8) on page 128 give the remaining results.

Corollary 4.6.7. For d ∈ N, we have c(Td) = d and R(Td) = 0.

Proof. Proposition 4.6.6 and Corollary 4.5.3 give c(Td) = d. Proposition 4.6.6 and Proposition 4.5.6 give χ(Td⊗n) = 1 for all n ∈ N, whence we have R(Td) = 0.

Parameters for Td related to the full projection convex corner exhibit an interesting dependence on d, as given by the next two propositions. (Recall T1 = CI1, so by Proposition 4.6.1, Ω(T1) = Ωf(T1) = ω(T1) = 1.)

Proposition 4.6.8. We have Ω(T2) = Ωf(T2) = 2 and ω(T2) = 1.

Proof. Suppose the unit vector v = (vi)_{i=1}^2 ∈ C2 satisfies vv∗ ∈ T2, so that {v} is T2-full. The ij-entry of vv∗ is viv̄j. Since Tr(vv∗) = 1, for vv∗ ∈ T2 we must have |v1|² = |v2|² = 1/2. It is also required that v1v̄2 = v2v̄1. Multiplying both sides by v1v2 gives v1²|v2|² = v2²|v1|², and so v1² = v2² and v1 = ±v2. Setting v1 = eiθ/√2 = ±v2 with θ ∈ [0, 2π) gives

vv∗ = (1/2)[ 1 ±1 ; ±1 1 ] ∈ T2,    (4.35)

and we conclude that the T2-full singleton sets are those of the form {(eiθ/√2)(1, ±1)ᵀ : θ ∈ [0, 2π)}. By Theorem 4.1.14 and Proposition 4.6.6, ω(T2) ≤ ω(T2) = 1, and we conclude ω(T2) = 1. It follows from Lemma 4.1.15 that Ω(T2) ≥ 2. Now let u = (1/√2)(1, 1)ᵀ and v = (1/√2)(1, −1)ᵀ, and note that {u} and {v} are T2-full sets and that uu∗ + vv∗ = I. This gives that Ω(T2) = 2.


From (4.35) it can be seen that the only T2-full projections are

P1 = (1/2)[ 1 1 ; 1 1 ]  and  P2 = (1/2)[ 1 −1 ; −1 1 ],

and fp(T2) is the convex corner generated by {P1, P2}. Thus by Lemma 2.2.32, for a matrix M = [ a b ; b̄ d ] ∈ M+2, we have M ∈ fp(T2)] if and only if the following two conditions are satisfied:

Tr(MP1) = (1/2)(a + b + b̄ + d) ≤ 1  and  Tr(MP2) = (1/2)(a − b − b̄ + d) ≤ 1.

It follows for any M ∈ fp(T2)] that TrM = a + d ≤ 2. We also see that I ∈ fp(T2)], and can thus conclude by Theorem 4.1.9 that

Ωf(T2) = max{TrM : M ∈ fp(T2)]} = 2,

as required.

Example 4.6.9. To complete our discussion of T2, we give an example of a quantum channel Φ satisfying SΦ = T2. It is easy to verify that the operators

(1/√2)[ 1 1 ; 0 0 ]  and  (1/√2)[ 0 0 ; 1 −1 ]

are Kraus operators for such a channel.
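A direct numerical verification of this claim is straightforward; the sketch below (ours, for illustration only) checks the completeness relation and that span{A∗iAj} has dimension two and contains I2 and J2, and hence equals T2.

```python
import numpy as np

A1 = np.array([[1.0, 1.0], [0.0, 0.0]]) / np.sqrt(2)
A2 = np.array([[0.0, 0.0], [1.0, -1.0]]) / np.sqrt(2)

# Completeness relation A1*A1 + A2*A2 = I_2, so the pair defines a quantum channel.
assert np.allclose(A1.conj().T @ A1 + A2.conj().T @ A2, np.eye(2))

# S_Phi = span{A_i^* A_j : i, j = 1, 2}.
products = [A.conj().T @ B for A in (A1, A2) for B in (A1, A2)]
I2, J2 = np.eye(2), np.ones((2, 2))
V = np.column_stack([P.ravel() for P in products])

# The span is 2-dimensional and contains both I_2 and J_2, hence equals T_2.
assert np.linalg.matrix_rank(V) == 2
assert np.linalg.matrix_rank(np.column_stack([V, I2.ravel()])) == 2
assert np.linalg.matrix_rank(np.column_stack([V, J2.ravel()])) == 2
```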

We now analyse Td for d ≥ 3.

Proposition 4.6.10. For d ≥ 3 it holds that Ω(Td) = Ωf(Td) =∞ and ω(Td) = 1.

Proof. Let the unit vector v = (vi)_{i=1}^d ∈ Cd satisfy vv∗ ∈ Td. Since Tr(vv∗) = 1, this requires that |vi|² = 1/d for all i ∈ [d]. The following argument shows further that vi = vj for all i, j ∈ [d]. Letting i, j, k ∈ [d] be pairwise distinct, we require viv̄k = vjv̄k, and vi = vj as claimed. Then v = (eiθ/√d)1 for some θ ∈ [0, 2π), giving vv∗ = (1/d)Jd ∈ Td.

Thus for d ≥ 3, the Td-full singleton sets are precisely those of the form {(eiθ/√d)1 : θ ∈ [0, 2π)}. As for d = 2, for d ≥ 3 we have ω(Td) ≤ ω(Td) = 1 and we conclude that ω(Td) = 1. It also holds that the only Td-full projection is (1/d)Jd. Then for M ∈ M+d we have M ∈ fp(Td)] if and


only if Tr(MJd) ≤ d. Let the unit vector w = (wi)_{i=1}^d ∈ Cd satisfy ∑_{i=1}^d wi = 0, and observe that 〈w, 1〉 = 0. Now for k ∈ R+ we form M = kww∗ ∈ M+d, giving that

Tr(MJd) = k Tr(ww∗11∗) = k|〈w, 1〉|² = 0.

Hence we have M ∈ fp(Td)] for all k ∈ R+, and since TrM = k it holds that Ωf(Td) = ∞. Finally note by (4.8) that Ω(Td) = ∞.

It is instructive to give an example of a channel Φ satisfying SΦ = T3.

Example 4.6.11. The operators A1, A2 ∈ M6,3 given by

A1 = (1/√2)[ 1 0 0 ; 0 1 0 ; 0 0 1 ; 0 0 0 ; 0 0 0 ; 0 0 0 ],   A2 = (1/√8)[ 0 1 1 ; 1 0 1 ; 1 1 0 ; 1 −1 0 ; 0 1 −1 ; 1 0 −1 ]

satisfy A∗1A1 = A∗2A2 = (1/2)I3 and A∗2A1 = A∗1A2 = (1/4)(J3 − I3). Thus the channel Φ : M3 → M6 given by Φ(ρ) = A1ρA∗1 + A2ρA∗2 for ρ ∈ M3 is a quantum channel with

SΦ = span{A∗iAj : i, j ∈ {1, 2}} = T3.
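The stated identities, and the equality SΦ = T3, can be checked numerically; the following sketch (ours, for illustration only) does exactly that.

```python
import numpy as np

A1 = np.vstack([np.eye(3), np.zeros((3, 3))]) / np.sqrt(2)
A2 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0],
               [1, -1, 0],
               [0, 1, -1],
               [1, 0, -1]]) / np.sqrt(8)

I3, J3 = np.eye(3), np.ones((3, 3))

assert np.allclose(A1.T @ A1, I3 / 2)
assert np.allclose(A2.T @ A2, I3 / 2)
assert np.allclose(A1.T @ A2, (J3 - I3) / 4)
assert np.allclose(A2.T @ A1, (J3 - I3) / 4)
assert np.allclose(A1.T @ A1 + A2.T @ A2, I3)          # trace preservation

# span{A_i^* A_j} has dimension 2 and contains I_3 and J_3, hence equals T_3.
V = np.column_stack([(A.T @ B).ravel() for A in (A1, A2) for B in (A1, A2)])
assert np.linalg.matrix_rank(V) == 2
assert np.linalg.matrix_rank(np.column_stack([V, I3.ravel(), J3.ravel()])) == 2
```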

We briefly consider tensor products of operator systems of the form Td, after the following two straightforward propositions.

Proposition 4.6.12. Consider operator systems Ri ⊆ Mdi where α(Ri) = di for i =

1, . . . ,m. Then

α(R1 ⊗ . . .⊗Rm) = Ω(R1 ⊗ . . .⊗Rm) = Ωf(R1 ⊗ . . .⊗Rm)

= θ(R1 ⊗ . . .⊗Rm) = d1 . . . dm.

Proof. Note that R1⊗. . .⊗Rm ⊆Md1...dm , and by the super-multiplicativity of independence

number as given in Lemma 3.1.37 we have α(R1 ⊗ . . .⊗Rm) ≥ d1 . . . dm. The results follow

from inequalities in Theorem 4.1.14 and (4.8) on page 128.

Proposition 4.6.13. Consider operator systems Ri ⊆Mdi where χ(Ri) = 1 for i = 1, . . . ,m.


4.6 Some examples 165

Then

ω(R1 ⊗ . . .⊗Rm) = χf(R1 ⊗ . . .⊗Rm) = χ(R1 ⊗ . . .⊗Rm) = 1.

Proof. By Proposition 4.5.6, χ(R1 ⊗ . . . ⊗ Rm) ≤ 1, and the results follow from Theorem

4.1.14, (4.8) and (4.10) on page 129.

Using the above two propositions, we see that Proposition 4.6.6 has the following corollary.

Corollary 4.6.14. It holds that

α(Td1 ⊗ . . .⊗ Tdm) = Ωf(Td1 ⊗ . . .⊗ Tdm) = Ω(Td1 ⊗ . . .⊗ Tdm)

= θ(Td1 ⊗ . . .⊗ Tdm) = d1 . . . dm,

and

ω(Td1 ⊗ . . .⊗ Tdm) = χf(Td1 ⊗ . . .⊗ Tdm) = χ(Td1 ⊗ . . .⊗ Tdm) = 1.

Next we discuss an operator system that has been widely considered in the literature, for

example, see [21] and [27].

Definition 4.6.15. The ‘constant diagonal’ operator system Sd is defined by

Sd = span{eie∗j, Id : i ≠ j} ⊆ Md, d ∈ N,

where e1, . . . , ed is the canonical basis of Cd.

For d ≥ 2, Sd is not commutative, and so it does not reduce to the rather trivial case of

Proposition 4.3.10, and nor is it equal to SG for any graph G. We should thus expect it to

exhibit non-trivial, genuinely quantum behaviour. In [27] it was shown that α(S2) = 1 and in

[21, Examples 4, 22] that χ(Sn) = χs(S⊥n ) = n. Here we extend these results by considering

tensor products of operator systems of this type and calculating the values of some of the

parameters introduced earlier. For n1, n2, . . . , nm ∈ N, let

Sn1,...,nm = Sn1 ⊗ Sn2 ⊗ · · · ⊗ Snm .
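Before analysing Sn1,...,nm, the following small sketch (ours, not from the text; the helper name is ad hoc) builds the natural spanning set of Sd from Definition 4.6.15 and confirms the basic facts used below: Sd has linear dimension d(d − 1) + 1, is self-adjoint and unital, and S2 is non-commutative.

```python
import numpy as np
from itertools import permutations

def constant_diagonal_spanning_set(d):
    """The spanning set {e_i e_j^* : i != j} together with I_d, as in Definition 4.6.15."""
    e = np.eye(d)
    mats = [np.outer(e[i], e[j]) for i, j in permutations(range(d), 2)]
    mats.append(np.eye(d))
    return mats

d = 3
mats = constant_diagonal_spanning_set(d)
V = np.column_stack([M.ravel() for M in mats])
dim = d * (d - 1) + 1

assert np.linalg.matrix_rank(V) == dim                       # dim S_d = d(d-1) + 1
for M in mats:                                               # S_d is self-adjoint
    assert np.linalg.matrix_rank(np.column_stack([V, M.conj().T.ravel()])) == dim

# S_2 is non-commutative: e_1 e_2^* and e_2 e_1^* do not commute.
X = np.outer(np.eye(2)[0], np.eye(2)[1])
Y = np.outer(np.eye(2)[1], np.eye(2)[0])
assert not np.allclose(X @ Y, Y @ X)
```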

Lemma 4.6.16. Let u, v ∈ Cn1...nm be orthogonal vectors.

(i) If uv∗ ∈ S⊥n1,...,nm, then u = 0 or v = 0;

(ii) If uu∗, uv∗ ∈ Sn1,...,nm, then u = 0 or v = 0.


Proof. (i) Suppose first that uv∗ ∈ Sn1,...,nm⊥. Let m = 1 and note that Sn1⊥ consists of the diagonal matrices diag(a1, . . . , an1) with ∑_{i=1}^{n1} ai = 0. Write u = (ui)_{i=1}^{n1} and v = (vi)_{i=1}^{n1}. Then for uv∗ ∈ Sn1⊥ we have that uiv̄j = 0 whenever i ≠ j and ∑_{i=1}^{n1} uiv̄i = 0. Suppose that uk ≠ 0 for some k ∈ [n1]. Then vj = 0 for all j ≠ k, and it follows that ∑_{i=1}^{n1} uiv̄i = ukv̄k = 0. This gives vk = 0, and hence v = 0, thus establishing the result in the case that m = 1.

Proceeding by induction, suppose that the statement holds for some m. Note that, writing matrices in n1 × n1 block form with blocks of size n2 · · · nm+1,

Sn1,...,nm+1 = { (Si,j)_{i,j=1}^{n1} : Si,j ∈ Sn2,...,nm+1 for all i, j, and S1,1 = S2,2 = · · · = Sn1,n1 }.    (4.36)

Thus, Sn1,...,nm+1⊥ consists of all block matrices (Di,j)_{i,j=1}^{n1} where

Di,j ∈ Sn2,...,nm+1⊥, i ≠ j,    (4.37)

and

∑_{i=1}^{n1} Di,i ∈ Sn2,...,nm+1⊥.    (4.38)

Write u and v in block form as u = (u(1), . . . , u(n1)) and v = (v(1), . . . , v(n1)), with u(i), v(i) ∈ Cn2···nm+1 for i ∈ [n1], and suppose that uv∗ = (u(i)v(j)∗)_{i,j=1}^{n1} ∈ Sn1,...,nm+1⊥. Assume that u(i) ≠ 0 for some i ∈ [n1]. By (4.37), u(i)v(j)∗ ∈ Sn2,...,nm+1⊥ for all j ≠ i, and so by the induction assumption v(j) = 0 whenever j ≠ i. Then ∑_{k=1}^{n1} u(k)v(k)∗ = u(i)v(i)∗ ∈ Sn2,...,nm+1⊥ by (4.38), and by the induction


assumption, v(i) = 0; thus, v = 0.

(ii) Suppose that uu∗, uv∗ ∈ Sn1,...,nm with orthogonal u, v ∈ Cn1···nm and ‖u‖ = √k > 0. Write u = (ui)_{i=1}^{n1···nm} and v = (vi)_{i=1}^{n1···nm}. We make the following claim, which is clearly sufficient to prove the required result:

v = 0 and |ui|² = k/(n1 · · · nm) for all i ∈ [n1 · · · nm].    (4.39)

First we establish (4.39) when m = 1. If uu∗ = (uiūj)_{i,j∈[n1]} ∈ Sn1 and ‖u‖² = k, then |ui|² = k/n1 for all i ∈ [n1]. If, in addition, uv∗ = (uiv̄j)_{i,j∈[n1]} ∈ Sn1, then uiv̄i = ujv̄j for all i, j ∈ [n1]. Since 〈u, v〉 = 0, we have that uiv̄i = 0 for all i ∈ [n1], and hence vi = 0 for all i ∈ [n1], which yields v = 0.

Proceeding by induction, suppose (4.39) holds under the stated conditions for some m. For orthogonal u, v ∈ Cn1···nm+1 write u = (u(1), . . . , u(n1)) and v = (v(1), . . . , v(n1)) in block form, with u(i), v(i) ∈ Cn2···nm+1, i ∈ [n1]. Suppose that uu∗, uv∗ ∈ Sn1,...,nm+1 with ‖u‖ = √k > 0. Now uu∗ = (u(i)u(j)∗)_{i,j∈[n1]} ∈ Sn1,...,nm+1, and so by (4.36) we have u(i)u(i)∗ = u(j)u(j)∗ ∈ Sn2,...,nm+1 for all i, j ∈ [n1]. Since ‖u(i)‖² = Tr(u(i)u(i)∗), we then have ‖u(i)‖ = ‖u(j)‖ for all i, j ∈ [n1]. Since ‖u‖² = 〈u, u〉 = ∑_{i=1}^{n1} ‖u(i)‖² = k, we have ‖u(i)‖² = k/n1 for all i ∈ [n1]. Similarly, uv∗ = (u(i)v(j)∗)_{i,j∈[n1]} ∈ Sn1,...,nm+1, and so u(i)v(i)∗ = u(j)v(j)∗ ∈ Sn2,...,nm+1 for all i, j ∈ [n1]. Since 〈u(i), v(i)〉 = Tr(u(i)v(i)∗), this gives 〈u(i), v(i)〉 = 〈u(j), v(j)〉 for all i, j ∈ [n1]. Then, using that 〈u, v〉 = ∑_{i=1}^{n1} 〈u(i), v(i)〉 = 0, we have 〈u(i), v(i)〉 = 0 for all i ∈ [n1]. Thus we can apply the induction hypothesis to u(i), v(i) for each i ∈ [n1] to yield |u(i)j|² = k/(n1 · · · nm+1) (where we have used that ‖u(i)‖² = k/n1), and also that v(i) = 0, for all i ∈ [n1] and j ∈ [n2 · · · nm+1]. Then (4.39) holds for m + 1.

Proposition 4.6.17. Let n1, . . . , nm ∈ N. Then

(i) ap(Sn1,...,nm) = AIn1...nm;

(ii) α(Sn1,...,nm) = 1;

(iii) c(Sn1,...,nm) = 1;

(iv) χf (Sn1,...,nm) = χ(Sn1,...,nm) = n1 . . . nm;


(v) R(Sn1,...,nm) = log(n1 · · · nm);

(vi) ω(Sn1,...,nm) = 1;

(vii) Ω(S2) = Ωf(S2) = 2 and Ω(Sn1,...,nm) ≥ Ωf(Sn1,...,nm) ≥ n1 . . . nm;

(viii) ω(Sn) = n and ω(Sn1,...,nm) ≥ min{n1, . . . , nm};

(ix) Ωf(Sn) = Ω(Sn) = 1.

Proof. (i) By Lemma 4.6.16 (i), all non-zero Sn1,...,nm-abelian projections have rank 1. Re-

calling the definition of AC in (2.3) on page 44, the claim then follows from Lemma 3.2.7;

(ii) and (iii) follow immediately.

(iv) By (i), ap(Sn1,...,nm)] = BIn1...nmusing Corollary 2.2.34, and hence χf(Sn1,...,nm) =

n1 . . . nm by Theorem 4.1.9. It follows from (4.8) that χ(Sn1,...,nm) = n1 . . . nm.

(v) By (iv), χ((Sn1,...,nm)⊗n) = n1^n · · · nm^n, and applying Definition 4.5.8 yields the result.

(vi) By Lemma 4.6.16 (ii), any non-zero Sn1,...,nm-full projection has rank 1, and so we have fp(Sn1,...,nm) ⊆ AIn1...nm and ω(Sn1,...,nm) ≤ 1. Observe that if u = (1/√(n1 · · · nm))1 ∈ Cn1···nm, then uu∗ = (1/(n1 · · · nm))Jn1...nm ∈ Sn1,...,nm, giving that uu∗ is an Sn1,...,nm-full projection, and the result is proved.

(vii) From the proof of (vi) we have fp(Sn1,...,nm) ⊆ AIn1...nm, and so BIn1...nm ⊆ fp(Sn1,...,nm)]. We thus have In1...nm ∈ fp(Sn1,...,nm)], and using (4.8) we obtain Ω(Sn1,...,nm) ≥ Ωf(Sn1,...,nm) ≥ n1 · · · nm by Theorem 4.1.9. Since T2 ⊆ S2, Lemma 4.1.13 and Proposition 4.6.8 give Ωf(S2) ≤ Ωf(T2) = 2 and Ω(S2) ≤ Ω(T2) = 2, yielding Ω(S2) = Ωf(S2) = 2.

(viii) It is easy to see that {e1, . . . , en}, the canonical basis of Cn, is an Sn-clique. By Theorem 4.1.14 we then have ω(Sn) = n. The inequality in the general case follows from Proposition 4.5.20.

(ix) By (viii) we have I ∈ cp(Sn), and applying Definition 4.1.6 gives Ω(Sn) = 1. The value

of Ωf(Sn) follows from (4.8).

We have not yet determined the values of Ω(Sd) or Ωf(Sd) for d ≥ 3. It is perhaps sur-

prising that an operator system which seemingly has such a straightforward structure would

present such challenges, but we have already noted that the behaviour of the full projection

convex corner can be subtle, even in the simplest of cases. The difficulties associated with


the operator system Sd only increase when we consider the parameters θ and θ: the determi-

nation of θ(Sd) and θ(Sd) for d ≥ 2 remains an open problem. We now outline what can be

established in this case, before making a conjecture as to the values of θ(Sd) and θ(Sd). We

begin by considering the quantum channels contained in Cn(Sd).

Lemma 4.6.18. If Φ ∈ Cn(Sd), then Φ has a Kraus representation

Φ(ρ) = ∑_{i=1}^m AiρA∗i for ρ ∈ Md,

where for each i ∈ [m] we have Ai = √λi (c(i)1 . . . c(i)d) ∈ Mn,d, with λi ∈ R+ and c(i)1, . . . , c(i)d ∈ Cn satisfying the following:

(i) for each j ∈ [d], the set {c(1)j, . . . , c(m)j} ⊂ Cn is orthonormal; and

(ii) for all k, l ∈ [d], it holds that ∑_{i=1}^m λi 〈c(i)k, c(i)l〉 = δkl.

(We note that (i) requires m ≤ n and that setting k = l in (ii) gives ∑_{i=1}^m λi = 1.)

Proof. By Corollary 3.1.21, the quantum channel Φ : Md → Mn has a set of Kraus operators {A1, . . . , Am} ⊆ Mn,d satisfying

Tr(A∗iAj) = 0 for i ≠ j.    (4.40)

For i ∈ [m], write Ai = (v(i)1 . . . v(i)d) ∈ Mn,d with v(i)1, . . . , v(i)d ∈ Cn. This gives that A∗iAj ∈ Md is the matrix whose (k, l)-entry equals 〈v(j)l, v(i)k〉; that is,

(A∗iAj)k,l = 〈v(j)l, v(i)k〉, k, l ∈ [d].    (4.41)

If Φ ∈ Cn(Sd), then SΦ = span{A∗iAj : i, j ∈ [m]} ⊆ Sd, and from (4.41) it is then clear that

〈v(j)k, v(i)k〉 = 〈v(j)l, v(i)l〉 for all i, j ∈ [m], k, l ∈ [d].    (4.42)

Setting i = j in (4.42) yields that ‖v(i)k‖ = ‖v(i)l‖ for all k, l ∈ [d], and hence for all i ∈ [m] there exists λi ∈ R+ such that ‖v(i)k‖² = λi for all k ∈ [d]. For all i ∈ [m] and k ∈ [d] we set v(i)k = √λi c(i)k, giving

Ai = √λi (c(i)1 . . . c(i)d), with ‖c(i)k‖ = 1 for all i ∈ [m], k ∈ [d].


Since Ai ≠ 0, we note that λi > 0.

By (4.40) and (4.41),

∑_{k=1}^d 〈v(j)k, v(i)k〉 = 0 when i ≠ j,    (4.43)

and from (4.42) it follows that 〈v(j)k, v(i)k〉 = 0 for all k ∈ [d] when i ≠ j. It is immediate that 〈c(j)k, c(i)k〉 = 0 for all k ∈ [d] when i ≠ j, and (i) holds.

Since {A1, . . . , Am} is a set of Kraus operators, Proposition 3.1.18 gives ∑_{i=1}^m A∗iAi = Id. In (4.41) we set j = i, sum over i ∈ [m] and consider the (l, k)-entry to obtain

∑_{i=1}^m 〈v(i)k, v(i)l〉 = ∑_{i=1}^m λi 〈c(i)k, c(i)l〉 = δlk,

and (ii) is proved.

Consider Φ ∈ Cn(Sd) with Kraus operators as given above. As a consequence of Lemma 4.6.18 (i), there exist unitaries U1, . . . , Ud ∈ Mn such that c(i)j = Ujei for i ∈ [m] and j ∈ [d], where e1, . . . , en is the canonical basis of Cn. Now set Λ = ∑_{i=1}^m λieie∗i ∈ D+n. Using Lemma 4.6.18 (ii),

〈U∗lUk, Λ〉 = ∑_{i=1}^m λi 〈Ukei, Ulei〉 = ∑_{i=1}^m λi 〈c(i)k, c(i)l〉 = δkl.

Setting k = l gives that TrΛ = 1, and hence Λ ∈ Rn. Now observe that

Φ(Id) = ∑_{i=1}^m AiA∗i = ∑_{i=1}^m λi ∑_{j=1}^d c(i)jc(i)∗j = ∑_{i=1}^m λi ∑_{j=1}^d Ujeie∗iU∗j = ∑_{j=1}^d UjΛU∗j.    (4.44)

Conjecture 4.6.19. For d ∈ N, we make the conjecture that θ(Sd) = θ(Sd) = d.

Recalling Proposition 4.4.7, it is clear that Conjecture 4.6.19 holds if and only if Φ(Id) ≤


Id2 for all Φ ∈ Cd2(Sd). By (4.44) it is sufficient to prove that if unitaries U1, . . . , Ud ∈ Md2

and diagonal Λ ∈ Rd2 satisfy 〈U∗l Uk,Λ〉 = δkl, then

∑_{j=1}^d UjΛU∗j ≤ Id2.    (4.45)

We lack a general proof that (4.45) holds under these conditions, but it is easy to see that it holds in the following two special cases:

(i) If λi ≤ 1/d for all i = 1, . . . , m, then Λ ≤ d−1Id2, and the result is immediate.

(ii) If Λ = uu∗ for some unit vector u ∈ Cd2, then letting vi = Uiu, we have

〈vi, vj〉 = Tr(Uiuu∗U∗j) = Tr(UiΛU∗j) = 〈U∗jUi, Λ〉 = δij,

and {v1, . . . , vd} is orthonormal. We then have

∑_{i=1}^d UiΛU∗i = ∑_{i=1}^d Uiuu∗U∗i = ∑_{i=1}^d viv∗i ≤ Id2.
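Special case (ii) is easy to illustrate numerically. In the sketch below (ours; the particular choice u = e1 and the cyclic-shift unitaries are purely illustrative and make no claim to generality), the compatibility condition 〈U∗lUk, Λ〉 = δkl holds and the operator ∑i UiΛU∗i is indeed dominated by the identity.

```python
import numpy as np

d = 2
n = d * d
u = np.eye(n)[0]                                   # unit vector e_1 in C^{d^2}
# cyclic shifts are unitary and map e_1 to e_1, e_2, ..., e_d respectively
U = [np.roll(np.eye(n), i, axis=0) for i in range(d)]
Lam = np.outer(u, u)                               # Lambda = uu^*

# compatibility condition <U_l^* U_k, Lambda> = delta_{kl}
for k in range(d):
    for l in range(d):
        val = np.trace(U[l].conj().T @ U[k] @ Lam)
        assert np.isclose(val, 1.0 if k == l else 0.0)

S = sum(Ui @ Lam @ Ui.conj().T for Ui in U)
assert np.all(np.linalg.eigvalsh(S) <= 1 + 1e-12)  # S <= I_{d^2}
```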

Having discussed the operator system Sd, we can now give an example of an interesting

phenomenon mentioned at the end of Section 4.5 about the behaviour of clique numbers of

tensor products.

Example 4.6.20. Consider the operator system CI2 ⊗ S2. Recall that ω(S2) = 2 by Proposition 4.6.17 and ω(CI2) = 1 by Proposition 4.6.1. We now claim that ω(CI2 ⊗ S2) = 1 < ω(S2). Since {u} is a CI2 ⊗ S2-clique for any unit vector u ∈ C4, it suffices to prove that no CI2 ⊗ S2-clique has cardinality greater than 1. To establish this we show that if uv∗ ∈ CI2 ⊗ S2, then u = 0 or v = 0. We first observe that CI2 ⊗ S2 consists of the matrices

[ λ a 0 0 ; b λ 0 0 ; 0 0 λ a ; 0 0 b λ ]  with λ, a, b ∈ C.

For u, v ∈ C4, write u = (ui)_{i=1}^4 and v = (vi)_{i=1}^4, and suppose that uv∗ = (uiv̄j)_{i,j=1}^4 ∈ CI2 ⊗ S2. This requires

u1v̄3 = u1v̄4 = u2v̄3 = u2v̄4 = 0, giving u1 = u2 = 0 or v3 = v4 = 0,


and

u3v̄1 = u3v̄2 = u4v̄1 = u4v̄2 = 0, giving u3 = u4 = 0 or v1 = v2 = 0.

Since for uv∗ ∈ CI2 ⊗ S2 we also have

u1v̄1 = u2v̄2 = u3v̄3 = u4v̄4,

it must then hold that all these terms vanish. Similarly, u1v̄2 = u3v̄4, and these terms vanish because either v4 = 0 or u1 = 0. Finally, u2v̄1 = u4v̄3, and these terms both vanish because u4 = 0 or v1 = 0. We then have uv∗ = 0, and it follows that u = 0 or v = 0, and {u, v} is not a CI2 ⊗ S2-clique.

Our study of the non-commutative graph Sd also resolves the following question concern-

ing the quantum sandwich theorem. Recall that cp(S)] ⊆ fp(S)] by Theorem 3.2.10 and that

both cp(S)] and fp(S)] are non-commutative versions of fvp(G). The form of Theorem 1.4.5,

the classical sandwich theorem, and a comparison with Theorem 3.2.37 invite us to ask if

fp(S)] can be replaced by cp(S)] in Theorem 3.2.37. By considering the operator system S2,

the following result shows that this is not possible.

Lemma 4.6.21. It holds that th(S2) ⊈ cp(S2)].

Proof. It is clear that {e1, e2} is an S2-clique, giving that I ∈ cp(S2) and, using Lemma 3.2.7, we obtain cp(S2) = BI2. Anti-blocking gives cp(S2)] = AI2. However, by Proposition 4.4.6 it holds that θ(S2) > 1, and the result follows.

4.7 Further questions

There is much scope for further work on the links between quantum information theory and

the theory of Md-convex corners. We now gather some of the open questions that have been

raised in this chapter, and we discuss potential directions of further research that may prove

to be promising.

• Section 4.1.4 discussed definitions of weighted parameters for non-commutative graphs;

work could usefully be undertaken on their properties.

• We examined a number of examples of non-commutative graphs in Section 4.6, and much work could be undertaken in this direction. For self-adjoint M ∈ Md, we note that span{Id, M} is a non-commutative graph, and this example would be worth considering.


• Noting the importance of the fractional chromatic number, it would be interesting if

the result limn→∞n√χ(Gn) = χf(G) could be generalised to the quantum setting, as

discussed in Question 4.5.15.

• For any non-commutative graph S does it hold that θ(S) = θ(S)? This is probably the

most important open question in Chapter 4.

• Section 4.5 also notes the need for further work on the value of limn→∞n√ω(S⊗n) and

on the behaviour of the sequence (ω(S⊗n))n∈N for a non-commutative graph S.


Chapter 5

The classical source with memory

One of the central concepts of Chapter 1 was the graph entropy of a probabilistic graph, as

introduced by Körner [23] to solve the source coding problem over a partially distinguishable

alphabet. It is important to note that the analysis there was of an i.i.d. source. Theory

described for the classical setting in Chapter 1 was generalised to the quantum setting in

Chapters 2 to 4. We now return to the classical case, but seek to generalise the concept

of graph entropy to the ‘non-i.i.d.’ source, where successively emitted symbols are not in-

dependent, but rather follow some joint distribution. This, of course, will be the case with

any ‘real-life’ communication. Such a source is then said to possess memory. We begin by

summarising background material on the source with memory, in particular the concepts of

entropy and isomorphism. A short section containing some new graph theoretic results is

then followed by an attempt to generalise the theory of the source with memory to the situ-

ation of partial distinguishability. In this setting of partial distinguishability we will discuss

a generalisation of the Kolmogorov–Sinai Theorem and a notion of isomorphism, as well as

considering the Bernoulli and Markov shifts.

5.1 Entropy and the source with memory

A full development of the theory of the source with memory leading to the definition of

Kolmogorov–Sinai entropy and the proof of the Kolmogorov–Sinai Theorem can be found

in both [4] and [20]; in this section we summarise the important concepts and results. As

in Chapter 1, we let X denote a fixed, finite alphabet, and we consider a source emitting a

doubly infinite sequence ω = (. . . , ω−1, ω0, ω1, . . .) of elements of X . We denote by Ω the set

of all such sequences. We write Ω = X Z and take Ω as our sample space. We note that it is


possible to develop an analogous theory for sequences of the form ω = (ω0, ω1, . . .), infinite

in one direction only [4]. Considering doubly infinite sequences in some ways simplifies the

analysis, and is the approach we will follow.

We begin by recalling some standard measure theory. A σ-algebra E on Ω is a collection

of subsets of Ω such that ∅ ∈ E and such that E is closed under countable unions, countable

intersections and taking complements. If E is a σ-algebra on Ω, then (Ω, E) is a called a

measurable space. A probability measure P on measurable space (Ω, E) is a function E →

[0, 1] satisfying P (Ω) = 1 and having the property of countable additivity, namely that

P (⋃∞i=1Ei) =

∑∞i=1 P (Ei) for all pairwise disjoint sets E1, E2, . . . ∈ E . It follows that P (∅) =

0, but we note that non-empty sets may also have zero measure; if A ∈ E satisfies P (A) = 0,

then we say that A is a null set. If P is a probability measure on (Ω, E), we say that the triple

(Ω, E , P ) is a probability space. For measurable spaces (Ω, E) and (Ω′, E ′), we say function

f : Ω→ Ω′ is measurable if f−1(E) ∈ E for all E ∈ E ′.

If C is a collection of subsets of Ω, then the σ-algebra on Ω generated by C is the intersection

of all σ-algebras on Ω containing C, or equivalently the smallest σ-algebra on Ω to contain C.

A set of the form

{ω ∈ Ω : ωi1 = j1, . . . , ωin = jn}, for some n ∈ N, ik ∈ Z, jk ∈ X,

will be known as a cylinder, and, following [4], we let F be the σ-algebra on Ω generated by

the cylinders. Letting P be a probability measure on F , we work in the probability space

(Ω,F , P ).

The bijection T : Ω → Ω defined by (Tω)n = ωn+1 is known as the shift transformation

on Ω. We can imagine that a given ω ∈ Ω is sent one coordinate at a time, with ωi being the

symbol sent at time i. Then Tω represents the same message, with the time origin shifted

forward by one unit. It is natural to insist that this shift of time origin should not affect

probabilities, so we require the probability measure P on F to satisfy P (T−1A) = P (A) for

every cylinder A. When P (A) = P (T−1A) for all A ∈ F , we say that T is measure preserving.

Since T is invertible, this is equivalent to the condition that P (TA) = P (A) for all A ∈ F .

Indeed, we then have P (TnA) = P (A) for all n ∈ Z.

It is shown in [4, Theorem 1.1] that if P (A) = P (T−1A) for every cylinder A, then for

all B ∈ F the sets TB, T−1B ∈ F satisfy P (TB) = P (B) = P (T−1B); in other words, in

this case we have that both T and T−1 are measurable and measure-preserving functions on

Ω. Throughout the sequel we insist that this condition holds. The Kolmogorov Existence


Theorem, as discussed in [4, Example 1.2], shows that a measure P on F preserved by T is

uniquely determined by specifying the measure it gives to each cylinder. We note here that

quadruples of the form (Ω,F , P, T ) where (Ω,F , P ) is a probability space and T : Ω→ Ω is

a measurable, measure-preserving function are known as dynamical systems and are widely

studied in many contexts, not just in information theory. (In this general setting it is not

necessary that the sample space be of the form Ω = X Z.)

Defining a notion of entropy in this setting is complicated by the fact that Ω is uncount-

able, and furthermore that a given ω ∈ Ω will generally have measure 0. This is the motivation

to consider finite subalgebras. We say that a finite set B ⊂ F is a finite subalgebra of the

σ-algebra F if ∅ ∈ B and if B is closed under unions, intersections and taking complements.

If B is a finite subalgebra of F , then B is automatically a σ-algebra. For a finite subalgebra

B there exists a unique and finite set at(B) consisting of non-empty and pairwise disjoint

elements of B whose union is Ω, and such that every non-empty element of B can be uniquely

expressed as a union of elements of at(B). The elements of at(B) are known as the atoms of

B. (In fact, the atoms of B are those non-empty elements of B which have no proper subset

contained in B.) If B is a finite subalgebra, then TnB is a finite subalgebra for all n ∈ Z, and

at(TnB) = Tn at(B). We denote by A0 the ‘time-0’ finite subalgebra whose atoms are the cylinders Ai = {ω : ω0 = i}, i ∈ X. For any k ∈ Z, it is clear that T−kA0 is the finite subalgebra of F with atoms {ω : ωk = i}, i ∈ X.

Let A be a set of arbitrary cardinality, and for each α ∈ A let Bα be a finite subalgebra of F. We write ∨α∈A Bα to denote the σ-algebra generated by ∪α∈A Bα. We then have F = ∨_{n=−∞}^∞ TnA0. We write ∨_{i=1}^n Bi = B1 ∨ . . . ∨ Bn. If B and C are finite subalgebras, then so is B ∨ C, and it is clear that

at(B ∨ C) = {B ∩ C : B ∩ C ≠ ∅, B ∈ at(B), C ∈ at(C)}.

Let B ⊂ F be the finite subalgebra given by B = ∨_{k=1}^n T−ikA0, with i1, . . . , in ∈ Z. The atoms of B are then the cylinders

Bj1,...,jn = {ω : ωi1 = j1, . . . , ωin = jn}, with jk ∈ X for all k ∈ [n].    (5.1)

Because we can only ever observe a finite set of coordinates of any ω ∈ Ω, we can think of the

‘physical’ sets as those which are finite unions of cylinders. As in [4], we define the algebra


F0 by

F0 = ∪_{n=0}^∞ ∨_{i=−n}^n T iA0.    (5.2)

If F ∈ F0, then F ∈ ∨_{i=−n}^n T iA0 for some n ∈ N, and so F is a finite union of cylinders. We can thus think of F0 as the collection of all ‘physical’ sets. If F1, F2, . . . ∈ F0, then ∪_{i=1}^n Fi ∈ F0 for all n ∈ N, but it may hold that ∪_{i=1}^∞ Fi ∉ F0. Thus, unlike F, we see that F0 is not a σ-algebra. Note that if B ∈ F0, then TnB ∈ F0 for all n ∈ Z. We let S denote the set of all subalgebras of the form ∨_{k=1}^n T−ikA0 with i1, . . . , in ∈ Z. ‘Physical’ finite subalgebras of F are those which are contained in F0, and we then observe that if B ⊆ F0 is a finite subalgebra of F, then B ⊆ C for some C ∈ S.

The following standard definitions will later be generalised to the case of partial distin-

guishability, in the way that Shannon entropy is generalised by graph entropy.

Definition 5.1.1. ([4], [20].) We work in the probability space (Ω, F, P) as described. Let B, C ⊂ F be finite subalgebras.

(i) We define the entropy of B by

H(B) = −∑_{B∈at(B)} P(B) log P(B).    (5.3)

(ii) The conditional entropy of C given B is defined by

H(C|B) = ∑_{B∈at(B)} P(B) ∑_{C∈at(C)} −P(C|B) log P(C|B),    (5.4)

where P(C|B) = P(C ∩ B)/P(B) and the first summation ignores atoms of B of measure 0.

(iii) The entropy of B relative to T is given by

h(B, T) = lim sup_{n→∞} (1/n) H(∨_{k=0}^{n−1} T−kB).    (5.5)

(iv) The Kolmogorov–Sinai entropy of T is defined by

h(T) = sup{h(B, T) : B is a finite subalgebra of F}.    (5.6)
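For readers who want to experiment, the toy computation below (ours, not from [4] or [20]; base-2 logarithms are used) evaluates (5.3) and (5.4) for two partitions of a six-point sample space and checks the chain-rule identity H(B ∨ C) = H(B) + H(C|B) recorded in Theorem 5.1.5 below.

```python
import numpy as np

def entropy(partition, p):
    """Entropy (5.3) of a partition whose atoms are given as sets of point indices."""
    probs = np.array([p[list(atom)].sum() for atom in partition])
    probs = probs[probs > 0]
    return -(probs * np.log2(probs)).sum()

def cond_entropy(C, B, p):
    """Conditional entropy H(C|B) as in (5.4)."""
    total = 0.0
    for b in B:
        pb = p[list(b)].sum()
        if pb == 0:
            continue
        for c in C:
            pcb = p[list(b & c)].sum() / pb
            if pcb > 0:
                total -= pb * pcb * np.log2(pcb)
    return total

def join(B, C):
    """Atoms of B v C: the non-empty intersections of atoms of B and C."""
    return [b & c for b in B for c in C if b & c]

p = np.array([0.10, 0.20, 0.30, 0.15, 0.15, 0.10])   # probability measure on 6 points
B = [{0, 1, 2}, {3, 4, 5}]                           # atoms of a finite subalgebra B
C = [{0, 3}, {1, 4}, {2, 5}]                         # atoms of a finite subalgebra C

H_B = entropy(B, p)
H_C_given_B = cond_entropy(C, B, p)
H_join = entropy(join(B, C), p)
assert np.isclose(H_B + H_C_given_B, H_join)         # H(B) + H(C|B) = H(B v C)
print(H_B, H_C_given_B, H_join)
```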

Remark 5.1.2. (i) Note that H(B) is equal to the Shannon entropy of the probability distri-

bution induced on at(B) by P .


(ii) The limit superior in the definition of h(B, T ) can be shown to be a limit; a proof using

conditional entropies is given in [4, p.81]. Alternatively, this follows from Fekete’s Lemma

using the method we will use to prove Proposition 5.3.15.

(iii) If P (B ∩ C) = P (B)P (C) for all B ∈ at(B) and C ∈ at(C), then P (C|B) = P (C) and

H(C|B) = H(C).

We summarise some well-known properties of these quantities.

Theorem 5.1.3. ([4, A′5, B3]) For a finite subalgebra B ⊂ F and u, v ∈ Z with u ≤ v it holds that

(i) H(T−uB) = H(B)  and  (ii) h(∨_{k=u}^v T−kB, T) = h(B, T).

Theorem 5.1.4. ([4, A′2, B2 ]) If finite subalgebras B, C ⊂ F satisfy B ⊆ C, then

(i) H(B) ≤ H(C) and (ii) h(B, T ) ≤ h(C, T ).

Theorem 5.1.5. [4, A′1, A′4] For finite subalgebras B, C ⊂ F ,

H(B) +H(C|B) = H(B ∨ C) ≤ H(B) +H(C).

Now we recall the Kolmogorov–Sinai Theorem, a result which makes the computation of

h(T ) feasible in many cases.

Theorem 5.1.6. [4, Theorem 7.1] If a finite subalgebra B satisfies ∨_{n=−∞}^∞ TnB = F, then h(T) = h(B, T).

It immediately follows for the time-0 subalgebra A0 that

h(T ) = h(A0, T ). (5.7)

Indeed, the next proposition shows that the supremum in Definition 5.1.1 (iv) is achieved by

any B ∈ S.

Proposition 5.1.7. It holds that h(B, T ) = h(T ) for all B ∈ S.

Proof. By Definition 5.1.1 (iv) h(B, T ) ≤ h(T ). For the reverse inequality, observe that if


B ∈ S, then for some n ∈ Z we have TnA0 ⊆ B, giving that

h(T ) = h(A0, T ) = h(TnA0, T ) ≤ h(B, T )

by (5.7), Theorems 5.1.3 (i) and 5.1.4 (ii).

Two well-known special cases deserve particular attention.

(i) Bernoulli shifts [20, Section 1.3, Example 15]. Given a probability distribution p = (pi)i∈X on the alphabet X, let

P({ω : ωk = ik, m ≤ k ≤ n}) = ∏_{k=m}^n p_{i_k}    (5.8)

for all m, n ∈ Z with ik ∈ X, k = m, . . . , n. Since any cylinder is a finite union of cylinders of the form appearing on the left of (5.8), this suffices to give the measure of any cylinder, and thus, by the Kolmogorov Existence Theorem, to define a probability measure P on (Ω, F) which is preserved by T. In this case T is called the p-Bernoulli shift. An i.i.d. source clearly corresponds to a Bernoulli shift. In the case of the p-Bernoulli shift, [4, (7.2)] gives that

h(T) = H(p),    (5.9)

where H(p) is the Shannon entropy of p.

(ii) Markov shifts [20, Section 1.3, Example 17]. Set |X| = d and let the matrix Π = (pij) ∈ Md(R+) satisfy ∑_{j=1}^d pij = 1 for all i = 1, . . . , d. Further, let the row vector p = (pi) ∈ M1,d(R+) satisfy ∑_{i=1}^d pi = 1 and pΠ = p; such a p is known as an invariant distribution. Again we use the Kolmogorov Existence Theorem to specify a probability measure P on (Ω = X^Z, F) preserved by T by setting

P({ω : ωk = ik, ωk+1 = ik+1, . . . , ωl = il}) = p_{i_k} p_{i_k i_{k+1}} · · · p_{i_{l−1} i_l}.    (5.10)

With P so defined, the shift T is called the (Π, p)-Markov shift. Note for any k ∈ Z that P({ω : ωk = i}) = pi and that P({ω : ωk = i, ωk+1 = j}) = pi pij; thus pij is the probability that ωn+1 is j, given that ωn was i. (Indeed, pij is the probability that ωn+1 is j, given any previous history culminating in ωn = i.) We note that setting pij = pj for all i, j ∈ [d] gives the p-Bernoulli shift. The Kolmogorov–Sinai entropy of the (Π, p)-Markov shift is given in [4, (7.3)] by

h(T) = −∑_{i,j∈[d]} pi pij log pij.    (5.11)
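Formulas (5.9) and (5.11) are easy to evaluate in practice. The sketch below (ours, for illustration; the particular Π and p are arbitrary choices and base-2 logarithms are used) computes the Kolmogorov–Sinai entropy of a p-Bernoulli shift and of a (Π, p)-Markov shift for a two-symbol alphabet, checking on the way that Π is stochastic and p is invariant.

```python
import numpy as np

def bernoulli_entropy(p):
    """h(T) = H(p) for the p-Bernoulli shift, cf. (5.9)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def markov_entropy(Pi, p):
    """h(T) = -sum_{i,j} p_i p_{ij} log p_{ij} for the (Pi, p)-Markov shift, cf. (5.11)."""
    Pi, p = np.asarray(Pi, dtype=float), np.asarray(p, dtype=float)
    assert np.allclose(Pi.sum(axis=1), 1.0)      # Pi is row-stochastic
    assert np.allclose(p @ Pi, p)                # p is an invariant distribution
    safe = np.where(Pi > 0, Pi, 1.0)             # avoid log(0); those terms contribute 0
    return -(p[:, None] * Pi * np.log2(safe)).sum()

Pi = np.array([[0.9, 0.1],
               [0.4, 0.6]])
p = np.array([0.8, 0.2])                          # satisfies p Pi = p

print(bernoulli_entropy(p))                       # entropy of the p-Bernoulli shift
print(markov_entropy(Pi, p))                      # entropy of the (Pi, p)-Markov shift
```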


The following definition describes the notion of isomorphism for systems (Ω,F , P, T ) and

(Ω′,F ′, P ′, T ′).

Definition 5.1.8. [20, Section 1.3, Definition 13] We say that the systems (Ω,F , P, T ) and

(Ω′,F ′, P ′, T ′) are isomorphic, or simply that T and T ′ are isomorphic, if there exists a

bijection φ : Ω→ Ω′ satisfying:

(i) for any A ⊆ Ω we have φ(A) ∈ F ′ if and only if A ∈ F , in which case P ′(φ(A)) = P (A);

and

(ii) for all ω ∈ Ω, it holds that φ(Tω) = T ′φ(ω).

In this case we write (Ω,F , P, T ) ∼= (Ω′,F ′, P ′, T ′), or simply T ∼= T ′, and φ is called an

isomorphism.

As isomorphic systems are in some ways equivalent, it would be desirable to have an en-

tropic quantity that is invariant under isomorphism; the Kolmogorov–Sinai entropy possesses

this property, as shown in the next theorem.

Theorem 5.1.9. [20, Section 1.3, Theorem 14] If (Ω,F , P, T ) ∼= (Ω′,F ′, P ′, T ′), then h(T ) =

h(T ′).

The following famous theorem of Ornstein ([33]), shows that among the Bernoulli shifts,

the converse of Theorem 5.1.9 also holds; we say that the Kolmogorov–Sinai entropy is a

complete invariant among the Bernoulli shifts.

Theorem 5.1.10. [33] The Bernoulli shifts T and T ′ are isomorphic if and only if h(T ) =

h(T ′).

5.2 Graph theoretic preliminaries

Here we give a number of definitions and results concerning co-normal and lexicographic

graph products and their graph entropies that will be needed in the sequel.

First we consider the co-normal product of graphs F and G. Let |V (F )| = c and |V (G)| =

d, and let S ⊆ V (F ) and T ⊆ V (G) have characteristic vectors v(S) and v(T ) respectively.

Then the characteristic vector of S×T is given by v(S×T ) = v(S)⊗v(T ) ∈ Rc⊗Rd, and we will


write (v(S×T ))(i,j) = (v(S))i(v(T ))j . Note that if S and T are stable in F and G respectively,

then S × T is stable in F ∗G.

By analogy with Definition 2.2.25, for A ⊆ Rd+ we define

her(A) = {v ∈ Rd+ : ∃ u ∈ A such that v ≤ u}.

Definition 5.2.1. For convex corners A ⊆ Rm and B ⊆ Rn, write

A ⊗max B = her(conv({a ⊗ b : a ∈ A, b ∈ B})).

The next lemma concerns the vertex packing polytope of a co-normal product.

Lemma 5.2.2. For graphs F and G, it holds that

VP(F ∗ G) = VP(F) ⊗_max VP(G).

Proof. Let c = ∑_i c_i v^{(S_i)} ∈ VP(F) and d = ∑_j d_j v^{(T_j)} ∈ VP(G), where the S_i and T_j are stable sets of F and G respectively, and c_i, d_j ∈ ℝ_+ satisfy ∑_i c_i = ∑_j d_j = 1. Using that ∑_{i,j} c_i d_j = 1 and c_i d_j ≥ 0, it follows that

c ⊗ d = ∑_{i,j} c_i d_j v^{(S_i)} ⊗ v^{(T_j)} = ∑_{i,j} c_i d_j v^{(S_i × T_j)} ∈ VP(F ∗ G).

Now VP(F ∗ G) is convex by definition and hereditary by Lemma 1.3.7, and thus

VP(F) ⊗_max VP(G) ⊆ VP(F ∗ G).

For the reverse inclusion observe by Lemma 1.3.3 that each stable set of F ∗ G is contained in some kernel of the form S × T, where S and T are kernels of F and G respectively. Thus for each v ∈ VP(F ∗ G), there exist kernels S_i of F and kernels T_j of G, and coefficients α_{i,j} ∈ ℝ_+ satisfying ∑_{i,j} α_{i,j} = 1, such that

v ≤ ∑_{i,j} α_{i,j} v^{(S_i × T_j)} = ∑_{i,j} α_{i,j} v^{(S_i)} ⊗ v^{(T_j)} ∈ conv({c ⊗ d : c ∈ VP(F), d ∈ VP(G)}),

giving that v ∈ VP(F) ⊗_max VP(G).


Remark 5.2.3. Note that it can hold that

{c ⊗ d : c ∈ VP(F), d ∈ VP(G)} ⊊ VP(F ∗ G).

We offer the following example to illustrate this. Let F and G be complete graphs on vertex sets {f_1, f_2} and {g_1, g_2} respectively. Then F has kernels F_1 = {f_1} and F_2 = {f_2}, and G has kernels G_1 = {g_1} and G_2 = {g_2}. We then have that

v = (1/2) v^{(F_1 × G_1)} + (1/2) v^{(F_2 × G_2)} = (1/2) v^{(F_1)} ⊗ v^{(G_1)} + (1/2) v^{(F_2)} ⊗ v^{(G_2)} = (1/2)(1, 0, 0, 1)^T ∈ conv({c ⊗ d : c ∈ VP(F), d ∈ VP(G)}) ⊆ VP(F ∗ G).

However it is clear that v ∉ {c ⊗ d : c ∈ VP(F), d ∈ VP(G)}.

Furthermore, the following strict inclusion can hold:

conv({c ⊗ d : c ∈ VP(F), d ∈ VP(G)}) ⊊ VP(F ∗ G).

As an example, take both F and G to be vertex disjoint copies of the path graph on three vertices, and form their co-normal product F ∗ G as below:

[Diagram: F and G are copies of the path graph on three vertices; F ∗ G is drawn on the 3 × 3 grid of vertices V(F) × V(G).]

Being the characteristic vector of a stable set in F ∗ G,

v = (1, 0, 1, 0, 0, 0, 1, 0, 0)^T ∈ VP(F ∗ G),

but it is easily seen that v ∉ conv({a ⊗ b : a ∈ VP(F), b ∈ VP(G)}).
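The last non-membership claim can also be checked numerically. The Python sketch below (not part of the thesis) uses the fact that every a ⊗ b with a, b ∈ VP(P_3) is a convex combination of tensor products of the extreme points of VP(P_3), namely the characteristic vectors of the stable sets of the path graph, and tests membership of v in the resulting convex hull with a small linear programme.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Stable sets of the path 1-2-3 give the extreme points of VP(P3).
stable = [np.array(s, dtype=float) for s in
          ([0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1])]

# conv{a (x) b : a, b in VP(P3)} is the convex hull of the 25 tensor
# products of these extreme points.
products = [np.kron(a, b) for a, b in itertools.product(stable, stable)]

# Feasibility LP: is v a convex combination of the products?
v = np.array([1, 0, 1, 0, 0, 0, 1, 0, 0], dtype=float)
A_eq = np.vstack([np.column_stack(products), np.ones(len(products))])
b_eq = np.append(v, 1.0)
res = linprog(c=np.zeros(len(products)), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * len(products))
print(res.success)   # False: v lies in VP(F * G) but not in the convex hull
```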

Later in the chapter the concept of the lexicographic graph product will be needed.

Definition 5.2.4. ([15, p.17].) For graphs F and G, the lexicographic product F · G is the graph with vertex set V(F) × V(G) and in which (i_1, j_1) ∼ (i_2, j_2) if and only if i_1 ∼ i_2 in F, or i_1 = i_2 and j_1 ∼ j_2 in G.

Note that in general F · G ≇ G · F. It is clear that F · G is a spanning subgraph of F ∗ G,


and that F ·G = F ∗G if and only if F is a complete graph or G is an empty graph.

Lemma 5.2.5. (See [14, Theorem 1].) If K ⊆ V(F · G), then K is a kernel of F · G if and only if K = ⋃_{i∈S} {i} × T_i, where S is a kernel of F and T_i is a kernel of G for each i ∈ S. Furthermore, α(F · G) = α(F)α(G).

Proof. That ⋃_{i∈S} {i} × T_i is a kernel is clear. To show the converse we note that if K is a kernel, then the projection of K onto V(F) is contained in some kernel S of F. Furthermore, each element of K with first coordinate i must have second coordinate in some kernel T_i of G, giving K ⊆ ⋃_{i∈S} {i} × T_i. Since K is maximally stable, it follows that K = ⋃_{i∈S} {i} × T_i.

Now note that if S, T are kernels of F, G respectively, then S × T is a kernel of F · G, and so α(F · G) ≥ α(F)α(G). However, if the kernel K = ⋃_{i∈S} {i} × T_i, then |K| = ∑_{i∈S} |T_i| ≤ α(F)α(G), completing the proof.

We will use the substitution lemma as outlined in [49] and first proved in [25]. Let F and G be vertex disjoint graphs and let v ∈ V(F). Substituting G for v, as defined in [9, Section 5], means deleting from F the vertex v and all its incident edges, and then adding edges from every vertex of G to those vertices of F which were adjacent to v in F. We denote the resulting graph by F_{v←G}. Let p and q be probability distributions on V(F) and V(G) respectively. We define the probability distribution p_{v←q} on V(F_{v←G}) = V(G) ∪ (V(F)\{v}) by

p_{v←q}(x) = p(x) if x ∈ V(F)\{v},  and  p_{v←q}(x) = p(v)q(x) if x ∈ V(G).            (5.12)

Lemma 5.2.6 (Substitution lemma). ([49, Lemma 3.3], [25].) With the notation above, it holds that

H(F_{v←G}, p_{v←q}) = H(F, p) + p(v)H(G, q).
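As a quick sanity check of the substitution lemma, the Python sketch below (not thesis code) evaluates both sides on a tiny instance: F = K_2 with p = (1/2, 1/2), and G the empty graph on two vertices with q = (1/2, 1/2), so that F_{v←G} is a star on three vertices. The kernel descriptions of the vertex packing polytopes used in the comments are assumptions supplied for this example only.

```python
import numpy as np

alphas = np.linspace(1e-6, 1 - 1e-6, 10001)

def entropy_over_segment(weights, coords):
    # min over alpha of -sum_i weights[i] * log2(a_i(alpha)),
    # where a(alpha) interpolates between the two kernel characteristic vectors.
    vals = [-sum(w * np.log2(c(a)) for w, c in zip(weights, coords))
            for a in alphas]
    return min(vals)

# H(K2, p): kernels {u}, {v}; boundary of VP(K2) is (alpha, 1-alpha).
H_K2 = entropy_over_segment([0.5, 0.5], [lambda a: a, lambda a: 1 - a])

# H(empty graph, q) = 0, since the all-ones vector lies in its polytope.
H_empty = 0.0

# F_{v<-G} is the star u - x, u - y with distribution (1/2, 1/4, 1/4);
# kernels {u} and {x, y}; boundary (alpha, 1-alpha, 1-alpha).
H_star = entropy_over_segment([0.5, 0.25, 0.25],
                              [lambda a: a, lambda a: 1 - a, lambda a: 1 - a])

print(H_star, H_K2 + 0.5 * H_empty)   # both approximately 1 bit
```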

In [9, Section 5] it is shown how repeatedly substituting copies of a graph G for the vertices of a graph F produces the graph F · G. The lemma below uses this technique to find the graph entropy of the graph F · G with an arbitrary joint probability distribution r on V(F) × V(G). Recall that if r is a probability distribution on V(F) × V(G), then the marginal distribution of r on V(F) is given by p(i) = ∑_{j∈V(G)} r(i, j).

Lemma 5.2.7. For any graphs F and G and probability distribution r on V(F · G), we have

H(F · G, r) = H(F, p) + ∑_{v∈V(F)} p(v) H(G, r(·|v)),


where p is the marginal distribution of r on V(F), and r(·|v) denotes the conditional probability distribution on V(G) given v ∈ V(F), where r(x|v) = r(v, x)/p(v) for x ∈ V(G).

Proof. We take graph F with probability distribution p on V (F ) and |V (F )| vertex disjoint

copies of G, whose vertex sets are each given the probability distribution r(·|v) for a different

v ∈ V (F ). Now, for each v ∈ V (F ) in turn, we substitute for v the copy of G with probability

distribution r(·|v). It is clear the resulting probabilistic graph is (F ·G, r). Lemma 5.2.6 then

gives the required result.

Remark 5.2.8. We note that Lemma 5.2.7 can also be seen to follow from [53, Proposition

4.4].

Lemma 5.2.9. For graphs F and G and probability distribution r on V (F )×V (G), we have

H(F ∗G, r) ≥ H(F ·G, r).

Proof. Given that F ·G is a spanning subgraph of F ∗G, this follows from Lemma 1.3.19.

Lemma 5.2.10. If r is a probability distribution on V (F )×V (G) with marginal distributions

p and q on V (F ) and V (G) respectively, then

H(F ∗G, r) ≤ H(F, p) +H(G, q).

Proof. Let a = (a_i)_{i∈V(F)} ∈ VP(F) satisfy −∑_{i∈V(F)} p(i) log a_i = H(F, p) and let b = (b_j)_{j∈V(G)} ∈ VP(G) satisfy −∑_{j∈V(G)} q(j) log b_j = H(G, q). By Lemma 5.2.2, a ⊗ b ∈ VP(F ∗ G), and so

H(F ∗ G, r) ≤ −∑_{(i,j)∈V(F∗G)} r(i, j) log (a ⊗ b)_{(i,j)}
            = −∑_{(i,j)∈V(F∗G)} r(i, j) log a_i − ∑_{(i,j)∈V(F∗G)} r(i, j) log b_j
            = −∑_{i∈V(F)} p(i) log a_i − ∑_{j∈V(G)} q(j) log b_j
            = H(F, p) + H(G, q).

We note that Lemma 5.2.10 also follows from [53, Proposition 3.7].

Combining Lemmas 5.2.7, 5.2.9 and 5.2.10 gives the following proposition.


Proposition 5.2.11. If F and G are graphs and r is a probability distribution on V(F) × V(G) with marginal distributions p and q on V(F) and V(G) respectively, then

H(F · G, r) = H(F, p) + ∑_{v∈V(F)} p(v) H(G, r(·|v)) ≤ H(F ∗ G, r) ≤ H(F, p) + H(G, q).

If the probability distribution r on V (F ) × V (G) is given by r(i, j) = p(i)q(j) for the

marginal probability distributions p and q on V (F ) and V (G) respectively, we say that r is

a product distribution, and we write r = p × q. The next result shows that in the case of

a product distribution, equality holds throughout in Proposition 5.2.11. (We note that the

second equality in the Proposition below is equivalent to [11, Theorem 5.1].)

Proposition 5.2.12. If r is the product distribution on V (F )× V (G) given by r = p× q for

probability distributions p and q on V (F ) and V (G) respectively, then

H(F ·G, r) = H(F ∗G, r) = H(F, p) +H(G, q).

Proof. In this case the conditional probability distribution r(·|i) on V (G) given i ∈ V (F )

satisfies r(j|i) = q(j). For each v ∈ V (F ) we then have H(G, r(·|v)) = H(G, q). Thus in

Proposition 5.2.11 we have ∑_{v∈V(F)} p(v) H(G, r(·|v)) = H(G, q), and the result follows.

Remark 5.2.13. In [20, Section 1.1, Theorem 1] we have the equivalent result for Shannon entropies, namely that

H(r) = H(p) + ∑_v p(v) H(r(·|v)) ≤ H(p) + H(q),

where equality holds if and only if r is a product distribution. Note, however, that it is not necessary that r be a product distribution for equality to hold throughout in Proposition 5.2.11. To show this we offer the following example where r ≠ p × q, but H(G, r(·|v)) = H(G, q) for each v ∈ V(F), whence equality in Proposition 5.2.11 immediately follows. Take F = K_2 with V(F) = {x, y} and p(x) = p(y) = 1/2. Then let G be the cycle C_4 with V(G) = [4] and let the conditional probability distribution on V(G) given v ∈ V(F) be r(·|v), where (r(i|x))_{i=1}^{4} = (1/4 + ε, 1/4, 1/4 − ε, 1/4)^T and (r(i|y))_{i=1}^{4} = (1/4 − ε, 1/4, 1/4 + ε, 1/4)^T for 0 < ε < 1/4. Thus q = (1/4, 1/4, 1/4, 1/4)^T, but r ≠ p × q. Here G has kernels {1, 3} and {2, 4} and so, for any


distribution s = (s_i)_{i∈[4]} on V(G), by Remark 1.3.13

H(G, s) = min_{α∈[0,1]} { −∑_{i=1}^{4} s_i log a_i : a = (α, 1 − α, α, 1 − α)^T }.

We now note for any a of the form given above that

−∑_{i=1}^{4} r(i|x) log a_i = −∑_{i=1}^{4} r(i|y) log a_i = −∑_{i=1}^{4} q_i log a_i.

We can therefore conclude that H(G, r(·|x)) = H(G, r(·|y)) = H(G, q), giving equality in Proposition 5.2.11 as claimed.
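The equality in the example can also be confirmed numerically. The short Python sketch below (not part of the thesis) minimises −∑ s_i log a_i over the one-parameter family a = (α, 1 − α, α, 1 − α) quoted above and checks that the three distributions give the same value; the grid search is purely illustrative.

```python
import numpy as np

def H_C4(s, grid=100001):
    # Graph entropy of C4 for distribution s, using a = (alpha, 1-alpha, alpha, 1-alpha).
    alphas = np.linspace(1e-9, 1 - 1e-9, grid)
    vals = -(s[0] + s[2]) * np.log2(alphas) - (s[1] + s[3]) * np.log2(1 - alphas)
    return vals.min()

eps = 0.1
r_x = np.array([0.25 + eps, 0.25, 0.25 - eps, 0.25])
r_y = np.array([0.25 - eps, 0.25, 0.25 + eps, 0.25])
q   = np.array([0.25, 0.25, 0.25, 0.25])

print(H_C4(r_x), H_C4(r_y), H_C4(q))   # all approximately 1 bit
```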

Corollary 5.2.14. If G^n, the nth co-normal power of G, has the probability distribution p^n on its vertex set given by

p^n(i_0, . . . , i_{n−1}) = ∏_{k=0}^{n−1} p_{i_k},

where (p_i)_{i∈V(G)} is a probability distribution on V(G), then

H(G^n, p^n) = nH(G, p).

Proof. Apply an induction argument to Proposition 5.2.12.

Remark 5.2.15. This also follows directly from the expression for graph entropy given in Definition 1.3.5:

H(G^k, p^k) = lim_{n→∞} (1/n) log min{ χ(G^{nk}_E) : E ⊆ X^{nk}, p^{nk}(E) > 1 − λ }
            = lim_{N→∞} (k/N) log min{ χ(G^N_E) : E ⊆ X^N, p^N(E) > 1 − λ }
            = kH(G, p),

where λ ∈ (0, 1).

5.3 Graph entropy for the source with memory

The necessary background is now in place for us to generalise the theory of the source with

memory as described in Section 5.1 to the situation of partial distinguishability. As there we work in probability space (Ω = X^ℤ, F, P) where X is a fixed, finite alphabet, and F is the σ-algebra on Ω generated by the cylinders.

First it is necessary to formalise the concept of distinguishability.


Definition 5.3.1. We take distinguishability to be a symmetric but not necessarily transitive

relation on Ω = X^ℤ, and we construct an infinite graph G, known as the distinguishability

graph on Ω, with V (G) = Ω. For ω, ω′ ∈ V (G) we set ω ∼ ω′ in G when ω and ω′ are

distinguishable. Sets A,B ⊆ Ω are said to be distinguishable when a ∼ b for all a ∈ A

and b ∈ B, or equivalently when A × B ⊆ E(G). (Note that distinguishable sets are then

necessarily disjoint.) If all distinct ω, ω′ ∈ Ω are distinguishable, then we say graph G is

complete, and then all disjoint subsets of Ω are distinguishable.

Recall it was required that the shift transformation T be measure preserving, in order

that probabilities are unchanged by a shift of the time origin. In the same way we desire

that distinguishability is unaffected by a shift of the time origin; that is, we require that

ω ∼ ω′ in G if and only if Tω ∼ Tω′ in G. With this condition satisfied we say that T is

distinguishability preserving. If T is distinguishability preserving, then for all n ∈ ℤ,

ω ∼ ω′ in G ⟺ T^n ω ∼ T^n ω′ in G.            (5.13)

When (5.13) holds, we also say that graph G is shift invariant. Throughout the sequel it is

assumed that P is a probability measure on (Ω,F) and G a distinguishability graph on Ω

such that T is both measure preserving and distinguishability preserving.

5.3.1 The graph G[B] and its graph entropy

In Section 5.1 progress was made by considering atoms of finite subalgebras of F , and it

seems natural to use a similar approach here.

Definition 5.3.2. For a finite subalgebra B ⊂ F and a graph G as defined in Definition 5.3.1, we define G[B] to be the graph with vertex set at(B) = {B_1, . . . , B_k}, and such that B_i ∼ B_j in G[B] when B_i × B_j ⊆ E(G), that is when B_i and B_j are distinguishable.

If B ⊂ F is a finite subalgebra, then a probability measure P on (Ω,F) induces a prob-

ability distribution on V (G[B]) = at(B), and we denote the resulting probabilistic graph by

(G[B], P ). The graph entropy H(G[B], P ) is now defined as in Section 1.3. If the atoms of B

are all mutually distinguishable, that is G[B] ≅ K_{|at(B)|}, then (1.21) on page 19 and Remark

5.1.2 give that

H(G[B], P ) = H(B), (5.14)

and H(G[B], P ) is equal to the Shannon entropy of the probability distribution on at(B)


induced by P . In general Lemma 1.3.19 and (1.3) on page 3 give that

H(G[B], P ) ≤ H(B) ≤ log | at(B)|. (5.15)

The next two straightforward propositions generalise Theorems 5.1.3 (i) and 5.1.4 (i)

respectively to establish some basic properties of H(G[B], P ).

Proposition 5.3.3. For any finite subalgebra B ⊂ F and n ∈ ℤ, we have

H(G[T^n B], P) = H(G[B], P).

Proof. We have V(G[B]) = at(B) and V(G[T^n B]) = at(T^n B) = T^n at(B). Now let B, C ∈ at(B). Since T preserves distinguishability, it holds that B ∼ C in G[B] if and only if T^n B ∼ T^n C in G[T^n B], and the graphs G[B] and G[T^n B] are isomorphic. Furthermore, T preserves measure, and hence P(T^n B) = P(B) for all B ∈ at(B). The result follows.

Proposition 5.3.4. If B, C ⊂ F are finite subalgebras satisfying B ⊆ C, then

H(G[B], P ) ≤ H(G[C], P ).

Proof. As B ⊆ C, each atom of B is a union of atoms of C. Lemma 1.3.27 shows that null vertices can be ignored, so without loss of generality choose B ∈ at(B) with P(B) > 0 and let B = ⋃_{j=1}^{n} C_j where C_1, . . . , C_n ∈ at(C). Let F be the empty graph with vertices C_1, . . . , C_n. We give F the probability distribution Q on its vertices, where Q(C_j) = P(C_j)/P(B). In the graph G[B] we substitute the graph F for the vertex B to form the graph G[B]_{B←F}; note from (5.12) that the substitution algorithm gives to the vertex set of this graph the probability distribution induced by P. Since F is empty, H(F, Q) = 0, so Lemma 5.2.6 yields

H(G[B]_{B←F}, P) = H(G[B], P).

Observe for any D ⊆ Ω that if B × D ⊆ E(G) then C_j × D ⊆ E(G) for all j = 1, . . . , n, so every edge created by the substitution joins two distinguishable sets. Repeating for each atom of B thus yields a spanning subgraph of G[C] with graph entropy equal to that of G[B]. The result follows by the monotonicity result in Lemma 1.3.19.

Corollary 5.3.5. For all finite subalgebras B, C ⊂ F ,

H(G[B ∨ C], P ) ≥ H(G[C], P ).


Proof. Since C ⊆ B ∨ C, this follows immediately from Proposition 5.3.4.

Remark 5.3.6. Although Definition 5.3.2 and the results above apply to any finite subalgebra B ⊂ F and distinguishability graph G on Ω, we will often wish to impose further conditions:

(i) We will often specify that G, the distinguishability graph on Ω, arises from a distinguishability relation on X as follows. Let G_0 be a distinguishability graph on the alphabet X as described in Section 1.3. For ω, ω′ ∈ Ω we set ω ∼ ω′ in G if and only if ω_k ∼ ω′_k in G_0 for some k ∈ ℤ. The distinguishability graph G on Ω is then given by the infinite co-normal product G = · · · ∗ G_0 ∗ G_0 ∗ G_0 ∗ · · ·, which we will denote by G_0^ℤ. If G = G_0^ℤ, it is clear that G[T^n A_0] ≅ G_0 for all n ∈ ℤ, where A_0 is the time-0 subalgebra.

(ii) It may be desired to consider finite subalgebras of a more specific form. In Section 5.1 it was argued that the 'physical' case concerns finite subalgebras which are contained in the algebra F_0 = ⋃_{n=0}^{∞} ⋁_{i=−n}^{n} T^i A_0. We narrow our focus further to examine the set S of finite subalgebras of the form ⋁_{k=1}^{n} T^{−i_k} A_0. These subalgebras are of particular physical significance because if B ∈ S, then at(B) will be the set of cylinders of the form given in (5.1). Let G = G_0^ℤ and consider a finite subalgebra B = ⋁_{k=1}^{n} T^{−i_k} A_0 ∈ S with atoms B_{j_1,...,j_n} = {ω : ω_{i_1} = j_1, . . . , ω_{i_n} = j_n}, where j_k ∈ X for all k ∈ [n]. It is clear that B_{j_1,...,j_n} × B_{k_1,...,k_n} ⊆ E(G) if and only if j_l ∼ k_l in G_0 for some l ∈ [n]. We say that sets A, B ⊆ Ω are distinguishable on coordinate j ∈ ℤ if ω_j ∼ ω′_j in G_0 for all ω ∈ A and ω′ ∈ B. Thus, if B ∈ S and G = G_0^ℤ, then A, B ∈ at(B) satisfy A ∼ B in G[B] if and only if A and B are distinguishable on at least one coordinate.

For B, C ∈ F where P(B) ≠ 0, we write P(C|B) = P(B ∩ C)/P(B), and note that P(·|B) is then a probability measure on (Ω, F).

Definition 5.3.7. By analogy with the definition of the conditional entropy H(C|B), we define the conditional graph entropy of C given B by

H(G[C|B], P) = ∑_{B∈at(B): P(B)>0} P(B) H(G[C], P(·|B)).

Note that if P(B ∩ C) = P(B)P(C) for all B ∈ at(B) and C ∈ at(C), we have P(C|B) = P(C) and H(G[C|B], P) = H(G[C], P).

We will call B = ⋁_{k=1}^{n} T^{−i_k} A_0 ∈ S the subalgebra over the coordinate set S_B = {i_1, . . . , i_n}. It is clear that when B, C ∈ S, then B ∨ C ∈ S and S_{B∨C} = S_B ∪ S_C. Under the 'physical' conditions of Remark 5.3.6, we have the following analogue of Theorem 5.1.5.


Proposition 5.3.8. When B, C ∈ S and G = G_0^ℤ,

H(G[B], P) + H(G[C|B], P) ≤ H(G[B ∨ C], P) ≤ H(G[B], P) + H(G[C], P).            (5.16)

Furthermore, if P (B ∩C) = P (B)P (C) for all B ∈ at(B) and C ∈ at(C), then equality holds

throughout in (5.16).

Proof. The atoms of B ∨ C, and hence the vertices of G[B ∨ C], are the non-empty intersections B ∩ C, where B ∈ at(B) and C ∈ at(C). On the other hand,

V(G[B] ∗ G[C]) = {(B, C) : B ∈ at(B), C ∈ at(C)}.

We equip V (G[B] ∗G[C]) with the probability distribution r where

r((B,C)) = P (B ∩ C). (5.17)

We now show that (G[B]∗G[C], r) and (G[B∨C], P ) are related in the way (G, p) and (G′, p′)

are related in Lemma 1.3.27.

Any vertex (B,C) of G[B] ∗ G[C] satisfying B ∩ C = ∅ will have measure r((B,C)) =

P (B ∩ C) = 0. As in Lemma 1.3.27, we form (G[B] ∗ G[C])′ by deleting from G[B] ∗ G[C]

all vertices (B,C) with B ∩ C = ∅, along with their incident edges. We identify vertex

(B,C) ∈ V ((G[B] ∗ G[C])′) with vertex B ∩ C ∈ V (G[B ∨ C]) to put V ((G[B] ∗ G[C])′) and

V(G[B ∨ C]) in a natural one-to-one correspondence. Let B_i, B_j ∈ at(B) and C_k, C_l ∈ at(C) satisfy B_i ∩ C_k ≠ ∅ and B_j ∩ C_l ≠ ∅. We claim the following are equivalent:

(1) (Bi, Ck) ∼ (Bj , Cl) in (G[B] ∗G[C])′;

(2) (Bi ∩ Ck) ∼ (Bj ∩ Cl) in G[B ∨ C].

To see this, first suppose that (1) holds. Then either Bi ∼ Bj in G[B], giving Bi ×

Bj ⊆ E(G), or Ck ∼ Cl in G[C], giving Ck × Cl ⊆ E(G). This is sufficient to show that

(Bi ∩ Ck)× (Bj ∩ Cl) ⊆ E(G), and hence (2) holds.

To prove the reverse implication, let B, C ∈ S be the subalgebras over the coordinate sets

S_B and S_C respectively, so that B ∨ C is the subalgebra over S_B ∪ S_C. Now (2) implies that (B_i ∩ C_k) × (B_j ∩ C_l) ⊆ E(G), and as noted in Remark 5.3.6 (ii), there exists t ∈ S_B ∪ S_C such

that Bi ∩ Ck and Bj ∩ Cl are distinguishable on coordinate t. If t ∈ SB, we have Bi ∼ Bj

in G[B], and if t ∈ SC , we have Ck ∼ Cl in G[C]. In either case, (Bi, Ck) ∼ (Bj , Cl) in

(G[B] ∗G[C])′, and (1) holds, as required.


Thus the graphs (G[B] ∗ G[C])′ and G[B ∨ C] are isomorphic, and recalling (5.17) and

regarding r as a probability distribution on V ((G[B] ∗G[C])′), we have

H((G[B] ∗G[C])′, r) = H(G[B ∨ C], P ).

Lemma 1.3.27 then gives

H(G[B ∨ C], P ) = H(G[B] ∗G[C], r). (5.18)

Note that the marginal distributions of r on V (G[B]) = at(B) and on V (G[C]) = at(C)

are both those induced by P and that r(C|B) = P (C|B). The result then follows from

Propositions 5.2.11 and 5.2.12, using Definition 5.3.7.

Remark 5.3.9. Note that the equality of Theorem 5.1.5 becomes an inequality in Proposition

5.3.8. This results in conditional graph entropy lacking some of the useful properties, and

hence also the applications, of conditional entropy; see Remark 5.3.21.

Although in the proof of Proposition 5.3.8 it holds that (1) ⇒ (2) without the conditions B, C ∈ S and G = G_0^ℤ, a simple counterexample shows that in general the reverse implication does not apply. Let Ω = X^ℤ where X = {1, 2, 3, 4}. We define finite subalgebras B, C ∉ S by at(B) = {B_1, B_2} and at(C) = {C_1, C_2}, where

B_1 = {ω : ω_0 = 1 or 2},   B_2 = {ω : ω_0 = 3 or 4},
C_1 = {ω : ω_0 = 1 or 3},   C_2 = {ω : ω_0 = 2 or 4}.

Let G = G_0^ℤ where G_0 is the graph below.

[Diagram: the graph G_0 on the vertex set {1, 2, 3, 4}.]

We have that B_1 ≁ B_2 in G[B] and C_1 ≁ C_2 in G[C], and so G[B] and G[C] are both empty graphs on two vertices, whence G[B] ∗ G[C] is the empty graph on four vertices.


(As no B_i ∩ C_j is empty, we need not delete any vertices to form (G[B] ∗ G[C])′.) However,

B_1 ∩ C_1 = {ω : ω_0 = 1},   B_1 ∩ C_2 = {ω : ω_0 = 2},
B_2 ∩ C_1 = {ω : ω_0 = 3},   B_2 ∩ C_2 = {ω : ω_0 = 4},

and thus the graph G[B ∨ C] is as shown below.

[Diagram: the graph G[B ∨ C] on the four atoms B_1 ∩ C_1, B_1 ∩ C_2, B_2 ∩ C_1, B_2 ∩ C_2.]

By defining a new type of infinite product graph, we give a further example of where G[B∨C]

and G[B] ∗G[C] differ.

Definition 5.3.10. For a given distinguishability graph G_0 on X, let the threshold-t co-normal product graph G_0^{ℤ,t} have vertex set Ω = X^ℤ and satisfy ω ∼ ω′ in G_0^{ℤ,t} when ω_i ∼ ω′_i in G_0 for at least t distinct coordinates.

(It is clear that the graph G_0^{ℤ,t} is shift invariant. Also note that G_0^{ℤ,1} = G_0^ℤ.)

Now take G = G_0^{ℤ,2}, and let X = {a, b, c} and G_0 be the path graph a — b — c (with edges a ∼ b and b ∼ c).

Clearly G[A_0] ≅ G[T^{−1}A_0] is the empty graph on three vertices, and thus G[A_0] ∗ G[T^{−1}A_0] is the empty graph on nine vertices. However, A_0 ∨ T^{−1}A_0 has atoms of the form {ω : ω_0 = i, ω_1 = j} with i, j ∈ X and we note, for instance, that {ω : ω_0 = ω_1 = a} ∼ {ω : ω_0 = ω_1 = b}, and hence G[A_0 ∨ T^{−1}A_0] is non-empty. Indeed, letting the vertex labelled ij denote the atom {ω : ω_0 = i, ω_1 = j}, it is easy to verify that the atoms ij and kl are adjacent in G[A_0 ∨ T^{−1}A_0] exactly when i ∼ k and j ∼ l in G_0, so that G[A_0 ∨ T^{−1}A_0] is as shown:

[Diagram: the graph G[A_0 ∨ T^{−1}A_0] on the nine atoms aa, ab, ac, ba, bb, bc, ca, cb, cc.]


5.3.2 The quantity h(G[B], T ) and its properties

We now generalise the quantity h(B, T ) as defined in Definition 5.1.1 (iii) to the context of a

distinguishability graph G on Ω in probability space (Ω,F , P ).

Definition 5.3.11. We define the graph entropy of a finite subalgebra B ⊂ F relative to T by

h(G[B], T) = lim sup_{n→∞} (1/n) H(G[⋁_{k=0}^{n−1} T^{−k}B], P).

The next two propositions establish a monotonicity condition and an upper bound on

h(G[B], T ). (Proposition 5.3.12 can be seen as an analogue of Theorem 5.1.4 (ii).)

Proposition 5.3.12. If B ⊆ C, then h(G[B], T) ≤ h(G[C], T).

Proof. If B ⊆ C, then ⋁_{k=0}^{n−1} T^{−k}B ⊆ ⋁_{k=0}^{n−1} T^{−k}C, and by Proposition 5.3.4

H(G[⋁_{k=0}^{n−1} T^{−k}B], P) ≤ H(G[⋁_{k=0}^{n−1} T^{−k}C], P)

for all n ∈ ℕ. The result follows by Definition 5.3.11.

Proposition 5.3.13. For any finite subalgebra B ⊂ F, we have

h(G[B], T) ≤ h(B, T) ≤ log |at(B)|.

Proof. Noting that |at(⋁_{k=0}^{n−1} T^{−k}B)| ≤ |at(B)|^n for all n ∈ ℕ, we apply (5.15) on page 188 to give

H(G[⋁_{k=0}^{n−1} T^{−k}B], P) ≤ H(⋁_{k=0}^{n−1} T^{−k}B) ≤ n log |at(B)|,

whence the result follows on dividing by n and taking limits superior.

Remark 5.3.14. If G is complete, then all the atoms of the finite subalgebra B ⊂ F are mutually distinguishable, and by (5.14) on page 187 we have H(G[⋁_{k=0}^{n−1} T^{−k}B], P) = H(⋁_{k=0}^{n−1} T^{−k}B) for all n ∈ ℕ. We refer to this as the 'complete case', when it is easily seen that h(G[B], T) = h(B, T).

We now use Fekete’s Lemma to show that the limit superior in Definition 5.3.11 becomes

a limit under conditions (i) and (ii) of Remark 5.3.6.


Proposition 5.3.15. For a finite subalgebra B ∈ S and G = G_0^ℤ we have

h(G[B], T) = lim_{n→∞} (1/n) H(G[⋁_{k=0}^{n−1} T^{−k}B], P).

Proof. Setting a_m = H(G[⋁_{k=0}^{m−1} T^{−k}B], P), we have

a_{m+n} = H(G[⋁_{k=0}^{m+n−1} T^{−k}B], P)
        = H(G[(⋁_{k=0}^{m−1} T^{−k}B) ∨ (⋁_{k=m}^{m+n−1} T^{−k}B)], P)
        ≤ H(G[⋁_{k=0}^{m−1} T^{−k}B], P) + H(G[T^{−m}(⋁_{k=0}^{n−1} T^{−k}B)], P)
        = H(G[⋁_{k=0}^{m−1} T^{−k}B], P) + H(G[⋁_{k=0}^{n−1} T^{−k}B], P)
        = a_m + a_n,

where we have used Propositions 5.3.8 and 5.3.3. Thus the sequence (a_n)_{n∈ℕ} is sub-additive and the result follows from Lemma 3.1.6.

Under the same ‘physical’ conditions, the sub-additivity of h(G[B], T ) follows from that

of H(G[B], P ) as given in Proposition 5.3.8.

Proposition 5.3.16. When B, C ∈ S and G = G_0^ℤ,

h(G[B ∨ C], T) ≤ h(G[B], T) + h(G[C], T).

Proof. Applying Proposition 5.3.8 gives

H(G[⋁_{k=0}^{n−1} T^{−k}(B ∨ C)], P) = H(G[(⋁_{k=0}^{n−1} T^{−k}B) ∨ (⋁_{k=0}^{n−1} T^{−k}C)], P)
                                     ≤ H(G[⋁_{k=0}^{n−1} T^{−k}B], P) + H(G[⋁_{k=0}^{n−1} T^{−k}C], P).

The result follows on dividing by n and taking limits.

The proof of Theorem 5.1.3 (ii) given in [4, B3] can easily be extended to the graph setting

to yield the next lemma, which will be used in our generalisation of the Kolmogorov–Sinai

theorem.


Lemma 5.3.17. For any finite subalgebra B ⊂ F and u, v ∈ ℤ with u ≤ v, it holds that

h(G[⋁_{j=u}^{v} T^{−j}B], T) = h(G[B], T).

Proof. First note that

⋁_{i=0}^{n−1} T^{−i}(⋁_{j=u}^{v} T^{−j}B) = (⋁_{j=u}^{v} T^{−j}B) ∨ (⋁_{j=u}^{v} T^{−j−1}B) ∨ · · · ∨ (⋁_{j=u}^{v} T^{−j−n+1}B) = T^{−u}(⋁_{k=0}^{n+v−u−1} T^{−k}B).

Then

H(G[⋁_{i=0}^{n−1} T^{−i}(⋁_{j=u}^{v} T^{−j}B)], P) = H(G[T^{−u}(⋁_{k=0}^{n+v−u−1} T^{−k}B)], P) = H(G[⋁_{k=0}^{n+v−u−1} T^{−k}B], P),

where we have used Proposition 5.3.3. Then

(1/n) H(G[⋁_{i=0}^{n−1} T^{−i}(⋁_{j=u}^{v} T^{−j}B)], P) = ((n+v−u)/n) · (1/(n+v−u)) H(G[⋁_{k=0}^{n+v−u−1} T^{−k}B], P),

and taking limits superior as n → ∞ yields the result. (Recall that when a_n → a with a > 0, we have lim sup_{n→∞}(a_n b_n) = a · lim sup_{n→∞}(b_n); here a_n = (n+v−u)/n → 1.)

5.3.3 Generalising the Kolmogorov–Sinai Theorem

We now define a quantity h(G,T ) which generalises the Kolmogorov–Sinai entropy h(T ) to the

graph setting. We will discuss the Bernoulli and Markov shifts, and a notion of isomorphism

will be introduced, under which we will establish the invariance of h(G,T ).

Definition 5.3.18. Given a graph G on Ω and probability space (Ω, F, P), we define the graph entropy of the shift T by

h(G, T) = sup{h(G[B], T) : B ⊆ F_0 is a finite subalgebra of F}.


The reader will notice that, although h(T ) was defined by a supremum over finite subal-

gebras of F , we have defined h(G,T ) by a supremum over finite subalgebras of F contained

in F0, as given in (5.2) on page 177. The exclusion of finite subalgebras not in F0 will be dis-

cussed in Remark 5.3.21 and Question 5.4.1. For now, note that since the time-0 subalgebra

A0 ⊂ F0 and since (5.7) on page 178 gives h(A0, T ) = h(T ), it is certainly true that

h(T) = sup{h(B, T) : B ⊆ F_0 is a finite subalgebra of F}.

(It can also be pointed out that, in only considering finite subalgebras of F , the definition of

h(T ) also excludes many finite subalgebras; here we have simply enlarged the set of excluded

subalgebras.)

The Kolmogorov–Sinai Theorem (Theorem 5.1.6) has the following analogue in this set-

ting.

Theorem 5.3.19. Suppose a finite subalgebra B ⊆ F_0 satisfies

⋃_{n=0}^{∞} ⋁_{k=−n}^{n} T^k B = F_0.

Then h(G, T) = h(G[B], T).

Proof. It is clearly sufficient to prove that h(G[C], T) ≤ h(G[B], T) for any finite subalgebra C ⊂ F_0. To this end, observe that if a finite subalgebra C ⊂ F_0, then C ⊆ ⋁_{k=−n}^{n} T^k B for some n ∈ ℕ. Proposition 5.3.12 and Lemma 5.3.17 then give that

h(G[C], T) ≤ h(G[⋁_{k=−n}^{n} T^k B], T) = h(G[B], T),

as required.

The next result, analogous to Proposition 5.1.7, shows that the supremum in Definition

5.3.18 is achieved by any element of S; this makes the calculation of h(G,T ) feasible in some

cases.

Proposition 5.3.20. It holds that h(G[A0], T ) = h(G,T ) where A0 is the time-0 subalgebra.

Furthermore, h(G[B], T ) = h(G,T ) for all B ∈ S.

Proof. The first assertion is immediate from Theorem 5.3.19 and the definition of F0 in

(5.2). Then note for arbitrary B ∈ S that we have B ⊂ F0, and so by Definition 5.3.18


h(G[B], T) ≤ h(G, T). However, we also have that T^n A_0 ⊆ B for some n ∈ ℤ, and so

h(G, T) = h(G[A_0], T) = h(G[T^n A_0], T) ≤ h(G[B], T)

by Lemma 5.3.17 and Proposition 5.3.12, and the proof is complete.

Remark 5.3.21. The motivation for the exclusion of subalgebras not in F0 in the definition of

h(G,T ) is now clear; our proof of Theorem 5.3.19 does not yield an upper bound on h(G[C], T )

for a finite subalgebra C ⊂ F not contained in F0. The standard proof of the Kolmogorov–

Sinai Theorem [4, Theorem 7.1] uses properties of conditional entropies to consider the general

case of C ⊂ F , but the issue mentioned in Remark 5.3.9, where we noted that the equality

in Theorem 5.1.5 becomes an inequality in the graph case, seems to preclude employing an

equivalent strategy in this case.

By Proposition 5.3.13, for any distinguishability graph G on Ω, we have

h(G,T ) ≤ h(T ). (5.19)

In the case that G is complete, Remark 5.3.14 and (5.7) on page 178 give that h(G[A0], T ) =

h(A0, T ) = h(T ) and hence in the complete case

h(G,T ) = h(T ). (5.20)

Thus, as would be expected, when G is complete, the situation reduces to that considered in

Section 5.1.

Let p = (p_i)_{i∈X} be the probability distribution on X given by p_i = P({ω : ω_0 = i}) for each i ∈ X. Trivially, H(A_0) = H(p) and, if G = G_0^ℤ, we have H(G[A_0], P) = H(G_0, p). By Theorems 5.1.5 and 5.1.3,

H(⋁_{i=0}^{n−1} T^{−i}A_0) ≤ ∑_{i=0}^{n−1} H(T^{−i}A_0) = nH(A_0) = nH(p),

whence by (5.7) and Definition 5.1.1 (iii) we have

h(T) = h(A_0, T) ≤ H(p).

It is straightforward to generalise these ideas to the graph case.

Lemma 5.3.22. When G = G_0^ℤ and p = (p_i)_{i∈X} is the probability distribution on X given


by p_i = P({ω : ω_0 = i}), we have

h(G, T) = h(G[A_0], T) ≤ H(G_0, p).

Proof. By Propositions 5.3.8 and 5.3.3, we have

H(G[⋁_{k=0}^{n−1} T^{−k}A_0], P) ≤ nH(G[A_0], P) = nH(G_0, p),

whence Proposition 5.3.20 and Definition 5.3.11 complete the proof.

The following proposition generalises (5.9) on page 179 and shows that if T is the p-Bernoulli shift, then equality holds in Lemma 5.3.22, and h(G, T) reduces to the graph entropy of the probabilistic graph (G_0, p). (As a Bernoulli shift corresponds to an i.i.d. source, this is as would intuitively be expected.)

Proposition 5.3.23. Let G = G_0^ℤ, and let T be the p-Bernoulli shift where p is a probability distribution on V(G_0). Then h(G, T) = H(G_0, p).

Proof. The proof proceeds as for Lemma 5.3.22, but (5.8) on page 179 shows that in this case the condition for equality in Proposition 5.3.8 is fulfilled.

We now consider the graph entropy of the (Π, p)-Markov shift to generalise (5.11) on page

179.

Proposition 5.3.24. Let G = G_0^ℤ and T be the (Π, p)-Markov shift. Set X = [d] and let μ^{(i)} denote the probability distribution given by the ith row of Π = (p_{ij}), that is, μ^{(i)}_j = p_{ij}. Then

∑_{i=1}^{d} p_i H(G_0, μ^{(i)}) ≤ h(G, T) ≤ H(G_0, p).

Proof. The upper bound on h(G, T) is given by Lemma 5.3.22. For the lower bound, we write A_n = ⋁_{k=0}^{n} T^{−k}A_0 and for n ≥ 1 apply Proposition 5.3.8 to A_n = A_{n−1} ∨ T^{−n}A_0 to give

H(G[A_{n−1}], P) + ∑_{A∈at(A_{n−1})} P(A) H(G[T^{−n}A_0], P(·|A)) ≤ H(G[A_n], P).            (5.21)

Now if X = {ω : ω_n = j} ∈ at(T^{−n}A_0) and {ω : ω_{n−1} = i} ⊇ A ∈ at(A_{n−1}), then (5.10) on page 179 gives P(X|A) = p_{ij} = μ^{(i)}_j and so

H(G[T^{−n}A_0], P(·|A)) = H(G_0, μ^{(i)}).


In (5.21), for each i ∈ [d] we sum over those A ∈ at(A_{n−1}) contained in {ω : ω_{n−1} = i} to leave

H(G[A_{n−1}], P) + ∑_{i=1}^{d} p_i H(G_0, μ^{(i)}) ≤ H(G[A_n], P).

Rearranging gives

∑_{i=1}^{d} p_i H(G_0, μ^{(i)}) ≤ H(G[A_n], P) − H(G[A_{n−1}], P),

and summing over n from 1 to N − 1 yields

(N − 1) ∑_{i=1}^{d} p_i H(G_0, μ^{(i)}) ≤ H(G[A_{N−1}], P) − H(G[A_0], P).

Using that h(G, T) = h(G[A_0], T) = lim_{N→∞} (1/N) H(G[A_{N−1}], P) yields the required result.

Remark 5.3.25. Note that setting p_{ij} = p_j for all i, j ∈ [d] gives μ^{(i)} = p for all i, and we have the p-Bernoulli shift. In this case Proposition 5.3.24 immediately gives h(G, T) = H(G_0, p), agreeing with Proposition 5.3.23.
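To make the bounds of Proposition 5.3.24 concrete, the Python sketch below (not part of the thesis) evaluates them for a hypothetical three-state chain with G_0 the path graph 1 — 2 — 3. It relies on the assumption, obtained by the same vertex packing argument as in Remark 5.2.13, that the graph entropy of this path under a distribution s equals the binary entropy of its middle mass s_2, since the kernels of the path are {1, 3} and {2}.

```python
import numpy as np

def h2(x):
    # Binary Shannon entropy in bits.
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def H_path(s):
    # Graph entropy of the path 1-2-3: minimising over a = (alpha, 1-alpha, alpha)
    # gives the binary entropy of the middle mass s[1] (assumption stated above).
    return h2(s[1])

# Hypothetical transition matrix on X = [3].
Pi = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3],
               [0.1, 0.4, 0.5]])
evals, evecs = np.linalg.eig(Pi.T)
p = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
p = p / p.sum()

lower = sum(p[i] * H_path(Pi[i]) for i in range(3))
upper = H_path(p)
print(lower, "<= h(G,T) <=", upper)
```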

We have discussed the concept of isomorphism for systems of the form (Ω, F, P, T); we now extend this to systems of the form (Ω = X^ℤ, F, P, T, G). First we define an isomorphism in this setting, and we then show that h(G, T) is invariant under such isomorphisms. (For the system (Ω′ = X′^ℤ, F′, P′, T′, G′) we denote the time-0 subalgebra by A′_0, and write F′ = ⋁_{n=−∞}^{∞} T′^n A′_0 and F′_0 = ⋃_{n=0}^{∞} ⋁_{i=−n}^{n} T′^i A′_0.)

Definition 5.3.26. We say systems (Ω = X^ℤ, F, P, T, G) and (Ω′ = X′^ℤ, F′, P′, T′, G′) are isomorphic if there exists a bijection φ : Ω → Ω′ with the following properties:

(i) For A ⊆ Ω, we have φ(A) ∈ F′ if and only if A ∈ F, in which case P(A) = P′(φ(A));

(ii) φ(Tω) = T′φ(ω) for all ω ∈ Ω;

(iii) For all ω, ω̃ ∈ Ω, it holds that ω ∼ ω̃ in G if and only if φ(ω) ∼ φ(ω̃) in G′;

(iv) For all B ⊆ Ω it holds that B ∈ F_0 if and only if φ(B) ∈ F′_0.

In this case we write (Ω, F, P, T, G) ≅ (Ω′, F′, P′, T′, G′) and say that φ is an isomorphism.

The first two properties are the conditions for T ∼= T ′ as in Definition 5.1.8. Condition

(iii) means that φ preserves distinguishability in the sense that for A,B ∈ F it holds that


A × B ⊆ E(G) if and only if φ(A) × φ(B) ⊆ E(G′). Condition (iv) ensures that a finite

subalgebra B of F is contained in F0 if and only if φ(B) is contained in F ′0. This is necessary

to ensure the invariance of h(G,T ) under isomorphism, where we recall that h(G,T ) is a

supremum over the finite subalgebras of F0.

We consider a straightforward example. Suppose that (Ω,F , P, T ) ∼= (Ω′,F ′, P ′, T ′) under

an isomorphism φ in the sense of Definition 5.1.8 which satisfies (iv) in Definition 5.3.26. Then

given any shift invariant graph G on Ω, we can form G′ on Ω′ such that (Ω,F , P, T,G) ∼=

(Ω′,F ′, P ′, T ′, G′) simply by letting Definition 5.3.26 (iii) define G′. Defining G′ in this way

and using the shift invariance of G, the following straightforward chain of equivalences for ω, ω̃ ∈ Ω shows that the resulting graph G′ on Ω′ is shift invariant:

φ(ω) ∼ φ(ω̃) in G′ ⟺ ω ∼ ω̃ in G ⟺ T^n ω ∼ T^n ω̃ in G ⟺ φ(T^n ω) ∼ φ(T^n ω̃) in G′ ⟺ T′^n φ(ω) ∼ T′^n φ(ω̃) in G′,

where the last step used condition (ii) of Definition 5.1.8.

Lemma 5.3.27. As defined in Definition 5.3.26, isomorphism is an equivalence relation.

Proof. The argument given in [4, Remark 3, p.54] for isomorphisms as in Definition 5.1.8

extends straightforwardly to this situation. That isomorphism is symmetric and reflexive is

clear; it remains to prove transitivity.

Suppose that

(1) (Ω,F , P, T,G) ∼= (Ω′,F ′, P ′, T ′, G′) under isomorphism φ : Ω→ Ω′, and

(2) (Ω′,F ′, P ′, T ′, G′) ∼= (Ω′′,F ′′, P ′′, T ′′, G′′) under isomorphism ψ : Ω′ → Ω′′.

We now show it follows that (Ω, F, P, T, G) ≅ (Ω″, F″, P″, T″, G″) under the isomorphism ψ ∘ φ : Ω → Ω″ by demonstrating that the bijection ψ ∘ φ satisfies conditions (i) to (iv) of Definition 5.3.26. Observe the following:

(i) A ∈ F ⟺ φ(A) ∈ F′ ⟺ ψ ∘ φ(A) ∈ F″, and when these hold, P(A) = P′(φ(A)) = P″(ψ ∘ φ(A));

(ii) ψ ∘ φ(Tω) = ψ(T′φ(ω)) = T″ ψ ∘ φ(ω);

(iii) ω ∼ ω̃ in G ⟺ φ(ω) ∼ φ(ω̃) in G′ ⟺ ψ ∘ φ(ω) ∼ ψ ∘ φ(ω̃) in G″;

(iv) B ∈ F_0 ⟺ φ(B) ∈ F′_0 ⟺ ψ ∘ φ(B) ∈ F″_0.

Just as h(T ) is invariant under isomorphisms as defined in Definition 5.1.8, the graph

entropy h(G,T ) is invariant under isomorphisms as defined in Definition 5.3.26.


Proposition 5.3.28. If (Ω, F, P, T, G) ≅ (Ω′, F′, P′, T′, G′) under isomorphism φ, then h(G, T) = h(G′, T′).

Proof. We can refine the method in [20, Section 1.3, Theorem 14]. Consider an arbitrary finite subalgebra B of F which is contained in F_0, and let at(B) = {B_1, . . . , B_m}. Then by property (iv) of Definition 5.3.26, B′ = φ(B) is a finite subalgebra of F′ which is contained in F′_0 with at(B′) = {φ(B_1), . . . , φ(B_m)}. The atoms of ⋁_{k=0}^{n−1} T^{−k}B are the non-empty sets ⋂_{k=0}^{n−1} T^{−k}B_{i_k}, i_k ∈ [m], and the atoms of ⋁_{k=0}^{n−1} T′^{−k}B′ are the non-empty sets ⋂_{k=0}^{n−1} T′^{−k}φ(B_{i_k}) = φ(⋂_{k=0}^{n−1} T^{−k}B_{i_k}), i_k ∈ [m]. Thus φ puts the atoms of ⋁_{k=0}^{n−1} T^{−k}B and ⋁_{k=0}^{n−1} T′^{−k}B′ into a one-to-one correspondence, and Definition 5.3.26 (iii) ensures that G[⋁_{k=0}^{n−1} T^{−k}B] ≅ G′[⋁_{k=0}^{n−1} T′^{−k}B′]. Furthermore, by Definition 5.3.26 (i), corresponding atoms of ⋁_{k=0}^{n−1} T^{−k}B and ⋁_{k=0}^{n−1} T′^{−k}B′ have equal measure under P and P′ respectively. It thus holds for all n ∈ ℕ that

H(G[⋁_{k=0}^{n−1} T^{−k}B], P) = H(G′[⋁_{k=0}^{n−1} T′^{−k}B′], P′),

and so

h(G[B], T) = h(G′[B′], T′).

We can conclude from Definition 5.3.18 that h(G′, T′) ≥ h(G, T). An equivalent argument gives h(G, T) ≥ h(G′, T′), and the proof is complete.

Unlike the Kolmogorov–Sinai entropy h(T ), however, the graph entropy h(G,T ) is not a

complete invariant among the Bernoulli shifts.

Proposition 5.3.29. If systems (Ω,F , P, T,G) and (Ω′,F ′, P ′, T ′, G′) satisfy h(G,T ) =

h(G′, T ′) where T and T ′ are Bernoulli shifts, it does not follow that (Ω,F , P, T,G) ∼=

(Ω′,F ′, P ′, T ′, G′).

Proof. Recall from Proposition 5.3.23 that if T is the p-Bernoulli shift and G = G_0^ℤ, then h(G, T) = H(G_0, p). Our proposition is thus proved by finding a graph G_0 with probability distribution p on its vertex set X, and a graph H_0 with probability distribution q on its vertex set Y, such that H(G_0, p) = H(H_0, q), but such that

(Ω = X^ℤ, F, P, T, G = G_0^ℤ) ≇ (Ω′ = Y^ℤ, F′, P′, T′, G′ = H_0^ℤ),            (5.22)


where T is the p-Bernoulli shift and T′ the q-Bernoulli shift. Given that the conditions for isomorphism in Definition 5.1.8 are also conditions for isomorphism in Definition 5.3.26, it is clear that (5.22) holds when

(Ω, F, P, T) ≇ (Ω′, F′, P′, T′).            (5.23)

Now (5.9) gives h(T) = H(p) when T is the p-Bernoulli shift, and Theorem 5.1.10 shows that h(T) is a complete invariant among the Bernoulli shifts. Thus, if H(p) ≠ H(q), we can conclude that (5.23) and therefore (5.22) hold. So the proposition is proved if we can find probabilistic graphs (G_0, p) and (H_0, q) satisfying H(G_0, p) = H(H_0, q) but such that H(p) ≠ H(q).

We give the following example. Consider the graph G_0 = K_2 and set p = (1/2, 1/2). Definition 1.1.1 and (1.21) give that H(G_0, p) = H(p) = 1.

Then let H_0 be the graph K_2 · K̄_2, where K̄_2 denotes the empty graph on two vertices; H_0 is the complete bipartite graph K_{2,2}. Set q = (1/4, 1/4, 1/4, 1/4), giving that H(q) = 2. The probabilistic graph (H_0, q) can be formed from the probabilistic graph (K_2, p) by substituting for each vertex of K_2 a copy of K̄_2 with probability distribution p on its vertex set; Lemma 5.2.7 with r = p × p then gives H(H_0, q) = H(K_2, p) + H(K̄_2, p) = 1 + 0 = 1.

Much of our work thus far has considered the graph G = G_0^ℤ, where G_0 is a fixed graph on vertex set X. However, there exist many shift invariant graphs G not of this form, for example the graph G = G_0^{ℤ,t} as defined in Definition 5.3.10. Further work could be undertaken to analyse such graphs; here we work towards one straightforward, but interesting, result.

Recall that if G_0 is complete, (5.20) gives that h(G_0^ℤ, T) = h(T). More generally we have the following.

Lemma 5.3.30. If G_0 is the complete graph, then for all t ∈ ℕ we have h(G_0^{ℤ,t}, T) = h(T).


Proof. Let G = G_0^{ℤ,t} where G_0 is the complete graph with vertex set X, and write A_{n−1} = ⋁_{k=0}^{n−1} T^{−k}A_0. The atoms of A_{n−1} are the cylinders {ω : ω_0 = i_0, . . . , ω_{n−1} = i_{n−1}}, i_k ∈ X. For A, B ∈ at(A_{n−1}) we have A ∼ B in the graph G[A_{n−1}] if and only if A × B ⊆ E(G), that is, precisely when the cylinders A and B differ on at least t coordinates.

We proceed by finding an upper bound on α(G[A_{n−1}]) by counting, along with some fixed atom A = {ω : ω_0 = i_0, . . . , ω_{n−1} = i_{n−1}} ∈ at(A_{n−1}), the atoms of A_{n−1} not adjacent to A in G[A_{n−1}]. The number of atoms of A_{n−1} which differ from A on exactly r coordinates is \binom{n}{r}(|X| − 1)^r. With sufficiently large n, we have \binom{n}{r} ≤ \binom{n}{t−1} for all r = 0, . . . , t − 1, and

α(G[A_{n−1}]) ≤ ∑_{r=0}^{t−1} \binom{n}{r}(|X| − 1)^r ≤ t \binom{n}{t−1}(|X| − 1)^{t−1} ≤ t n^{t−1}(|X| − 1)^{t−1},

giving

log α(G[A_{n−1}]) ≤ log t + (t − 1) log n + (t − 1) log(|X| − 1).

It then holds that

lim_{n→∞} ( log α(G[A_{n−1}]) / n ) = 0.            (5.24)

We now apply Corollary 1.3.14 and use Remark 5.1.2 (i) to give

H(G[A_{n−1}], P) ≥ H(A_{n−1}) − log α(G[A_{n−1}]).            (5.25)

Using (5.24) and (5.25) together with Propositions 5.3.20 and 5.3.15 and (5.7) gives

h(G, T) = h(G[A_0], T) = lim sup_{n→∞} (1/n) H(G[A_{n−1}], P) ≥ lim_{n→∞} (1/n) H(A_{n−1}) = h(T).

The proof is completed by recalling from (5.19) on page 197 that h(G, T) ≤ h(T).
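The polynomial growth exploited in (5.24) is easy to see numerically; the short Python sketch below (not part of the thesis; the alphabet size and threshold are hypothetical) evaluates the exact count ∑_{r=0}^{t−1} \binom{n}{r}(|X| − 1)^r and shows that its logarithm divided by n tends to 0.

```python
from math import comb, log2

# Hypothetical parameters: alphabet size |X| = 4, threshold t = 3.
X_size, t = 4, 3

for n in (10, 100, 1000, 10000):
    # Number of cylinders of length n differing from a fixed one on
    # fewer than t coordinates, as counted in the proof.
    bound = sum(comb(n, r) * (X_size - 1) ** r for r in range(t))
    print(n, log2(bound) / n)
```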

5.4 Further questions

The theory presented in this chapter raises a number of open questions which we now discuss:

each of them merits further work.


5.4.1 Finite subalgebras

Definition 5.3.18 and our attempt to generalise the Kolmogorov–Sinai Theorem to the setting

of partial distinguishability raise the following important open question:

Question 5.4.1. In general do we have equality in the result

h(G, T) = sup_{B⊂F_0} h(G[B], T) ≤ sup_{B⊂F} h(G[B], T),

where the suprema are taken over the finite subalgebras of F contained in F_0 and F respectively?

Intuitively we might expect an affirmative answer, in which case Definition 5.3.18 could

be rewritten to express h(G,T ) as a supremum over all the finite subalgebras of F , in closer

analogy with the expression for h(T ) in Definition 5.1.1 (iv). An affirmative answer would

also allow condition (iv) in Definition 5.3.26 to be dropped.

5.4.2 Source coding with partial distinguishability

In what follows we discuss a type of regularity that may be possessed by the shift T known

as ergodicity. We follow definitions as in [4, Chapter 1]: in the system (Ω, F, P, T), a set A ⊆ Ω is said to be invariant if P((A \ T^{−1}A) ∪ (T^{−1}A \ A)) = 0. The shift T is called ergodic if every

invariant set has measure 0 or 1.

We let G = G_0^ℤ and p_n be the probability distribution on X^n such that

p_n(i_0, . . . , i_{n−1}) = P({ω : ω_0 = i_0, . . . , ω_{n−1} = i_{n−1}}).

In Appendix C we show how the Shannon–McMillan–Breiman Theorem [4, Theorem 13.1]

and the asymptotic equipartition property [4, Theorem 13.2] imply that if T is ergodic and

0 < λ < 1, then

h(T) = lim_{n→∞} (1/n) log ( min{|E| : E ⊆ X^n, p_n(E) ≥ 1 − λ} ).            (5.26)

This generalises (1.1) on page 3 and solves the ‘source coding’ problem in this setting. That

h(T ) has such an interpretation is well-known [16, Section 4.5].

We denote the subgraph of G_0^n induced by E ⊆ V(G_0^n) by G_0^n(E). If G_0 is complete, then


|E| = χ(G_0^n(E)). So when G_0 is complete and T is ergodic, (5.20) and (5.26) yield

h(G, T) = lim_{n→∞} (1/n) log ( min{χ(G_0^n(E)) : E ⊆ X^n, p_n(E) > 1 − λ} )            (5.27)

for all λ ∈ (0, 1). This motivates the following question.

Question 5.4.2. For G = G_0^ℤ, under what conditions does (5.27) hold?

Where (5.27) holds, h(G,T ) acquires a ‘source coding’ interpretation in the case of partial

distinguishability, like that possessed by H(G, p) in Definition 1.3.5. We saw that (5.27)

holds in the case that G is complete and T ergodic. For the p-Bernoulli shift, p_n = p^n, and Proposition 5.3.23 and Definition 1.3.5 give

h(G, T) = H(G_0, p) = lim_{n→∞} (1/n) log ( min{χ(G_0^n(E)) : E ⊆ X^n, p^n(E) > 1 − λ} ),

and again (5.27) holds. It is unclear if (5.27) holds outside of these two cases.

5.4.3 Distinguishability

Fundamental to our work on the source with partial distinguishability is Definition 5.3.1.

With this definition of the distinguishability of sets, we note that the transfer of just a single

element ω ∈ Ω from one atom of a finite subalgebra B to another can change the graph

G[B]. It is arguably desirable to introduce a definition of distinguishability that would leave

G[B] invariant under the transfer of any null set from one atom to another. This could be

achieved by refining Definition 5.3.1 to say that sets A,B ∈ F are distinguishable if and only

if P (A), P (B) > 0 and there exist sets A′, B′ ∈ F such that P (A\A′) = P (B\B′) = 0 and

A′ ×B′ ⊆ E(G).

However, such a refinement of the definition of distinguishability means that the distin-

guishability of sets in F will depend on the measure P ; it is not clear if this would be helpful.

As an example of the difficulties this refined definition would bring, consider a, b, c, d ∈ X

where a ≁ b in G_0 and c ∼ d in G_0. Let G = G_0^ℤ, and recall in this case that it seems

natural to declare two cylinders distinguishable if and only if they are distinguishable on

at least one coordinate. Suppose that P({ω : ω_0 = a, ω_1 = c}) = P({ω : ω_0 = a}) and P({ω : ω_0 = b, ω_1 = d}) = P({ω : ω_0 = b}). (Because T preserves measure, we would then have that a and b are almost always followed by c and d respectively.) By our refined distinguishability definition we have {ω : ω_0 = a} ∼ {ω : ω_0 = b}, but these cylinders are not


distinguishable at any coordinate.

If we found a satisfactory definition of distinguishability that is invariant on transferring

null sets between atoms, it would also be appropriate to modify Definition 5.3.26 to define

the concept of ‘isomorphism modulo null sets’ as considered in [4, Chapter 2] for the complete

case. There we write (Ω,F , P, T ) ∼= (Ω′,F ′, P ′, T ′) if there is an isomorphism Ω\A→ Ω′\A′

in the sense of Definition 5.1.8 where A ⊂ Ω and A′ ⊂ Ω′ satisfy P (A) = P ′(A′) = 0.

5.4.4 Further generalisations

We recall that the study of dynamical systems is not unique to information theory. Suppose

for a general dynamical system (Ω, F, P, T), in which Ω may not be of the form X^ℤ, we have

a symmetric relation on Ω described by graph G. The definitions in this chapter lead to the

quantity h(G,T ), the graph entropy associated to the system (Ω,F , P, T,G); further work

could be undertaken to study the significance of this quantity in this more general context.

Finally, having generalised Körner's graph entropy to non-commutative graphs in Chapter

4, and to the non-i.i.d. classical case in this chapter, it would be natural to ask if it can be

generalised to the non-i.i.d. quantum case. Given the complexities involved in generalising

even the Kolmogorov–Sinai entropy to this setting, this is likely to be a difficult problem.


Appendix A

Convexity and semi-continuity

Here we gather together some standard definitions and results concerning convexity and

semi-continuity.

Definition A.0.1. When T is a vector space, set S ⊆ T is convex if

αu+ (1− α)v ∈ S for all u, v ∈ S and α ∈ [0, 1].

The convex hull of set S is denoted conv(S) and given by the intersection of all convex

sets containing S, in other words, the smallest convex set containing S. Equivalently, conv(S)

is the set of all finite convex combinations of elements of S, that is

conv(S) = { ∑_{i=1}^{k} λ_i s_i : k ∈ ℕ, λ_i ∈ ℝ_+, ∑_{i=1}^{k} λ_i = 1, s_i ∈ S }.

If set A is convex, then p ∈ A is called an extreme point of A when there do not exist

distinct points q, r ∈ A satisfying p = tq + (1 − t)r for some t ∈ (0, 1). The following is a

standard result due to Minkowski.

Theorem A.0.2. [54, Theorem 1.10] If K is a compact and convex subset of a finite dimen-

sional vector space, then K is the convex hull of its extreme points.

Definition A.0.3. When X is a convex subset of a vector space, function f : X → R is

called concave if for x, y ∈ X and α ∈ [0, 1]

f(αx+ (1− α)y) ≥ αf(x) + (1− α)f(y). (A.1)


The function f is strictly concave if the inequality (A.1) is strict when α ∈ (0, 1) and x 6= y.

Function f is called (strictly) convex when −f is (strictly) concave.

We denote by R̄ the extended real system given by R̄ = ℝ ∪ {−∞, ∞}.

Definition A.0.4. In a metric space X, the function f : X → R̄ is lower semi-continuous at x_0 ∈ X if lim inf_{x→x_0} f(x) ≥ f(x_0). The function f is lower semi-continuous if it is lower semi-continuous at every point x ∈ X.

(If f(x_0) = ∞, lower semi-continuity at x_0 requires lim_{x→x_0} f(x) = ∞.)

Analogous to Definition A.0.4 is the following.

Definition A.0.5. For a metric space X, the function f : X → R̄ is upper semi-continuous at x_0 ∈ X if lim sup_{x→x_0} f(x) ≤ f(x_0). The function f is upper semi-continuous if it is upper semi-continuous at every point x ∈ X.

(Equivalently, f is upper semi-continuous at x_0 when lim sup_{n→∞} f(x_n) ≤ f(x_0) for every sequence (x_n)_{n∈ℕ} in X converging to x_0 ∈ X; an analogous statement can be made in the case of lower semi-continuity.)

We also note that versions of Definitions A.0.4 and A.0.5 apply more generally to topo-

logical spaces, but the forms as given suffice for the work here.

It is clear that a function is continuous if and only if it is both lower semi-continuous

and upper semi-continuous. We now give an important result concerning upper or lower

semi-continuous functions acting on compact spaces.

Theorem A.0.6. [1, Theorem 2.40] (Extreme value theorem.) Let X be compact.

(i) A lower semi-continuous function f : X → ℝ ∪ {∞} is lower bounded and attains its infimum. (If f(x) = ∞ for all x ∈ X, then we say inf_{x∈X} f(x) = ∞, which is attained at all x ∈ X.)

(ii) An upper semi-continuous function f : X → ℝ ∪ {−∞} is upper bounded and attains its supremum.

Theorem A.0.7. For any function f : X × Y → ℝ,

inf_{x∈X} sup_{y∈Y} f(x, y) ≥ sup_{y∈Y} inf_{x∈X} f(x, y).            (A.2)


Proof. We denote the left hand side of (A.2) by L and the right hand side by R. First we consider the case −∞ < R < ∞. In this case, for any ε > 0, observe that R − ε is not an upper bound on the set {inf_{x∈X} f(x, y) : y ∈ Y}, and hence there exists y_ε ∈ Y such that inf_{x∈X} f(x, y_ε) > R − ε. Then f(x, y_ε) > R − ε for all x ∈ X, and so for every x ∈ X we have sup_{y∈Y} f(x, y) > R − ε. Letting ε → 0 yields L ≥ R. If R = −∞, the result is trivial. Finally, if R = ∞, then for arbitrarily large λ > 0 there exists y_λ ∈ Y such that f(x, y_λ) ≥ λ for all x ∈ X. Then for all x ∈ X we have sup_{y∈Y} f(x, y) ≥ λ and hence L ≥ λ. Letting λ → ∞ gives L = ∞.

Here we state and prove the form of the minimax theorem used in this work. The statement and proof are based on [38], but we generalise to functions with codomain ℝ ∪ {∞}, as indeed was suggested in [38].

Theorem A.0.8. Let K be a convex, compact subset of a normed vector space X, and let C be a convex subset of a vector space Y. Let the function f : K × C → ℝ ∪ {∞} satisfy:

(i) x ↦ f(x, y) is convex and lower semi-continuous for each y ∈ C, and

(ii) y ↦ f(x, y) is concave for each x ∈ K.

Then

inf_{x∈K} sup_{y∈C} f(x, y) = sup_{y∈C} inf_{x∈K} f(x, y).            (A.3)

Proof. We denote the left hand side of (A.3) by L and the right hand side by R.

By Theorem A.0.7 it holds that L ≥ R.

We now want to show that L ≤ R, or equivalently that for all M ≥ R and for all ε > 0,

we have L ≤M + ε. From the right hand side of (A.3) we see that, for M ≥ R,

inf_{x∈K} f(x, y) ≤ M for all y ∈ C.            (A.4)

For y ∈ C, let K_{y,t} = {x ∈ K : f(x, y) ≤ t}. Then for all ε > 0,

K_{y,M+ε} ≠ ∅ for all y ∈ C.            (A.5)

Let f(x, y) = f_y(x). We have K_{y,t} = f_y^{−1}({s : s ≤ t}) and so K_{y,t} is the preimage of the half line (−∞, t] under a lower semi-continuous function and is thus closed. (For this see,


for instance, [26, Theorem 7.1.1(iii)].) But K is compact and K_{y,t} ⊆ K, giving that K_{y,t} is compact.

It is also true that K_{y,t} is convex. To see this, let v, w ∈ K_{y,t} and γ ∈ [0, 1]. By the convexity of f_y we have

f_y(γv + (1 − γ)w) ≤ γ f_y(v) + (1 − γ) f_y(w) ≤ t,

giving that γv + (1 − γ)w ∈ K_{y,t}.

We want to show for all M ≥ R and ε > 0 that

⋂y∈C

Ky,M+ε 6= ∅, (A.6)

for then there will exist x0 ∈ K such that f(x0, y) ≤ M + ε for all y ∈ C, which yields

supy∈C f(x0, y) ≤M + ε. This gives L ≤M + ε, and thus L ≤ R as required.

With M ≥ R and ε > 0, we replace f by f − (M + ε). Then (A.4) and (A.5) give that for all y ∈ C we have

    inf_{x∈K} f(x, y) ≤ −ε and K_{y,0} ≠ ∅.    (A.7)

So it is now sufficient (see (A.6)) to show that

    ⋂_{y∈C} K_{y,0} ≠ ∅.    (A.8)

Recall that a collection of sets is said to have the finite intersection property if all finite intersections of its members are non-empty. It is a standard result [43, Theorem 2.36] that if a space X is compact, then every collection of closed sets in X having the finite intersection property has non-empty intersection. Since K is compact and each K_{y,t} is closed, (A.8) will follow if we can show that

    ⋂_{y∈C_0} K_{y,0} ≠ ∅ for all finite C_0 ⊆ C.    (A.9)

We begin by showing that

    K_{y_1,0} ∩ K_{y_2,0} ≠ ∅ for all y_1, y_2 ∈ C,    (A.10)

and will then proceed by induction. For i = 1, 2 we write K_i = K_{y_i,0} and f_i(x) = f(x, y_i).


Suppose towards a contradiction that K_1 ∩ K_2 = ∅. We show that this means there exists α ∈ [0, 1] such that

    (1−α) f_1(x) + α f_2(x) ≥ 0 for all x ∈ K,    (A.11)

whence the concavity of the function y → f(x, y) gives f(x, (1−α)y_1 + αy_2) ≥ 0 for all x ∈ K. Since (1−α)y_1 + αy_2 ∈ C by the convexity of C, this contradicts (A.7).

Now (A.11) holds trivially for all α ∈ [0, 1] when x ∉ K_1 ∪ K_2, for then f_1(x), f_2(x) > 0. Supposing K_1 ∩ K_2 = ∅, then for all x_1 ∈ K_1 we have f_1(x_1) ≤ 0 and f_2(x_1) > 0. Similarly, for all x_2 ∈ K_2 we have f_1(x_2) > 0 and f_2(x_2) ≤ 0. For (A.11) to hold for all x_1 ∈ K_1, we require α(f_2(x_1) − f_1(x_1)) ≥ −f_1(x_1) for all x_1 ∈ K_1, that is, we require

    α ≥ sup{ −f_1(x_1) / (f_2(x_1) − f_1(x_1)) : x_1 ∈ K_1 }.    (A.12)

We note this supremum is non-negative. Similarly, for (A.11) to hold for all x_2 ∈ K_2, we require α(f_1(x_2) − f_2(x_2)) ≤ f_1(x_2) for all x_2 ∈ K_2, that is, we require

    α ≤ inf{ f_1(x_2) / (f_1(x_2) − f_2(x_2)) : x_2 ∈ K_2 }.    (A.13)

We note this infimum is less than or equal to 1. We can thus find α ∈ [0, 1] to satisfy (A.12) and (A.13) if and only if for all x_1 ∈ K_1 and for all x_2 ∈ K_2, we have

    −f_1(x_1) / (f_2(x_1) − f_1(x_1)) ≤ f_1(x_2) / (f_1(x_2) − f_2(x_2)).    (A.14)

Observe that if f_2(x_1) = ∞ or if f_1(x_2) = ∞, then (A.14) holds immediately. Otherwise we need

    f_1(x_1) f_2(x_2) ≤ f_1(x_2) f_2(x_1)    (A.15)

for all x_1 ∈ K_1 and for all x_2 ∈ K_2. This is trivial if f_1(x_1) = 0 or f_2(x_2) = 0.

Otherwise let θ = −f_1(x_1) / (f_1(x_2) − f_1(x_1)). We have 0 < θ < 1 and

    (1−θ) f_1(x_1) + θ f_1(x_2) = 0,    (A.16)

giving

    θ / (1−θ) = −f_1(x_1) / f_1(x_2).    (A.17)

By the convexity of f_1, (A.16) gives that f_1((1−θ)x_1 + θx_2) ≤ 0 and so (1−θ)x_1 + θx_2 ∈ K_1. We also have 0 < f_2((1−θ)x_1 + θx_2) ≤ (1−θ) f_2(x_1) + θ f_2(x_2), where the first inequality


follows from the assumption K_1 ∩ K_2 = ∅ (that is, (1−θ)x_1 + θx_2 ∉ K_2) and the second from the convexity of f_2. This leads to

    θ / (1−θ) < f_2(x_1) / (−f_2(x_2)).    (A.18)

Then (A.17) and (A.18) lead to f_1(x_1) f_2(x_2) < f_1(x_2) f_2(x_1), and (A.15) holds for all x_1 ∈ K_1 and x_2 ∈ K_2, whence (A.11) holds, leading to the contradiction described. Thus K_1 ∩ K_2 ≠ ∅.

We now show that ⋂_{i≤m} K_i ≠ ∅ for m ∈ N. Let K′_i = K_{y_i,0} ∩ K_{y_1,0} for i = 2, . . . , m. Note that K′_i ≠ ∅ by the previous argument. Now take the restriction of f to K_{y_1,0} × C. We recall K_{y_1,0} is compact and convex, so we can apply the entire previous argument to K′_1 = K_{y_1,0} in place of K to obtain

    K′_2 ∩ K′_3 = K_{y_1,0} ∩ K_{y_2,0} ∩ K_{y_3,0} ≠ ∅.

After m − 1 repetitions we reach ⋂_{i≤m} K_i ≠ ∅, establishing (A.9) as required to complete the proof.
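A concrete instance of Theorem A.0.8 is a finite matrix game: K and C are probability simplices and f(x, y) = x^T A y is linear, hence convex and lower semi-continuous in x and concave in y. The sketch below is an added Python illustration; the payoff matrix A and the grid resolution are arbitrary choices. It approximates both sides of (A.3) on a grid of mixed strategies and finds both close to the value of the game, here 0.

import numpy as np

# Sketch of the minimax equality (A.3): K = C = probability simplex in R^2 and
# f(x, y) = x^T A y, linear (hence convex and l.s.c.) in x and concave in y.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])                 # "matching pennies" payoff matrix

ps = np.linspace(0.0, 1.0, 401)             # x = (p, 1-p) and y = (q, 1-q) on a grid
X = np.stack([ps, 1.0 - ps], axis=1)
F = X @ A @ X.T                             # F[i, j] = f(x_i, y_j)

inf_sup = F.max(axis=1).min()               # inf over K of sup over C
sup_inf = F.min(axis=0).max()               # sup over C of inf over K
print(inf_sup, sup_inf)                     # both close to 0, the value of the game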


Appendix B

Linear algebra

The set of m×n matrices with entries in S is denoted by M_{m,n}(S). We write M_{m,n} = M_{m,n}(C) and M_d = M_{d,d}. By A = (a_{ij}) ∈ M_{m,n} we mean the element of M_{m,n} whose (i, j)-entry is a_{ij} ∈ C for i ∈ [m] and j ∈ [n]. The trace of a matrix M = (m_{ij}) ∈ M_d is given by Tr M = ∑_{i=1}^d m_{ii}. For A ∈ M_{n,m} and B ∈ M_{m,n}, the important cyclicality condition Tr(AB) = Tr(BA) holds. For A = (a_{ij}) ∈ M_{m,n}, the matrix B = (b_{ij}) ∈ M_{n,m} where b_{ij} = a_{ji} is called the transpose of A and is denoted by B = A^t. Similarly, the matrix C = (c_{ij}) ∈ M_{n,m} where c_{ij} is the complex conjugate of a_{ji} is called the Hermitian transpose of A and is denoted by C = A^*. For matrices A ∈ M_{m,n} and B ∈ M_{n,p}, it holds that (AB)^* = B^*A^*. A self-adjoint or Hermitian matrix A satisfies A = A^*. The identity matrix in M_d will be denoted by I_d, or often just I where context allows. In M_d the zero matrix will be denoted by 0 and the all ones d×d matrix by J_d or just J. A unitary matrix U satisfies UU^* = U^*U = I. If {u_1, . . . , u_d} and {v_1, . . . , v_d} are orthonormal bases of C^d, then there exists a unitary matrix U such that u_i = Uv_i for i = 1, . . . , d. If {v_1, . . . , v_d} is an orthonormal basis of C^d, then I_d = ∑_{i=1}^d v_i v_i^*.
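The last two facts admit a direct numerical check. The following is a minimal sketch in Python with numpy (an added illustration; the dimension d = 4 and the random bases are arbitrary choices): it verifies that U = ∑_i u_i v_i^* is unitary, maps each v_i to u_i, and that ∑_i v_i v_i^* = I_d.

import numpy as np

# Sketch: orthonormal bases of C^d, the unitary mapping one onto the other, and
# the resolution of the identity I_d = sum_i v_i v_i^*.
d = 4
rng = np.random.default_rng(1)
U1, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
U2, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
us = [U1[:, i] for i in range(d)]            # orthonormal basis {u_1, ..., u_d}
vs = [U2[:, i] for i in range(d)]            # orthonormal basis {v_1, ..., v_d}

U = sum(np.outer(u, v.conj()) for u, v in zip(us, vs))     # U = sum_i u_i v_i^*
assert np.allclose(U @ U.conj().T, np.eye(d))              # U is unitary
assert all(np.allclose(U @ v, u) for u, v in zip(us, vs))  # U v_i = u_i

assert np.allclose(sum(np.outer(v, v.conj()) for v in vs), np.eye(d))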

We take inner products to be linear in their first argument and conjugate linear in the second. Specifically, for u, v ∈ C^d we take ⟨u, v⟩ = v^*u = ∑_{i=1}^d v̄_i u_i = Tr(uv^*), and the associated norm is given by ‖u‖ = √⟨u, u⟩. For A ∈ M_d and u, v ∈ C^d it holds that ⟨u, Av⟩ = ⟨A^*u, v⟩. A complex vector space with an inner product and which is complete

with respect to the norm induced by the inner product is called a Hilbert space. We will only consider finite dimensional Hilbert spaces. Every Hilbert space has an orthonormal basis, and for a Hilbert space H of dimension d there exists an isometric isomorphism from H onto C^d which preserves the inner product. In this sense C^d is essentially the only Hilbert space of dimension d. If H is a Hilbert space with subspace W ⊆ H, then we denote the dimension of W by dim(W), and the orthogonal complement of W is given by W^⊥ = {v ∈ H : ⟨v, w⟩ = 0 for all w ∈ W}. It is well known that a subspace W of a finite dimensional Hilbert space H satisfies W^⊥⊥ = W, and dim(W) + dim(W^⊥) = dim(H). The space M_d is a Hilbert space, and for M = (m_{ij}), N = (n_{ij}) with M, N ∈ M_d we will use the Hilbert–Schmidt inner product ⟨M, N⟩ = ∑_{i,j=1}^d m_{ij} n̄_{ij} = Tr(MN^*). Indeed, for any n, m ∈ N and P, Q ∈ M_{n,m} we define ⟨P, Q⟩ = Tr(PQ^*). The associated Hilbert–Schmidt norm will be denoted ‖M‖_2 and is given by ‖M‖_2 = √⟨M, M⟩. We write ‖M‖ for the operator norm, given by ‖M‖ = sup{‖Mv‖ : v ∈ C^d, ‖v‖ = 1}. As M_d is finite dimensional, standard theory states that these norms are equivalent in the sense that there exist positive reals c and C such that c‖M‖_2 ≤ ‖M‖ ≤ C‖M‖_2 for all M ∈ M_d.
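These facts are easy to confirm numerically. The sketch below is an added Python illustration (the dimension and random matrices are arbitrary); it checks that ⟨M, N⟩ = Tr(MN^*) agrees with the entrywise sum and that ‖M‖ ≤ ‖M‖_2 ≤ √d ‖M‖, one explicit choice of the constants c and C above.

import numpy as np

# Sketch: Hilbert-Schmidt inner product and norm versus the operator norm on M_d.
d = 5
rng = np.random.default_rng(2)
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
N = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

hs_inner = np.trace(M @ N.conj().T)                   # <M, N> = Tr(M N^*)
assert np.isclose(hs_inner, np.sum(M * N.conj()))     # = sum_{i,j} m_ij conj(n_ij)

hs_norm = np.linalg.norm(M, 'fro')                    # ||M||_2
op_norm = np.linalg.norm(M, 2)                        # ||M||, the largest singular value
assert op_norm <= hs_norm <= np.sqrt(d) * op_norm     # one choice of the constants c, C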

It is straightforward to see that the following hold for a, b, u, v ∈ C^d and A, B ∈ M_d:

    ⟨Au, Bv⟩ = (Bv)^*(Au) = ⟨B^*Au, v⟩ = ⟨B^*A, vu^*⟩,
    ⟨au^*, bv^*⟩ = Tr(au^*vb^*) = ⟨v, u⟩ ⟨a, b⟩,
    ⟨uv^*, A⟩ = ⟨u, Av⟩.

Definition B.0.1. Matrix M ∈ M_d is positive semi-definite if M = M^* and ⟨v, Mv⟩ ≥ 0 for all v ∈ C^d, and in this case we write M ≥ 0. This is equivalent to the condition that M is Hermitian and has non-negative eigenvalues. We say a Hermitian matrix M ∈ M_d is positive definite or strictly positive, and we write M > 0, if ⟨v, Mv⟩ > 0 for all non-zero v ∈ C^d, or equivalently, if M has strictly positive eigenvalues.

If a set S is a subset of a vector space V, then we recall that the span of S, denoted span(S), is the set of all finite linear combinations of elements of S. Let M_d^+ and M_d^{++} denote the set of all positive semi-definite d×d matrices and the set of all strictly positive d×d matrices respectively. We write A ≥ B to mean A − B ≥ 0. Similarly, A > B means A − B > 0. Let M_d^h denote the set of Hermitian d×d matrices. For M ∈ M_d we define the range of M by

    ran(M) = {v ∈ C^d : there exists u ∈ C^d such that Mu = v},

and the kernel of M by

    ker(M) = {v ∈ C^d : Mv = 0}.

If M ∈ M_d and a non-zero v ∈ C^d satisfies Mv = λv, then v is an eigenvector of M with eigenvalue λ. If M ∈ M_d^h, then M can be expressed as M = ∑_{i=1}^d λ_i v_i v_i^*, where {v_1, . . . , v_d} is an orthonormal basis of C^d, and v_i is an eigenvector of M with eigenvalue λ_i ∈ R, i = 1, . . . , d. It is then clear that Tr M = ∑_{i=1}^d λ_i. It may be that λ_i = 0 for some i ∈ [d]. If N ∈ M_d^h can


be expressed as N = ∑_{i=1}^k λ_i v_i v_i^*, where {v_1, . . . , v_k} is an orthonormal set and each λ_i ≠ 0, then the rank of N is given by rank(N) = k. Note that the range of N is then given by ran(N) = span{v_1, . . . , v_k}, and so rank(N) = dim(ran(N)).
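The spectral decomposition can be checked with numpy's eigh. The sketch below is an added illustration (the random Hermitian matrix is an arbitrary choice); it reconstructs M from its eigenpairs and confirms that Tr M = ∑_i λ_i and that the rank counts the non-zero eigenvalues.

import numpy as np

# Sketch: spectral decomposition M = sum_i lambda_i v_i v_i^* of a Hermitian matrix.
d = 4
rng = np.random.default_rng(3)
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
M = X + X.conj().T                          # a Hermitian matrix

eigvals, eigvecs = np.linalg.eigh(M)        # columns of eigvecs form an orthonormal basis
reconstructed = sum(lam * np.outer(v, v.conj()) for lam, v in zip(eigvals, eigvecs.T))
assert np.allclose(reconstructed, M)
assert np.isclose(np.trace(M), eigvals.sum())          # Tr M = sum of the eigenvalues
assert np.linalg.matrix_rank(M) == np.count_nonzero(~np.isclose(eigvals, 0.0))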

The largest eigenvalue of M ∈ M_d^+ is given by

    ‖M‖ = max{⟨v, Mv⟩ : v ∈ C^d, ‖v‖ = 1} = max{⟨M, ρ⟩ : ρ ∈ M_d^+, Tr ρ = 1}.

Also note that for M = ∑_{i=1}^d λ_i v_i v_i^*, where {v_1, . . . , v_d} is orthonormal, we have ‖M‖_2 = √(∑_{i=1}^d λ_i^2) ≥ ‖M‖. Matrix M ∈ M_d is normal if and only if MM^* = M^*M. A matrix M ∈ M_d is normal if and only if M can be written as M = ∑_{i=1}^d m_i v_i v_i^*, where {v_1, . . . , v_d} is an orthonormal basis of C^d and m_1, . . . , m_d ∈ C are the eigenvalues of M ([19, Theorem 2.5.3]). Matrices A, B ∈ M_d commute if AB = BA. A set S of matrices is said to be commutative if AB = BA for all A, B ∈ S. If A, B ∈ M_d can be expressed as A = ∑_{i=1}^d a_i v_i v_i^* and B = ∑_{i=1}^d b_i v_i v_i^*, where {v_1, . . . , v_d} is an orthonormal basis of C^d and a_i, b_i ∈ C, then there is a unitary matrix U such that UAU^* and UBU^* are diagonal matrices, and we say that A and B are simultaneously unitarily diagonalisable. It is an important but standard result ([19, Theorem 2.5.5]) that a set of normal matrices is commutative if and only if the matrices in the set are simultaneously unitarily diagonalisable.
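The sketch below is an added Python illustration (the common eigenbasis is an arbitrary random unitary): it builds two Hermitian matrices from the same orthonormal eigenbasis, checks that they commute, and verifies that conjugating by the corresponding unitary diagonalises both, in line with [19, Theorem 2.5.5].

import numpy as np

# Sketch: Hermitian matrices sharing an eigenbasis commute and are
# simultaneously unitarily diagonalisable.
d = 4
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))

a = rng.standard_normal(d)                  # eigenvalues of A
b = rng.standard_normal(d)                  # eigenvalues of B
A = Q @ np.diag(a) @ Q.conj().T             # A = sum_i a_i v_i v_i^*
B = Q @ np.diag(b) @ Q.conj().T             # B = sum_i b_i v_i v_i^*

assert np.allclose(A @ B, B @ A)            # A and B commute

U = Q.conj().T                              # U A U^* and U B U^* are diagonal
for M in (A, B):
    D = U @ M @ U.conj().T
    assert np.allclose(D, np.diag(np.diag(D)))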

Lemma B.0.2. The following are standard results in linear algebra. For A, B, C ∈ M_d:

(i) A ≥ 0 ⇒ Tr A ≥ 0.

(ii) A, B ≥ 0 ⇒ A + B ≥ 0 and λA ≥ 0 for λ ∈ R_+.

(iii) A ≤ B and B ≤ C implies A ≤ C.

(iv) A, B ≥ 0 ⇒ Tr(AB) ≥ 0 (but note AB ∉ M_d^+ in general).

(v) If 0 ≤ A ≤ C and B ≥ 0, then Tr(AB) ≤ Tr(CB).

(vi) M_d^{++} ⊂ M_d^+ ⊂ M_d^h.

(vii) If A ∈ M_d^+, there exists A^{1/2} ∈ M_d^+ such that A = (A^{1/2})^2.

(viii) If V is an orthonormal basis for C^d, then Tr A = ∑_{v∈V} ⟨Av, v⟩.

(ix) For A ∈ M_d^+ and k ∈ R_+ it holds that A ≤ k I_d ⇐⇒ ‖A‖ ≤ k.

(x) If A, B ∈ M_d^+, then Tr(AB) = 0 ⇐⇒ AB = 0.
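Several items of Lemma B.0.2 lend themselves to a quick numerical check. The sketch below is an added Python illustration (the random positive semi-definite matrices and the projections chosen for item (x) are arbitrary); it verifies (iv), (vii) and (x).

import numpy as np

# Sketch checking items (iv), (vii) and (x) of Lemma B.0.2 numerically.
d = 4
rng = np.random.default_rng(5)
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Y = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A, B = X @ X.conj().T, Y @ Y.conj().T          # random positive semi-definite matrices

# (iv): Tr(AB) >= 0, even though AB need not itself be positive semi-definite.
assert np.trace(A @ B).real >= 0

# (vii): the positive square root A^{1/2}, built from the spectral decomposition.
w, V = np.linalg.eigh(A)
root = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.conj().T
assert np.allclose(root @ root, A)

# (x): for positive semi-definite matrices, Tr(PQ) = 0 exactly when PQ = 0;
# projections onto orthogonal coordinate directions give an example.
P = np.diag([1.0, 0.0, 0.0, 0.0])
Q = np.diag([0.0, 1.0, 0.0, 0.0])
assert np.isclose(np.trace(P @ Q), 0.0) and np.allclose(P @ Q, 0.0)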


If v_1, . . . , v_n ∈ C^d are orthonormal vectors, then P = ∑_{i=1}^n v_i v_i^* is a rank-n orthogonal projection and satisfies P^2 = P = P^* ≥ 0 and Tr P = n. (In this thesis, the term projection will be used to mean orthogonal projection.) Forming additional vectors v_{n+1}, . . . , v_d such that {v_1, . . . , v_d} is an orthonormal basis of C^d, it is easy to see that if v = ∑_{i=1}^d α_i v_i with α_i ∈ C, then Pv = ∑_{i=1}^n α_i v_i, that is, Pv is the projection of v onto span{v_1, . . . , v_n} = ran(P).

If projections P_1, . . . , P_k ∈ M_d satisfy ∑_{i=1}^k P_i = I_d, then it holds that P_iP_j = 0, and hence Tr(P_iP_j) = 0, for distinct i, j ∈ [k], yielding

    ran(P_i) ⊥ ran(P_j) for distinct i, j ∈ [k].    (B.1)
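As a small added illustration in Python (with arbitrary choices of d, n and the random orthonormal vectors), the sketch below checks that P = ∑_{i=1}^n v_i v_i^* satisfies P^2 = P = P^* with Tr P = n, and that two projections summing to I_d have orthogonal ranges, as in (B.1).

import numpy as np

# Sketch: orthogonal projections built from orthonormal vectors.
d, n = 5, 2
rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))

P = sum(np.outer(Q[:, i], Q[:, i].conj()) for i in range(n))   # P = sum_i v_i v_i^*
assert np.allclose(P @ P, P) and np.allclose(P, P.conj().T)     # P^2 = P = P^*
assert np.isclose(np.trace(P).real, n)                          # Tr P = n

R = np.eye(d) - P                     # complementary projection, so P + R = I_d
assert np.allclose(P @ R, 0)          # P R = 0, hence ran(P) is orthogonal to ran(R)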

If {v_i : i ∈ [d]} is an orthonormal basis of C^d, then {v_i v_j^* : i, j ∈ [d]} is an orthonormal basis of M_d. For M = ∑_{i,j=1}^d m_{ij} v_i v_j^* ∈ M_d, we have m_{ij} = ⟨Mv_j, v_i⟩. Choosing v_i = e_i gives the canonical basis {E_{ij} : i, j ∈ [d]} for M_d, where E_{ij} denotes the matrix unit e_i e_j^*.

We use the Kronecker delta δ_{ij} for i, j ∈ N, given by δ_{ij} = 1 if i = j and δ_{ij} = 0 if i ≠ j.

An important operation we must consider is the tensor product.

Definition B.0.3. (i) The tensor product of vectors u = (u_i)_{i∈[m]} ∈ C^m and v = (v_i)_{i∈[n]} ∈ C^n is given by

    u ⊗ v = (u_i v)_{i∈[m]} ∈ M_{m,1}(C^n) ≅ C^{nm},

and for a, c ∈ C^m and b, d ∈ C^n it holds that ⟨(a ⊗ b), (c ⊗ d)⟩ = ⟨a, c⟩ ⟨b, d⟩.

(ii) The tensor product of matrices A = (a_{ij})_{i,j∈[m]} ∈ M_m and B = (b_{ij})_{i,j∈[n]} ∈ M_n is given by

    A ⊗ B = (a_{ij}B)_{i,j∈[m]} ∈ M_m(M_n) ≅ M_{nm},    (B.2)

and

    Tr(A ⊗ B) = Tr A · Tr B, and ‖A ⊗ B‖ = ‖A‖ ‖B‖.

(Though we will normally be working with square matrices, (B.2) extends trivially to define tensor products of non-square matrices.)


(iii) The tensor product of vector spaces U and V is the vector space given by

    U ⊗ V = span{u ⊗ v : u ∈ U, v ∈ V}.

It is useful to note the following results for matrices A, B, C, D and k ∈ C.

1. Bilinearity: If B + C exists, then

    A ⊗ (B + C) = A ⊗ B + A ⊗ C,   (B + C) ⊗ A = B ⊗ A + C ⊗ A,

and

    (kA) ⊗ B = A ⊗ (kB) = k(A ⊗ B).

2. Associativity:

    (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).

3. Mixed product property: If the matrix products AC and BD exist, then

    (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

4. Adjoint property:

    (A ⊗ B)^* = A^* ⊗ B^*.
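The matrix tensor product of Definition B.0.3(ii) is realised concretely by the Kronecker product, so the properties above can be checked with numpy.kron. The sketch below is an added illustration with arbitrary random matrices; it verifies the mixed product and adjoint properties together with the trace and norm identities.

import numpy as np

# Sketch: Kronecker-product identities from Definition B.0.3 and the list above.
rng = np.random.default_rng(7)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
C = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
D = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# Mixed product property: (A (x) B)(C (x) D) = (AC) (x) (BD).
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Adjoint property: (A (x) B)^* = A^* (x) B^*.
assert np.allclose(np.kron(A, B).conj().T, np.kron(A.conj().T, B.conj().T))

# Tr(A (x) B) = Tr A . Tr B and ||A (x) B|| = ||A|| ||B||.
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
assert np.isclose(np.linalg.norm(np.kron(A, B), 2),
                  np.linalg.norm(A, 2) * np.linalg.norm(B, 2))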

For A, B ⊆ V, where V is a vector space, we let A + B = {a + b : a ∈ A, b ∈ B}. For subspaces A, B ⊆ M_d we have

    (A + B)^⊥ = A^⊥ ∩ B^⊥.    (B.3)

For subspaces S_i ⊆ M_{d_i}, i = 1, 2, it is clear that

    (S_1 ⊗ S_2)^⊥ = S_1^⊥ ⊗ M_{d_2} + M_{d_1} ⊗ S_2^⊥.    (B.4)


Appendix C

Source coding for the ergodic source

Here we recall two related theorems for the ergodic source and show that they lead to (5.26). We work in the dynamical system (Ω = X^Z, F, P, T). We let the probability distribution p_n on X^n be given by

    p_n(i_0, . . . , i_{n−1}) = P({ω : ω_0 = i_0, . . . , ω_{n−1} = i_{n−1}}).

Theorem C.0.1 (Shannon–McMillan–Breiman). [4, Theorem 13.1] For ω ∈ Ω we write ω = (ω_i)_{i∈Z}. If T is an ergodic shift, then

    lim_{n→∞} −(1/n) log p_n(ω_0, . . . , ω_{n−1}) = h(T) almost everywhere on Ω.

Theorem C.0.2 (Asymptotic equipartition property). [4, Theorem 13.2] Suppose that T is an ergodic shift and let h(T) = h. Then for any ε > 0 there exists n_0(ε) ∈ N such that for all integers n ≥ n_0(ε) there is a set B(n, ε) ⊆ X^n satisfying p_n(B(n, ε)) ≥ 1 − ε, and such that

    2^{−n(h+ε)} < p_n(u) < 2^{−n(h−ε)}

for all u ∈ B(n, ε). Indeed, this can be achieved by setting

    B(n, ε) = { u ∈ X^n : |−(1/n) log p_n(u) − h| < ε }.    (C.1)
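An i.i.d. source is a simple special case of an ergodic shift, with h(T) the Shannon entropy of the one-symbol distribution, so the typical set of (C.1) can be examined directly. The sketch below is an added Python illustration (the Bernoulli parameter q, the tolerance eps and the sample size are arbitrary choices); it estimates p_n(B(n, eps)) by Monte Carlo and shows it approaching 1 as n grows.

import numpy as np

# Sketch of the AEP for an i.i.d. Bernoulli(q) source: the fraction of sample
# words lying in the typical set B(n, eps) of (C.1) tends to 1 as n grows.
# A word's probability depends only on its number of ones, so sampling that
# count suffices.
q, eps, samples = 0.3, 0.05, 20000
h = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))   # entropy rate of the source
rng = np.random.default_rng(8)

for n in (50, 200, 1000):
    ones = rng.binomial(n, q, size=samples)                   # number of 1s per word
    log_p = ones * np.log2(q) + (n - ones) * np.log2(1 - q)   # log2 p_n(word)
    in_typical = np.abs(-log_p / n - h) < eps
    print(n, in_typical.mean())    # estimate of p_n(B(n, eps)); approaches 1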

From Theorems C.0.1 and C.0.2 we now prove (5.26), restated below.


Theorem C.0.3. If T is an ergodic shift, then for all ε ∈ (0, 1),

    lim_{n→∞} (1/n) log ( min{ |A| : A ⊆ X^n, p_n(A) ≥ 1 − ε } ) = h(T).

Proof. Write h(T) = h. Choose δ ∈ (0, ε) and form the set B(n, δ) as in (C.1) so that for sufficiently large n and for all u ∈ B(n, δ) we have p_n(u) > 2^{−n(h+δ)} and

    p_n(B(n, δ)) > 1 − δ > 1 − ε.

This gives |B(n, δ)| < 2^{n(h+δ)}, and it follows that

    lim sup (1/n) log ( min{ |A| : A ⊆ X^n, p_n(A) ≥ 1 − ε } ) ≤ lim sup (1/n) log |B(n, δ)| ≤ h + δ.    (C.2)

Now take A ⊆ X^n satisfying p_n(A) ≥ 1 − ε. Then

    p_n(A ∩ B(n, δ)) = 1 − p_n(A^c ∪ B(n, δ)^c) ≥ 1 − p_n(A^c) − p_n(B(n, δ)^c).    (C.3)

It is a standard result for finite measure spaces that convergence almost everywhere implies convergence in measure [44, p. 74], and so Theorem C.0.1 implies that there exists N ∈ N such that for all n > N,

    P({ω : |−(1/n) log p_n(ω_0, . . . , ω_{n−1}) − h| ≥ δ}) < δ,

that is, p_n(B(n, δ)^c) < δ.

Then for n > N, (C.3) gives that

    p_n(A ∩ B(n, δ)) > 1 − ε − δ,    (C.4)

and so

    2^{n(h−δ)} p_n(A ∩ B(n, δ)) > 2^{n(h−δ)} (1 − ε − δ).    (C.5)

Denoting the left hand side of (C.5) by L, we have

    L = 2^{n(h−δ)} ∑_{u∈A∩B(n,δ)} p_n(u),

and since p_n(u) < 2^{−n(h−δ)} for all u ∈ B(n, δ) for sufficiently large n, it follows that there


exists n_0 ∈ N such that L < |A ∩ B(n, δ)| ≤ |A| for all n ≥ n_0. Returning to (C.5), we conclude that, for n ≥ n_0, any such set A satisfies |A| > 2^{n(h−δ)}(1 − ε − δ). This gives that

    lim inf (1/n) log ( min{ |A| : A ⊆ X^n, p_n(A) ≥ 1 − ε } ) ≥ lim inf (1/n) log ( 2^{n(h−δ)}(1 − ε − δ) ) = h − δ.    (C.6)

Letting δ → 0 in (C.6) and (C.2) gives the result.
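For an i.i.d. Bernoulli source the quantity in Theorem C.0.3 can be computed exactly by taking the most probable words first, grouping words by their number of ones. The sketch below is an added Python illustration (the parameters q and eps are arbitrary, and the i.i.d. case is used in place of a general ergodic shift); it shows (1/n) log min{|A| : p_n(A) ≥ 1 − ε} tending towards the entropy rate h.

from math import ceil, comb, log2

# Sketch of Theorem C.0.3 for an i.i.d. Bernoulli(q) source: the smallest set A
# with p_n(A) >= 1 - eps is obtained by taking the most probable words first,
# and all words with k ones have probability q^k (1-q)^(n-k).
q, eps = 0.3, 0.1
h = -(q * log2(q) + (1 - q) * log2(1 - q))      # entropy rate of the source

for n in (50, 200, 800):
    # Since q < 1/2, word probability decreases as the number of ones k grows,
    # so the greedy choice takes k = 0, 1, 2, ... in turn.
    total_prob, size = 0.0, 0
    for k in range(n + 1):
        p_word = q ** k * (1 - q) ** (n - k)
        count = comb(n, k)
        if total_prob + count * p_word >= 1 - eps:
            size += ceil((1 - eps - total_prob) / p_word)   # part of this group suffices
            break
        total_prob += count * p_word
        size += count
    print(n, log2(size) / n, "entropy rate:", h)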


Bibliography

[1] Charalambos D. Aliprantis and Kim C. Border. Infinite-dimensional analysis. Springer-

Verlag, Berlin, second edition, 1999. A hitchhiker’s guide.

[2] William B. Arveson. Subalgebras of C∗-algebras. Acta Math., 123:141–224, 1969.

[3] Koenraad M. R. Audenaert and Jens Eisert. Continuity bounds on the quantum relative

entropy. J. Math. Phys., 46(10):102104, 21, 2005.

[4] Patrick Billingsley. Ergodic theory and information. John Wiley & Sons, Inc., New

York-London-Sydney, 1965.

[5] Gareth Boreland. A lower bound on graph entropy. Math. Proc. R. Ir. Acad., 118A(1):9–

20, 2018.

[6] Gareth Boreland, Ivan G. Todorov, and Andreas Winter. Sandwich theorems and ca-

pacity bounds for non-commutative graphs. arXiv:1907.11504, 2019.

[7] Jean Cardinal, Samuel Fiorini, and Gwenael Joret. Minimum entropy coloring. In

Algorithms and computation, volume 3827 of Lecture Notes in Comput. Sci., pages 819–

828. Springer, Berlin, 2005.

[8] Man Duen Choi. Completely positive linear maps on complex matrices. Linear Algebra

and Appl., 10:285–290, 1975.

[9] V. Chvatal. On certain polytopes associated with graphs. J. Comb. Theory B, 18:138–

154, 1975.

[10] Thomas M. Cover and Joy A. Thomas. Elements of information theory. Wiley-

Interscience [John Wiley & Sons], Hoboken, NJ, second edition, 2006.

[11] I. Csiszar, J. Korner, L. Lovasz, K. Marton, and G. Simonyi. Entropy splitting for

antiblocking corners and perfect graphs. Combinatorica, 10(1):27–40, 1990.


[12] R. Duan. Super-activation of zero error capacity of noisy quantum channels.

arXiv:0906.2527, 2009.

[13] Runyao Duan, Simone Severini, and Andreas Winter. Zero-error communication via

quantum channels, noncommutative graphs, and a quantum Lovasz number. IEEE

Trans. Inform. Theory, 59(2):1164–1174, 2013.

[14] Dennis Geller and Saul Stahl. The chromatic number and other functions of the lexico-

graphic product. J. Combinatorial Theory Ser. B, 19(1):87–95, 1975.

[15] Chris Godsil and Gordon Royle. Algebraic graph theory, volume 207 of Graduate Texts

in Mathematics. Springer-Verlag, New York, 2001.

[16] Robert M. Gray. Entropy and information theory. Springer, New York, second edition,

2011.

[17] M. Grotschel, L. Lovasz, and A. Schrijver. Relaxations of vertex packing. J. Combin.

Theory Ser. B, 40(3):330–343, 1986.

[18] Martin Grotschel, Laszlo Lovasz, and Alexander Schrijver. Geometric algorithms and

combinatorial optimization, volume 2 of Algorithms and Combinatorics: Study and Re-

search Texts. Springer-Verlag, Berlin, 1988.

[19] Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press,

Cambridge, second edition, 2013.

[20] Yuichiro Kakihara. Abstract methods in information theory, volume 10 of Series on

Multivariate Analysis. World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, second

edition, 2016.

[21] S. Kim and A. Metha. Chromatic numbers and a Lovasz type inequality for non-

commutative graphs. arXiv:1709.05595v1, 2017.

[22] Donald E. Knuth. The sandwich theorem. Electron. J. Combin., 1:Article 1, approx. 48,

1994.

[23] J. Korner. Coding of an information source having ambiguous alphabet and the en-

tropy of graphs, in Transactions of the Sixth Prague Conference on Information Theory,

Prague, 1971. pages 411–425, 1973.

[24] Janos Korner. Fredman-Komlos bounds and information theory. SIAM J. Algebraic

Discrete Methods, 7(4):560–570, 1986.


[25] Janos Korner, Gabor Simonyi, and Zsolt Tuza. Perfect couples of graphs. Combinatorica,

12(2):179–192, 1992.

[26] Andrew J. Kurdila and Michael Zabarankin. Convex functional analysis. Systems &

Control: Foundations & Applications. Birkhauser Verlag, Basel, 2005.

[27] R. Levene, V. Paulsen, and I. Todorov. Complexity and capacity bounds for quantum

channels. arXiv:1710.06456v1, 2017.

[28] Laszlo Lovasz. On the Shannon capacity of a graph. IEEE Trans. Inform. Theory,

25(1):1–7, 1979.

[29] Katalin Marton. On the Shannon capacity of probabilistic graphs. J. Combin. Theory

Ser. B, 57(2):183–195, 1993.

[30] Robert J. McEliece and Edward C. Posner. Hide and seek, data storage, and entropy.

Ann. Math. Statist., 42:1706–1716, 1971.

[31] Robert E. Megginson. An introduction to Banach space theory, volume 183 of Graduate

Texts in Mathematics. Springer-Verlag, New York, 1998.

[32] Michael A. Nielsen and Isaac L. Chuang. Quantum computation and quantum informa-

tion. Cambridge University Press, Cambridge, 2000.

[33] Donald Ornstein. Bernoulli shifts with the same entropy are isomorphic. Advances in

Math., 4:337–352, 1970.

[34] V. Paulsen. Matrix analysis, 2015. Lecture notes, University of Waterloo, available at http://www.math.uwaterloo.ca/~vpaulsen/matrixanal2-1.pdf, accessed 10-4-2019.

[35] V. Paulsen. Entanglement and non-locality, 2016. Lecture notes, Uni-

versity of Waterloo, available at http://www.math.uwaterloo.ca/∼vpaulsen/

EntanglementAndNonlocality LectureNotes 7.pdf, accessed 17-8-2018.

[36] Vern Paulsen. Completely bounded maps and operator algebras, volume 78 of Cambridge

Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2002.

[37] Sven Polak and Alexander Schrijver. New lower bound on the Shannon capacity of C7 from circular graphs. arXiv:1808.07438, 2018.

[38] D. Pollard. Minimax theorem, 2003. Available at http://www.stat.yale.edu/~pollard/Courses/602.spring07/MmaxThm.pdf, accessed 17-8-2018.


[39] S. Rezaei. Entropy and graphs. Master of Math thesis, University of Waterloo,

arXiv:1311.5632, 2013.

[40] S. Rezaei and E. Chiniforooshan. Symmetric graphs with respect to graph entropy.

Electron. J. Combin., 24(1), 2017.

[41] Joseph V. Romanovsky. A simple proof of the Birkhoff-von Neumann theorem on bis-

tochastic matrices. In A tribute to Ilya Bakelman (College Station, TX, 1993), volume 3

of Discourses Math. Appl., pages 51–53. Texas A & M Univ., College Station, TX, 1994.

[42] Walter Rudin. Functional analysis. McGraw-Hill Book Co., New York-Dusseldorf-

Johannesburg, 1973. McGraw-Hill Series in Higher Mathematics.

[43] Walter Rudin. Principles of mathematical analysis. McGraw-Hill Book Co., New York-

Auckland-Dusseldorf, third edition, 1976. International Series in Pure and Applied Math-

ematics.

[44] Walter Rudin. Real and complex analysis. McGraw-Hill Book Co., New York, third

edition, 1987.

[45] Mary Beth Ruskai. Inequalities for quantum entropy: a review with conditions for

equality. J. Math. Phys., 43(9):4358–4375, 2002. Quantum information theory.

[46] Edward R. Scheinerman and Daniel H. Ullman. Fractional graph theory. Dover Publi-

cations, Inc., Mineola, NY, 2011. A rational approach to the theory of graphs, With a

foreword by Claude Berge, Reprint of the 1997 original.

[47] C. E. Shannon. A mathematical theory of communication. Bell System Tech. J., 27:379–

423, 623–656, 1948.

[48] Claude E. Shannon. The zero error capacity of a noisy channel. Institute of Radio Engineers, Transactions on Information Theory, IT-2(September):8–19, 1956.

[49] Gabor Simonyi. Graph entropy: a survey. In Combinatorial optimization (New

Brunswick, NJ, 1992–1993), volume 20 of DIMACS Ser. Discrete Math. Theoret. Com-

put. Sci., pages 399–441. Amer. Math. Soc., Providence, RI, 1995.

[50] Gabor Simonyi. Perfect graphs and graph entropy. An updated survey. In Perfect graphs,

Wiley-Intersci. Ser. Discrete Math. Optim., pages 293–328. Wiley, Chichester, 2001.

[51] Dan Stahlke. Quantum zero-error source-channel coding and non-commutative graph

theory. IEEE Trans. Inform. Theory, 62(1):554–577, 2016.


[52] M. Tribus and E.C. McIrvine. Energy and information. Scientific American, 224, 1971.

[53] Peter Vrana. Probabilistic refinement of the asymptotic spectrum of graphs.

arXiv:1903.01857, 2019.

[54] John Watrous. The theory of quantum information. Cambridge University Press, 2018.

[55] Nik Weaver. A “quantum” Ramsey theorem for operator systems. Proc. Amer. Math.

Soc., 145(11):4595–4605, 2017.

[56] Alfred Wehrl. General properties of entropy. Rev. Modern Phys., 50(2):221–260, 1978.

[57] Mark M. Wilde. Quantum information theory. Cambridge University Press, Cambridge,

second edition, 2017.

[58] H. S. Witsenhausen. The zero-error side information problem and chromatic numbers.

IEEE Trans. Information Theory, IT-22(5):592–593, 1976.


Index

σ-algebra, 175

abelian projection, 97

abelian projection convex corner, 98

adjoint channel, 108

anti-blocker, 10

atom, 176

automorphism, 14

Bernoulli shift, 179, 180, 198, 201

bipartite graph, 23

c.p.t.p. map, 87

channel, 82

Choi matrix, 88

chromatic number, 14, 96, 138

clique, 13, 97

clique covering number, 14, 127

clique number, 13, 126

clique projection, 97

clique projection convex corner, 98

co-normal product, 14, 139

co-tensor product, 140

complement, 13

complete bipartite graph, 27

complete graph, 14

completely positive, 87

concave function, 207

conditional entropy, 177

conditional graph entropy, 189

confusability graph, 82

convex corner, 4, 35

convex function, 208

convex hull, 207

convex set, 207

cycle, 23

cylinder, 175

density matrix, 58

diagonal convex corner, 35

distinguishability, 15

distinguishability graph, 15, 187

distinguishability preserving transformation, 187

dynamical system, 176

eigenvalue, 214

eigenvector, 214

empty graph, 14

entropy, 2, 177

entropy over a convex corner, 5

extreme point, 207

Fekete’s Lemma, 83

finite subalgebra, 176

fractional chromatic number, 15, 128

fractional clique covering number, 128, 129

fractional clique number, 18, 126

fractional full covering number, 128

fractional vertex packing polytope, 30

full covering number, 127

full number, 126

full projection, 97

full projection convex corner, 98


full set, 97

graph colouring, 14

graph entropy, 13, 15, 187, 195

handled orthonormal labelling (h.o.n.l.), 30

handled projective orthogonal labelling (h.p.o.l.), 115

hereditary, 4

hereditary cover, 41

Hermitian matrix, 213

Hilbert space, 58, 90, 213

Hilbert–Schmidt inner product, 33, 214

Hilbert–Schmidt norm, 33, 214

homomorphism, 14, 134

i.i.d., 2

identity channel, 82, 159

independence number, 13

independent projection, 138

independent set, 13, 97, 138

induced subgraph, 13

inner product, 6, 33, 214

isomorphism, 14, 180, 199

kernel, 14

Kolmogorov Existence Theorem, 176

Kolmogorov–Sinai entropy, 177, 180, 195

Kolmogorov–Sinai Theorem, 178, 196

Kraus operators, 88

Kraus representation, 88

Kronecker delta, 216

Korner, 15

lexicographic product, 182

logarithm, 1

logarithm function, 1, 59

Lovasz, 84

Lovasz corner, 109

Lovasz number, 31, 126, 146

lower semi-continuous function, 208

Markov shift, 179, 198

measurable function, 175

measurable space, 175

measure preserving transformation, 175

measurement system, 86

memory, 174

mixed state, 58, 59

noiseless channel, 2

noisy channel, 82

non-commutative graph, 90

non-commutative graph entropy, 142

norm, 6, 33, 214

normal matrix, 215

null set, 175

one-shot zero-error capacity, 83

operator anti-system, 137

operator norm, 33, 214

operator system, 90

orthogonal complement, 213

orthonormal labelling (o.n.l.), 30

packing number, 85

perfect graph, 23, 108

perfect matching, 27

positive definite matrix, 214

positive map, 87

positive semi-definite matrix, 214

probabilistic graph, 14

probability measure, 175

probability space, 175

projection, 216


projective orthogonal labelling (p.o.l.), 115

pure state, 58, 59

quantum channel, 87

quantum mechanics, 58, 86, 87

quantum relative entropy, 61

quantum system, 58

reflexivity, 37

relative entropy, 3

sandwich theorem, 31, 99, 120

second anti-blocker theorem, 10, 48, 55

second Lovasz number, 147

self-adjoint matrix, 213

semi-continuity, 208

Shannon, 1

Shannon capacity, 83, 95

Shannon entropy, 2

shift, 175

shift invariant graph, 187

simultaneously unitarily diagonalisable matrices, 215

source coding, 2, 15

spanning subgraph, 13

stable set, 13

standard convex corner, 4, 38

state, 58

state space, 58

strong chromatic number, 138

strong product, 83

strongly independent set, 138

substitution lemma, 183

symmetric, 23

tensor product, 216

tensor product of channels, 94

tensor product of graphs, 140

theta corner, 30

threshold-t co-normal product, 192

trace, 213

trace preserving, 87

trivial channel, 160

unit corner, 4, 69

unit cube, 4, 69

upper semi-continuous function, 208

vertex packing polytope, 16

vertex transitive, 14

von Neumann entropy, 60

weighted independence number, 137

weighted Lovasz number, 137

Witsenhausen rate, 85, 155

zero-error capacity, 83

zero-error information theory, 82