
REVIEWS OF MODERN PHYSICS, VOLUME 74, JANUARY 2002

The role of relative entropy in quantum information theory

V. Vedral*

Centre for Quantum Computation, Clarendon Laboratory, University of Oxford, OX1 3PU, United Kingdom

(Published 8 March 2002)

Quantum mechanics and information theory are among the most important scientific discoveries of the last century. Although these two areas initially developed separately, it has emerged that they are in fact intimately related. In this review the author shows how quantum information theory extends traditional information theory by exploring the limits imposed by quantum, rather than classical, mechanics on information storage and transmission. The derivation of many key results differentiates this review from the usual presentation in that they are shown to follow logically from one crucial property of relative entropy. Within the review, optimal bounds on the enhanced speed that quantum computers can achieve over their classical counterparts are outlined using information-theoretic arguments. In addition, important implications of quantum information theory for thermodynamics and quantum measurement are intermittently discussed. A number of simple examples and derivations, including quantum superdense coding, quantum teleportation, and Deutsch's and Grover's algorithms, are also included.

CONTENTS

I. Introduction
II. Relative Entropy
   A. Statistical significance
      1. Classical data compression
      2. Sanov's theorem
   B. Other information measures from relative entropy
   C. Classical evolution and relative entropy
   D. Schmidt decomposition and quantum dynamics
   E. Quantum relative entropy
III. Quantum Communication: Classical Use
   A. The Holevo bound
   B. Schumacher's compression
   C. Dense coding
   D. Relative entropy, thermodynamics, and information erasure
IV. Quantum Communication: Quantum Use
   A. Quantifying entanglement
   B. Teleportation
   C. Measures of entanglement from relative entropy
   D. Classical information and quantum correlations
V. Quantum Computation
   A. Deutsch's algorithm
   B. Computation: Communication in time
   C. Black-box complexity
   D. Database search
   E. Quantum computation and quantum measurement
   F. Ultimate limits of computation: The Bekenstein bound
VI. Conclusions
Acknowledgments
References

*Permanent address: Blackett Laboratory, Imperial College, London SW7 2BZ.


I. INTRODUCTION

Quantum physics not only provides the most complete description of physical phenomena known to man, it also provides a new philosophical framework for our understanding of nature. It enables us to accurately model systems ranging in size from quarks and atoms to large cosmic objects such as black holes. Information theory, on the other hand, teaches us about our physical ability to store and process information. Without a formalized information theory, many of the recent developments in telecommunications, computer science, and engineering would simply not have been possible. Although quantum physics and information theory initially developed separately, their recent integration is seen as yet another important step towards understanding the fundamental properties and limitations of Nature.

One of the central information-theoretic concepts in science is that of distinguishability. Inevitably an animal's survival depends on its ability to distinguish a mate from a predator or prey. In the same way, physical experiments aim to be sensitive enough to distinguish one hypothesis from another. It is, however, no surprise that the influence of the concept of distinguishability is felt far beyond science. Life consists of a series of decisions that have to be made. This we do, consciously or unconsciously, by evaluating all the alternatives and distinguishing the consequences of various alternative actions.

The purpose of this review is to show that the apparently simple concept of distinguishability is at the root of information processing. Ultimately, how well we can distinguish different physical states determines how much information we can encode into a certain system and how quickly we can manipulate it. Distinguishability in turn is completely dependent upon the laws of physics, and quantum physics naturally allows for more versatile information processing than does classical physics. The reasoning behind this is that, unlike classical states, two different quantum states are not necessarily fully distinguishable. It is interesting to note that although this at first seems like a limitation, it in fact presents us with significantly more possibilities for information encoding and transmission.

In this review I first plan to argue that relative entropy is the most appropriate quantity for measuring distinguishability between different states. The proper framework in which to talk about states is, of course, quantum mechanics, so it is necessary to define the quantum relative entropy. I prove that relative entropy, both classical and quantum, does not increase with time. Thus two states can only become less distinguishable as they undergo any kind of evolution. This result will be central to my review, as subsequent results will follow from this simple fact.

I then go on to show what the "no increase of relative entropy" principle tells us about the ability of quantum states to store and process information. Information has to be encoded and manipulated in physical systems. Therefore distinguishability of different states within a physical system is a prerequisite. Looking at this from the point of view of communication, what does it mean to send and receive a message? Sending a message successfully means encoding the information we wish to send into a structured format that the receiver must be able to distinguish unambiguously. Communication capacity can then be thought of as the rate at which we can send and receive messages. The rate of successful transmission is determined by the relative entropy between the various encoding states.

What is less obvious, but nonetheless equally true, is that computation can also be viewed as a special kind of communication. This will allow the use of relative entropy to quantify the efficiency (i.e., speed) of quantum computation in general.

The role of measurement within quantum mechanics, and therefore within information theory, is paramount. Classically the measurement process is implicit, because physical quantities have well-defined preexisting properties. For example, a classical bit is either in the state 0 or 1, whereas a quantum bit can exist in a combination of the two states. A measurement is necessary to "collapse" this combination to a classical result that we can then read. The very concept of measurement efficiency can also be quantified using relative entropy. A measurement, like a communication process, creates correlations between a system and an apparatus whose purpose is to receive an amount of information from the system. The opposite of this process, namely, the deletion of information, can be seen to be at the root of irreversibility, and this invariably contributes to an increase in the entropy of the environment. This amount is exactly quantified by the relative entropy between the environmental state and the apparatus state, and it provides an exciting link between information theory, computation, thermodynamics, and quantum mechanics. However, before we reach this exciting stage, our long journey has to begin with a much simpler question: how do we quantify uncertainty in a physical state?


II. RELATIVE ENTROPY

Fundamental to our understanding of distinguishability is the measure of uncertainty in a given probability distribution. This uncertainty can be quantified by introducing the idea of "surprise." Suppose that a certain event happens with a probability $p$. We would then like to quantify how surprised we are when that event does happen. The first guess would be $1/p$: the smaller the probability of an event, the more surprised we are when the event happens, and vice versa. However, an event might be composed of two independent events that happen with probabilities $q$ and $r$, respectively, so that the probability of both events occurring is $p = q \times r$. We would now intuitively expect that the surprise of $p$ is the same as the surprise of $q$ plus the surprise of $r$. But $1/p \neq 1/q + 1/r$, so $1/p$ is not really a satisfactory definition from this perspective. Instead, if we define surprise as $\ln(1/p)$, then the above property, called additivity, is satisfied, since $-\ln p_1 p_2 = -\ln p_1 - \ln p_2$. With a probability distribution $\sum_n p_n = 1$, the total uncertainty is just the average of all the surprises. Additivity of uncertainties of statistically independent events is such a stringent condition that it basically leads to a unique measure (Shannon and Weaver, 1949), up to a constant and the logarithmic base.

Definition. The uncertainty in a collection of possible states $a_i$ with corresponding probability distribution $p(a_i)$ is given by its entropy,

$$S(p) := -\sum_i p(a_i) \ln p(a_i), \qquad (1)$$

called the Shannon entropy. We note that there is no Boltzmann constant in this expression, as there is for the physical entropy, since it is by convention set to unity. This measure is suitable for the states of systems described by the laws of classical physics, but it will have to be changed, along with other classical measures, when we present the quantum information theory.
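For concreteness, Eq. (1) is easy to evaluate numerically. A minimal Python sketch, working in nats as in the text (the function name is illustrative):

```python
import numpy as np

def shannon_entropy(p):
    """S(p) = -sum_i p_i ln p_i, in nats, as in Eq. (1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]          # 0 ln 0 = 0 by convention
    return -np.sum(p * np.log(p))

print(shannon_entropy([0.5, 0.5]))    # fair coin: ln 2 = 0.693...
print(shannon_entropy([0.9, 0.1]))    # biased coin: 0.325..., less surprise
```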

We ultimately wish to be able to talk about storing and processing information. For this we require a means of comparing two different probability distributions, which is why I introduce the notion of relative entropy (first introduced by Kullback and Leibler, 1951). Suppose that a collection of events has the probability distribution $\{p_i\}$, but we mistakenly think that this probability distribution is $\{q_i\}$. For example, we have a coin that we think is fair, i.e., the probability of getting a head or a tail when the coin is tossed is equal. If we toss this coin $n$ times, on average we expect heads half of the time and tails the other half. In reality, the coin, by virtue of its uneven weight distribution, will not be completely fair, so our expectation will turn out to be wrong. There will consequently be a discrepancy between our expected and the real probability distribution. This discrepancy is very frequently the case in real life, and it is, in fact, very rare that we have complete information about any event. We can formalize this as follows: when a particular outcome $j$ happens, we associate the surprise $-\ln q_j$ with it. The average surprise, or information, according to this erroneous belief, is

$$-\sum_i p_i \ln q_i.$$

Since events happen with probabilities $\{p_i\}$ (in spite of our belief), these are the correct ones to feature in the averaging process. However, the real amount of information we are obtaining is, as defined earlier, given by the Shannon entropy $S(p) = -\sum_i p_i \ln p_i$. It is not so difficult to show that $S(p) \leq -\sum_i p_i \ln q_i$ (equality holds if and only if $p_i = q_i$ for all $i$), so that there is an "uncertainty deficit," as it were, stemming from our wrong assumption and equal to the difference between the two averages. This deficit quantity is called the relative entropy.

Definition. Suppose that we have two sets of discrete events, $a_i$ and $b_j$, with corresponding probability distributions $p(a_i)$ and $p(b_j)$. The relative entropy between these two distributions is defined as

$$S[p(a) \,\|\, p(b)] := \sum_i p(a_i) \ln \frac{p(a_i)}{p(b_i)}. \qquad (2)$$

This function is a measure of the "distance" between $p(a_i)$ and $p(b_j)$, even though, strictly speaking, it is not a mathematical metric, since it fails to be symmetric: $S[p(a)\|p(b)] \neq S[p(b)\|p(a)]$. This is interesting, since at first it looks as if there should be no difference between mistaking the probability distribution $p_i$ for $q_i$, or vice versa. Intuitively this can be explained using our coin example. Suppose that someone gives us a coin that is either fair or completely unfair, e.g., one that always gives heads. Now we have to toss this coin a number of times and infer which of the two coins we have. If we toss the fair coin and obtain tails, then our inference will immediately be that we have the fair coin. If, however, we obtain heads, then it could be either coin. If we tossed it more times, the fair coin would eventually give us tails. If, however, we were holding the completely unfair coin from the beginning, then even after 100 heads we could never really eliminate the possibility that it is the fair coin, since this outcome is statistically possible (although highly unlikely). Therefore how certain we are about which coin we hold clearly depends on which coin we hold and how different it is from the other one. As we shall see shortly, this uncertainty is quantified by the relative entropy, and it is thus to be expected that it is asymmetric. I now describe this statistical approach in more detail.
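The asymmetry just described can be made quantitative with a minimal sketch of Eq. (2), applied to the fair coin and the always-heads coin of this example:

```python
import numpy as np

def relative_entropy(p, q):
    """S(p||q) = sum_i p_i ln(p_i / q_i), in nats, as in Eq. (2)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    if np.any(q[m] == 0):
        return np.inf     # an outcome we believed impossible occurred
    return np.sum(p[m] * np.log(p[m] / q[m]))

fair   = [0.5, 0.5]
unfair = [1.0, 0.0]       # the always-heads coin

# One tail from the fair coin rules out the unfair coin with certainty,
# so S(fair||unfair) is infinite; a run of heads never fully rules out
# the fair coin, so S(unfair||fair) = ln 2 stays finite.
print(relative_entropy(fair, unfair))   # inf
print(relative_entropy(unfair, fair))   # 0.693...
```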

A. Statistical significance

A more operational interpretation of both the Shannon entropy and the relative entropy comes from the statistical point of view. The generalization of this formalism to the quantum domain will be presented in the next section, and I shall offer an operational interpretation of the measures of quantum correlations to be introduced therein. I now follow the approaches of Cover and Thomas (1991) and Csiszar and Korner (1981); the reader interested in more detail should consult these two books.

Let $X_1, X_2, \ldots, X_n$ be a sequence of $n$ symbols from an alphabet $A = \{a_1, a_2, \ldots, a_{|A|}\}$, where $|A|$ is the size of the alphabet. We denote a sequence $x_1, x_2, \ldots, x_n$ by $x^n$ or, equivalently, by $\mathbf{x}$. The type $P_{\mathbf{x}}$ of a sequence $x_1, x_2, \ldots, x_n$ is the relative proportion of occurrences of each symbol of $A$, i.e., $P_{\mathbf{x}}(a) = N(a|\mathbf{x})/n$ for all $a \in A$, where $N(a|\mathbf{x})$ is the number of times the symbol $a$ occurs in the sequence $\mathbf{x} \in A^n$. Thus, according to this definition, the sequences 011010 and 100110 are of the same type. $\mathcal{P}_n$ will denote the set of types with denominator $n$. If $P \in \mathcal{P}_n$, then the set of sequences of length $n$ and type $P$ is called the type class of $P$, denoted by $T(P)$; mathematically,

$$T(P) = \{\mathbf{x} \in A^n : P_{\mathbf{x}} = P\}.$$

We now approach the first theorem about types, which is at the heart of the success of this theory and states that the number of types increases only polynomially with $n$.

Theorem 1.

$$|\mathcal{P}_n| \leq (n+1)^{|A|}.$$

The proof is left for the reader, but the rationale is simple. Suppose that we generate an $n$-string of 0's and 1's. The number of different types is then $n+1$, i.e., polynomial in $n$: the zeroth type has only one string (all zeros); the first type has $n$ strings (all strings containing exactly one 1); the second type has $n(n-1)/2$ strings (all those containing exactly two 1's); and so on, until the $n$th type, which has only one sequence (all ones). The most important point is that the number of sequences is exponential in $n$, so that at least one type has exponentially many sequences in its type class, since there are only polynomially many different types. A simple example is a coin tossed $n$ times. If it is a fair coin, then we expect heads half of the time and tails the other half of the time. The number of all possible sequences for this coin is $2^n$ (i.e., exponential in $n$), where each sequence is equally likely (with probability $2^{-n}$). However, the size of the type class in which there is an equal number of heads and tails is $\binom{n}{n/2}$ (the number of possible ways of choosing $n/2$ elements out of $n$ elements), the logarithm (base 2) of which tends to $n$ for large $n$. Hence this type class is in some sense asymptotically as large as all the type classes together.
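These counts are simple to verify; a minimal sketch for $n = 12$:

```python
from math import comb

n = 12
print(2 ** n)           # 4096 binary sequences of length n: exponential
print(n + 1)            # but only n + 1 types: polynomial, as in Theorem 1
print(comb(n, n // 2))  # 924 sequences in the balanced type class alone
```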

We now arrive at a very important theorem that, in fact, presents the basis of the statistical interpretation of the Shannon entropy and relative entropy.

Theorem 2. If $X_1, X_2, \ldots, X_n$ are drawn according to $Q(x)$, then the probability of $\mathbf{x}$ depends only on its type and is given by

$$Q^n(\mathbf{x}) = e^{-n[S(P_{\mathbf{x}}) + S(P_{\mathbf{x}} \| Q)]}.$$


Proof.

$$Q^n(\mathbf{x}) = \prod_{i=1}^{n} Q(x_i) = \prod_{a \in A} Q(a)^{N(a|\mathbf{x})} = \prod_{a \in A} Q(a)^{n P_{\mathbf{x}}(a)} = \prod_{a \in A} e^{n P_{\mathbf{x}}(a) \ln Q(a)}$$

$$= \exp\left\{ n \sum_{a \in A} \left[ -P_{\mathbf{x}}(a) \ln \frac{P_{\mathbf{x}}(a)}{Q(a)} + P_{\mathbf{x}}(a) \ln P_{\mathbf{x}}(a) \right] \right\} = e^{-n[S(P_{\mathbf{x}}) + S(P_{\mathbf{x}} \| Q)]}. \quad \blacksquare$$

Therefore the probability of a sequence becomes exponentially small as $n$ increases. Indeed, our coin-tossing example shows this: the probability of any particular sequence (such as, for example, 0000011111) is $2^{-n}$. (Note: the reason that we are using $e$ in our theorems instead of 2 is that we are also using $\ln$ instead of $\log_2$.) This is explicitly stated in the following corollary.

Corollary. If $\mathbf{x}$ is in the type class of $Q$, then

$$Q^n(\mathbf{x}) = e^{-nS(Q)}.$$

The proof is left to the reader. As $n$ gets large, most of the sequences become typical and are all equally likely. Therefore the probability of every typical sequence times the number of typical sequences has to be equal to unity in order to conserve total probability ($e^{-nS(Q)} N = 1$). From this we can see that the number of typical sequences is $N = e^{nS(Q)}$ (we consider this point more formally below). Hence the above theorem has very important implications in the theory of statistical inference and the distinguishability of probability distributions. To see how this comes about, we state two theorems that give bounds on the size and probability of a particular type class. The proofs follow directly from the above two theorems and the corollary (Csiszar and Korner, 1981; Cover and Thomas, 1991).
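Theorem 2 and its corollary can be checked directly. In the following sketch, a sequence drawn from $Q = (2/3, 1/3)$ has its probability computed both term by term and from the exponential formula:

```python
import numpy as np

def H(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def D(p, q):
    """Relative entropy S(p||q) in nats."""
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

Q = np.array([2/3, 1/3])           # source distribution over {0, 1}
x = np.array([0, 1, 1, 0, 1, 1])   # a particular sequence
n = len(x)
Px = np.array([np.mean(x == 0), np.mean(x == 1)])   # its type

direct  = np.prod(Q[x])                     # product of symbol probabilities
formula = np.exp(-n * (H(Px) + D(Px, Q)))   # Theorem 2
print(direct, formula)                      # both ~0.005487
```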

Theorem 3. For any type $P \in \mathcal{P}_n$,

$$\frac{1}{(n+1)^{|A|}}\, e^{nS(P)} \leq |T(P)| \leq e^{nS(P)}.$$

This theorem provides exact bounds on the number of typical sequences. Suppose that we have probabilities $p_1$ and $p_2$ for heads and tails, respectively, and we toss the coin $n$ times. The typical (most likely) sequences will be those in which we have $p_1 n$ heads and $p_2 n$ tails. The number of such sequences is

$$\binom{n}{p_1 n} = \frac{n!}{(p_1 n)!\,(p_2 n)!} \sim e^{n(-p_1 \ln p_1 - p_2 \ln p_2)},$$

i.e., exponential in $n$ (more tosses, more possibilities) and in the entropy (higher uncertainty, more possibilities). The next theorem offers a statistical interpretation of the relative entropy.
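The asymptotics can be checked numerically; the following sketch compares the logarithm of the exact binomial coefficient with $nS(P)$ for a fair coin:

```python
import numpy as np
from math import comb

S = np.log(2)                    # entropy of a fair coin, in nats
for n in [10, 100, 1000]:
    exact = comb(n, n // 2)      # exact size of the balanced type class
    # log|T(P)| / nS(P) -> 1, as Theorem 3's bounds require
    print(n, np.log(exact) / (n * S))
```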

Theorem 4. For any type $P \in \mathcal{P}_n$ and any distribution $Q$, the probability of the type class $T(P)$ under $Q^n$ is $e^{-nS(P\|Q)}$ to first order in the exponent. More precisely,

$$\frac{1}{(n+1)^{|A|}}\, e^{-nS(P\|Q)} \leq Q^n[T(P)] \leq e^{-nS(P\|Q)}.$$

The meaning of this theorem is that if we draw results according to $Q$, the probability that the sample will look as if it was drawn from $P$ decreases exponentially with $n$ and with the relative entropy between $P$ and $Q$. The closer $Q$ is to $P$, the higher the probability that their statistics will look the same. Alternatively, the higher the number of draws $n$, the smaller the probability that we will confuse the two. (We shall see an explicit example below.) The above two results can be succinctly written in an exponential fashion that will be useful to us:

$$|T(P)| \rightarrow e^{nS(P)}, \qquad (3)$$

$$Q^n[T(P)] \rightarrow e^{-nS(P\|Q)}. \qquad (4)$$

The first statement also leads to the idea of data compression, in which a string of length $n$ generated by a source with entropy $S$ can be encoded into a string of length $nS$. The second statement says that if we are performing $n$ experiments according to distribution $Q$, the probability that we will get something that looks as if it was generated by distribution $P$ decreases exponentially with $n$, depending on the relative entropy between $P$ and $Q$. This idea immediately leads to Sanov's theorem, whose quantum analog will provide a statistical interpretation of one measure of entanglement presented in Sec. IV. We now consider data compression and Sanov's theorem.

1. Classical data compression

Suppose that we have a binary source generating 0's with twice as great a probability as that of 1's, so that the Shannon entropy is $S = \ln 3 - (2/3)\ln 2 \approx 0.64$. Imagine that we have a string of 15 digits coming out of this source. Then, according to the above considerations [Eq. (3)], the most likely type will be the one with ten 0's and five 1's. But the size of this class is only $e^{0.64 \times 15} \approx e^{10}$, so we can use only 10 digits to encode all the above sequences of 15 numbers, just by assigning the following conventional mapping: the first sequence of 15 numbers is encoded as 0000000000, the second sequence as 0000000001, ..., and the $e^{10}$th sequence as 1111111111. This encoding is, for obvious reasons, called data compression. This, in fact, offers a statistical reason for employing the Shannon entropy as a measure of uncertainty. This result is known as Shannon's lower bound (or Shannon's first theorem) on data compression, i.e., a message cannot be compressed to below its Shannon entropy per bit (Shannon and Weaver, 1949). There are a number of different methods used for compression, each with varying degrees of success depending on the statistical distribution of the message; see, for example, Cover and Thomas (1991).
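A short numerical companion to this example (the counts below are consistent with the upper bound of Theorem 3):

```python
import numpy as np
from math import comb

# Source: '0' with probability 2/3, '1' with probability 1/3.
S = np.log(3) - (2/3) * np.log(2)    # 0.6365 nats per symbol
n = 15
print(n * S)                         # ~9.5: the e^10 of the text

# The most likely type class (ten 0s, five 1s) indeed fits under e^{nS}:
print(comb(15, 5), np.exp(n * S))    # 3003 <= ~14000
```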

Now we look at the distinguishability of two probability distributions. Suppose we would like to check whether a given coin is fair, i.e., whether it generates a head-tail distribution $f = (1/2, 1/2)$. When the coin is biased it will produce some other distribution, say $uf = (1/3, 2/3)$. So the question of the coin's fairness boils down to how well we can differentiate between two given probability distributions, given a finite number $n$ of experiments to perform on one of the two distributions. In the case of a coin we would toss it $n$ times and record the number of 0's and 1's. From simple statistics (Cover and Thomas, 1991) we know that if the coin is fair then the number of 0's, $N(0)$, will lie roughly within $n/2 - \sqrt{n} \leq N(0) \leq n/2 + \sqrt{n}$ for large $n$, and the same holds for the number of 1's. Therefore if our experimentally determined values do not fall within the above limits, the coin is not fair. We can look at this from another point of view, which is in the spirit of the method of types; namely, what is the probability that a fair coin will be mistaken for an unfair one with the distribution $(1/3, 2/3)$, given $n$ trials of the fair coin? For large $n$ the answer is given in the previous subsection,

$$p(\mathrm{fair} \rightarrow \mathrm{unfair}) = e^{-nS(uf \| f)},$$

where $S(uf\|f) = \frac{1}{3}\ln\frac{1}{3} + \frac{2}{3}\ln\frac{2}{3} - \frac{1}{3}\ln\frac{1}{2} - \frac{2}{3}\ln\frac{1}{2}$ is the Shannon relative entropy for the two distributions. So

$$p(\mathrm{fair} \rightarrow \mathrm{unfair}) = 3^n\, 2^{-(5/3)n},$$

which tends exponentially to zero as $n \rightarrow \infty$. Since the exponent is only $S(uf\|f) \approx 0.057$ per trial, it takes roughly 400 trials before the probability of mistaking the two distributions becomes vanishingly small, of order $10^{-10}$. This leads to the following important result (Sanov, 1957).
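These numbers are easy to reproduce. The sketch below evaluates the exponent $S(uf\|f)$ and the number of tosses required for a confusion probability of order $10^{-10}$:

```python
import numpy as np

def D(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))

fair, unfair = [1/2, 1/2], [1/3, 2/3]
S = D(unfair, fair)            # exponent per toss, ~0.0566 nats
print(S, np.exp(-20 * S))      # after 20 tosses the probability is still ~0.32
print(np.ceil(10 * np.log(10) / S))   # ~407 tosses for a 1e-10 confusion level
```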

2. Sanov’s theorem

If we have a probability distribution $Q$ and a set of distributions $E \subseteq \mathcal{P}$, then

$$Q^n(E) \rightarrow e^{-nS(P^* \| Q)}, \qquad (5)$$

where $P^*$ is the distribution in $E$ that is closest to $Q$ as measured by the Shannon relative entropy (see Fig. 1).

This can also be rephrased in the language of distinguishability: when we are distinguishing a given distribution from a set of distributions, what matters is how well we can distinguish that distribution from the closest one in the set (see Fig. 1). When we turn to the quantum case later, the probability distributions will become quantum densities representing various states of a quantum system, and the question will be how well we can distinguish between these states. Note that we could also talk about $Q$ coming from a set of states, in which case we would have $S(P \| Q^*)$, where $Q^*$ is the state that minimizes the relative entropy (i.e., the closest state).

FIG. 1. The concept of distinguishability. What do we mean by the distance from the cyclist to the city in the figure? It is defined as the distance from the cyclist to the closest house in the city. Also, which distance measure is the most appropriate for measuring this? In the text I argue that when it comes to distinguishing between two or more probability distributions, the most appropriate measure is the relative entropy.

B. Other information measures from relative entropy

Another important concept derived from relative entropy concerns the gathering of information. When one system learns something about another, their states become correlated. How correlated they are, or how much information they have about each other, can be quantified by the mutual information.

Definition. The Shannon mutual information between two random variables $A$ and $B$, having a joint probability distribution $p(a_i, b_j)$ and therefore marginal probability distributions $p(a_i) = \sum_j p(a_i, b_j)$ and $p(b_j) = \sum_i p(a_i, b_j)$, is defined as

$$I_S(A:B) := S[p(a)] + S[p(b)] - S[p(a,b)]. \qquad (6)$$

There are two very instructive ways of looking at this quantity, which will form a basis for this review. Mathematically, $I_S$ can be written in terms of the Shannon relative entropy. In this sense it represents a distance between the distribution $p(a,b)$ and the product of the marginals $p(a) \times p(b)$. As such, it is intuitively clear that this is a good measure of correlations, since it shows how far a joint distribution is from the product one, in which all the correlations have been destroyed, or alternatively, how distinguishable a correlated state is from a completely uncorrelated one. So we have

$$I_S(A:B) = S[p(a,b) \,\|\, p(a) \times p(b)].$$

Let us now view this from another angle. Suppose that we wish to know the probability of observing $b_j$ if $a_i$ has been observed. This is called a conditional probability and is given by

$$p_{a_i}(b_j) := \frac{p(a_i, b_j)}{p(a_i)}.$$

This motivates us to introduce a conditional entropy, $S_A(B)$, as

$$S_A(B) = -\sum_i p(a_i) \sum_j p_{a_i}(b_j) \ln p_{a_i}(b_j) = -\sum_{ij} p(a_i, b_j) \ln p_{a_i}(b_j).$$

This quantity tells us how uncertain we are about the value of $B$ once we have learned the value of $A$. Now the Shannon mutual information can be rewritten as

$$I_S(A:B) = S(B) - S_A(B) = S(A) - S_B(A). \qquad (7)$$

Hence, the Shannon mutual information, as its name indicates, measures the quantity of information conveyed about the random variable $A$ ($B$) through measurements of the random variable $B$ ($A$). This quantity, being positive, tells us that the initial uncertainty in $B$ ($A$) can in no way be increased by making observations on $A$ ($B$). Note also that, unlike the Shannon relative entropy, the Shannon mutual information is symmetric (see Fig. 2). The following example demonstrates the symmetry of the Shannon mutual information.

Let us briefly go back to our original idea of a surprise to interpret the Shannon mutual information as a measure of correlations. Suppose that one of our friends likes to wear socks of two colors only: red and blue. In addition, we know that her socks are always the same color and that when she gets up in the morning she randomly chooses the color, but we know that she prefers blue to red with the ratio 3:1. So when we meet our friend, before we have looked at the color of her socks, we know that she wears blue socks with probability $p(b) = 0.75$ and red socks with probability $p(r) = 0.25$. However, when we look at one sock and observe, say, the color blue, we immediately know that the other sock must be blue, too. This means that the colors of her two socks are correlated. Before we look at one of the socks, we are uncertain about the color of the other sock by an amount $-0.75 \ln 0.75 - 0.25 \ln 0.25$. But when we look at one of them the uncertainty immediately disappears. We therefore expect that the information we gain about one sock by looking at the color of the other is given by $-0.75 \ln 0.75 - 0.25 \ln 0.25$. The Shannon mutual information predicts exactly the same thing. We see that the correlations would be largest if $p(r) = p(b) = 0.5$, in which case the mutual information would be $\ln 2$. This, of course, agrees with our intuitive notion of surprise, since before looking at one of our friend's socks we would be completely uncertain about the color of the other sock. By observing its color we obtain the largest possible amount of information (i.e., remove the largest possible uncertainty in this case).
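The sock example can be checked against Eq. (6) directly; a minimal sketch:

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Joint distribution of the two sock colors: always the same color,
# blue with probability 0.75, red with probability 0.25.
joint = np.array([[0.75, 0.00],
                  [0.00, 0.25]])

I = H(joint.sum(axis=1)) + H(joint.sum(axis=0)) - H(joint)   # Eq. (6)
print(I)    # 0.5623 = -0.75 ln 0.75 - 0.25 ln 0.25
```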

Although it will be seen that the Shannon mutual information is a good measure of correlation between two random variables, its natural generalization to three or more random variables fails. It is easy to see that for three random variables the Shannon mutual information should be of the following form:

$$I_S(A:B:C) = S(A,B,C) - S(A,B) - S(A,C) - S(B,C) + S(A) + S(B) + S(C). \qquad (8)$$

However, there exist $A, B, C$ such that $I_S(A:B:C) < 0$ (I leave this as an exercise for the reader; a sketch of a standard counterexample follows below), and since we regard the amount of correlation as being strictly positive, this is automatically ruled out as a good measure of correlation. A way to sidestep this difficulty is to define mutual information via the relative entropy as $S[p(A,B,C) \,\|\, p(A)p(B)p(C)]$. This is a positive quantity representing the distance of the joint probability distribution of three random variables $A, B, C$ from the product of the corresponding marginal probability distributions. This, of course, immediately generalizes to any number of random variables. Next I show why the relative entropy and mutual information are also very useful from the dynamical perspective.

FIG. 2. Venn diagram representation of the joint Shannon entropy of two random variables as well as the marginal Shannon entropies. It is clear that geometrically the Shannon mutual information is obtained by summing the marginal entropies and subtracting the total entropy. It is interesting to note that its generalization fails for three or more random variables.

C. Classical evolution and relative entropy

The above application of relative entropy to physics via the concept of distinguishability might seem contrived. This is, however, not at all the case, and this section shows the great importance of relative entropy for the dynamics of classical systems. A state of a physical system in classical mechanics can be represented as a vector whose entries are the various probabilities for the system to occupy its different possible states. The evolution of this system is seen as the change of these probabilities with time. Hence evolution is a linear transformation of one state into another state, i.e., of a vector into another vector,

$$q_j = \sum_k P(j|k)\, p_k,$$

where $P(j|k)$ is the conditional probability for the system to change from the state $k$ to the state $j$. Because probability has to be conserved ($\sum_j q_j = 1$), we have $\sum_j P(j|k) = 1$ for every $k$. Matrices with this simple property, namely, that their entries are positive and their columns sum to 1, are called stochastic. The above can be generalized to continuous systems and continuous time evolution, but this will not be relevant for the rest of this review.

A very important property of any measure that aims to quantify the amount of correlation between two random variables (i.e., two states of the same system or two different systems in classical mechanics) is the following: if either or both of the variables undergo a local stochastic evolution, then the amount of correlation cannot increase (in fact, it usually decreases). I shall now prove this in the case of the Shannon mutual information, following an approach similar to that given by Everett (1973); see also Penrose's excellent book on statistical mechanics (Penrose, 1973).

First, let us establish without proof two inequalities following from the convexity properties of the logarithmic function (Everett, 1973). Lemma 1 states that entropy is a concave function, whereas lemma 2 states that relative entropy is a convex function.

Lemma 1. $\left(\sum_i P_i x_i\right) \ln \left(\sum_i P_i x_i\right) \leq \sum_i P_i x_i \ln x_i$, where $x_i \geq 0$, $P_i \geq 0$, and $\sum_i P_i = 1$.

Physically, this inequality means that the average uncertainty (the negative of the right-hand side) is less than or equal to the uncertainty of the average (the negative of the left-hand side); in other words, mixing probability distributions increases entropy. This is a very important property of entropy as a measure of uncertainty, since when we mix probability distributions we expect to increase our uncertainty.

Lemma 2. $\left(\sum_i x_i\right) \ln \dfrac{\sum_i x_i}{\sum_i a_i} \leq \sum_i x_i \ln \dfrac{x_i}{a_i}$, where $x_i \geq 0$ and $a_i \geq 0$ for all $i$.

This is just a statement of the fact that mixing decreases distinguishability. Note that this is in accord with lemma 1, since the more mixed the probability distributions, the less distinguishable they are.

These two simple and self-evident statements lead to a very important result: the Shannon relative entropy between two probability distributions decreases when both distributions undergo the same stochastic process. This is a very satisfying property from the physical point of view, where two probability distributions undergoing stochastic changes in fact represent two evolving physical systems. It says that two probability distributions are in some sense closer to each other (i.e., harder to distinguish) after a stochastic process, or analogously, that two physical systems become more alike.

Let us consider a sequence of transition-probability matrices $T^n_{ij}$, where $T^n_{ij}$ is the probability that at the $n$th step of the evolution the system jumps from the $i$th to the $j$th state, with $\sum_j T^n_{ij} = 1$ for all $n, i$, and $0 \leq T^n_{ij} \leq 1$. We also introduce a sequence of positive measures (i.e., probability distributions) $a^n_i$ having the property that

$$a^{n+1}_j = \sum_i a^n_i T^n_{ij}.$$

Transition matrices thus constructed are stochastic for all $n$. Let us further suppose that we have a sequence of probability distributions $p^n_i$ generated by the action of the same stochastic process, such that

$$p^{n+1}_j = \sum_i p^n_i T^n_{ij}.$$

This is the law describing the system's evolution in time, and the state of the system at time $n$ is given by the probabilities $p^n_i$. For each pair of these probability distributions the relative entropy $S_n$ is defined as

$$S_n(p \| a) := S(p^n \| a^n) = \sum_i p^n_i \ln \frac{p^n_i}{a^n_i}.$$

Let us now prove the following theorem: distinguishability never increases,

$$S_{n+1}(p \| a) \leq S_n(p \| a). \qquad (9)$$

Proof. Expanding $S_{n+1}(p\|a)$ we obtain

$$S_{n+1}(p\|a) = \sum_j p^{n+1}_j \ln \frac{p^{n+1}_j}{a^{n+1}_j} = \sum_j \left\{ \sum_i p^n_i T^n_{ij} \right\} \ln \frac{\sum_i p^n_i T^n_{ij}}{\sum_i a^n_i T^n_{ij}}.$$

However, using lemma 2 we have the following inequality:

$$\left(\sum_i p^n_i T^n_{ij}\right) \ln \frac{\sum_i p^n_i T^n_{ij}}{\sum_i a^n_i T^n_{ij}} \leq \sum_i p^n_i T^n_{ij} \ln \frac{p^n_i T^n_{ij}}{a^n_i T^n_{ij}}.$$

From the above two it follows that

$$S_{n+1}(p\|a) \leq \sum_j \sum_i p^n_i T^n_{ij} \ln \frac{p^n_i}{a^n_i} = \sum_i p^n_i \ln \frac{p^n_i}{a^n_i} = S_n(p\|a),$$

where the middle equality uses $\sum_j T^n_{ij} = 1$, and the proof is completed. $\blacksquare$

This property means that a distance between two states cannot increase with time if the states evolve under any stochastic map. The proof can be immediately specialized to the cases in which $T$ is stationary, i.e., $T$ is independent of $n$, and in which $T$ is doubly stochastic, i.e., $\sum_i T_{ij} = 1$ for all $j$ as well. A corollary to this important lemma is the following.

Corollary. If we take $p = p(a,b)$ and $a = p(a)p(b)$, and suppose that the stochastic processes acting separately on $A$ and $B$ are uncorrelated, we see that the Shannon mutual information does not increase under these local stochastic processes (by local we mean that they act separately on $A$ and $B$).

This is a very important, and physically intuitive, property of any measure of correlation; its quantum analog will be of central importance for quantifying quantum correlation between entangled subsystems. This corollary, in fact, can be taken as a guide for a "good" measure of correlation: we can state that any measure of correlation has to be nonincreasing under local stochastic processes. In other words, the only way that systems can become more correlated, i.e., gain more information about each other, is if they interact. Without mutual interaction the correlations can only decrease or at best stay the same. The nature of quantum local stochastic processes will form the physical basis for our argument in the next section. A condition similar to the property above, but employing quantum stochastic processes, will be a key element in our search for measures of entanglement. When we go over to quantum mechanics, the notion of a probability distribution will be replaced by that of a quantum state (i.e., a density matrix), and a stochastic process will become a quantum measurement process. The formulation of probability theory that is most naturally generalized to quantum states is provided by Kolmogorov (1950), and the quantum generalization, expressing similarities with von Neumann's Hilbert-space formulation (von Neumann, 1932), can be found in Mackey (1963; see also Holevo, 1982). However, knowledge of this approach will not be necessary for the rest of the review. Finally, it is important to stress that if the local stochastic processes are correlated they effectively become global, and therefore the correlations between the systems can increase as well as decrease.
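Before we move to the quantum case, the monotonicity of Eq. (9) can be watched in action for a randomly chosen stochastic matrix; a minimal sketch, in the convention of the text (rows of $T$ sum to one, distributions evolve as $p'_j = \sum_i p_i T_{ij}$):

```python
import numpy as np

rng = np.random.default_rng(1)

def D(p, q):
    return np.sum(p * np.log(p / q))

d = 4
# Random stochastic matrix: rows sum to one.
T = rng.random((d, d))
T /= T.sum(axis=1, keepdims=True)

p = rng.random(d); p /= p.sum()
a = rng.random(d); a /= a.sum()

for n in range(5):
    print(n, D(p, a))      # S_n(p||a) never increases, per Eq. (9)
    p, a = p @ T, a @ T
```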

D. Schmidt decomposition and quantum dynamics

The difference between classical and quantum physics can be seen in the fact that quantum states are described by a density matrix $\rho$ (and not just by vectors). The density matrix is a positive semidefinite Hermitian matrix whose trace is unity (representing the fact that all the probabilities add up to 1). An important class of density matrices are the idempotent ones, i.e., those with $\rho^2 = \rho$. The states these matrices represent are called pure states. When there is no uncertainty in the knowledge of the system's state, its state is pure. Another important notion is that of a composite system. A composite quantum system is one that consists of a number of quantum subsystems. When those subsystems are entangled it is impossible to ascribe a definite state vector to any one of them. The most often cited entangled system is a pair of two photons in the Einstein-Podolsky-Rosen state (Einstein et al., 1935; Bell, 1987). The composite system is then mathematically described by

$$|\Psi\rangle = \frac{1}{\sqrt{2}}\,(|{\uparrow}\rangle|{\downarrow}\rangle + |{\downarrow}\rangle|{\uparrow}\rangle), \qquad (10)$$

where the first ket in either product belongs to one photon and the second to the other. The property described is the direction of spin or polarization along the $z$ axis, which can be either up ($|{\uparrow}\rangle$) or down ($|{\downarrow}\rangle$). A two-level system of this type is the quantum analog of a bit, which we shall henceforth call a qubit. We can immediately see that neither of the photons possesses a definite state vector. The best that one can say is that if a measurement is made on one photon and it is found to be in the up state, for example, then the other photon is certain to be in the down state. This idea cannot be applied to a general composite system unless the state is written in a special form. This motivates us to introduce the so-called Schmidt decomposition (Schmidt, 1907), which not only is mathematically convenient, but also gives a deeper insight into correlations between the two subsystems.¹

¹In the context of quantum theory see Everett (1957; 1973, p. 3). A graduate-level textbook by Peres (1993, Chap. 5) includes a brief description of the Schmidt decomposition.

According to the rules of quantum mechanics, the state vector of a composite system consisting of subsystems $U$ and $V$ is represented by a vector belonging to the tensor product of the two Hilbert spaces, $\mathcal{H}_U \otimes \mathcal{H}_V$. The general state of this system can be written as a linear superposition of products of individual states:

$$|\Psi\rangle = \sum_n \sum_m c_{nm}\, |u_n\rangle |v_m\rangle, \qquad (11)$$

where $\{|u_n\rangle\}_{n=1}^{N}$ and $\{|v_m\rangle\}_{m=1}^{M}$ are orthonormal bases of the subsystems $U$ and $V$, respectively, whose dimensions are $\dim U = N$ and $\dim V = M$. This state can always be written in the so-called Schmidt form:

$$|\Psi\rangle = \sum_n g_n\, |u'_n\rangle |v'_n\rangle, \qquad (12)$$

where $|u'_n\rangle$ and $|v'_n\rangle$ are orthonormal bases for $U$ and $V$, respectively. Note that in this form the correlations between the two subsystems are fully displayed. If $U$ is found in the state $|u'_2\rangle$, for example, then the state of $V$ is $|v'_2\rangle$. This is clearly a multistate generalization of the Einstein-Podolsky-Rosen state mentioned earlier.

I shall now prove this assertion by showing how to derive Eq. (12) from Eq. (11). To that end, let us assume that $M > N$, which in no way affects our line of argument since the procedure is symmetric with respect to the subsystems. Then we have the following five steps (a compact numerical shortcut is sketched after the list).

(1) First we construct a density matrix describing $|\Psi\rangle = \sum_n \sum_m c_{nm} |u_n\rangle |v_m\rangle$. Once the density matrix is known, all the properties of the system can be deduced from it. Moreover, ensembles that are prepared differently but have the same density matrix are statistically indistinguishable and therefore equivalent. Generally, if we have a mixed state involving vectors $|\Psi_1\rangle, |\Psi_2\rangle, \ldots, |\Psi_D\rangle$ with corresponding classical probabilities $w_1, w_2, \ldots, w_D$, then the density matrix is defined to be

$$\rho = \sum_{d=1}^{D} w_d\, |\Psi_d\rangle\langle\Psi_d|.$$

Since in our case $|\Psi\rangle$ is a pure state, the density matrix is a projection operator onto $|\Psi\rangle$, i.e.,

$$\rho = |\Psi\rangle\langle\Psi| = \sum_{nm} \sum_{pq} \rho_{nmpq}\, |u_n\rangle\langle u_p| \otimes |v_m\rangle\langle v_q|,$$

where $\rho_{nmpq} = c_{nm} c^*_{pq}$. If, however, we wish to deal with only one of the subsystems, then we employ the concept of the reduced density matrix.

(2) We find the reduced density matrix of the subsystem $U$, obtained by tracing $\rho$ over all states of the subsystem $V$:

$$\rho_U = \sum_q \langle v_q | \rho | v_q \rangle = \sum_{nm} \sum_p \rho_{nmpm}\, |u_n\rangle\langle u_p|.$$

Note that the partial trace (like the trace itself) does not depend on the choice of basis. Partial tracing is analogous to finding marginal probability distributions from a joint probability distribution in classical probability theory. The crucial step in the Schmidt decomposition is diagonalizing the above. I shall call the eigenvalues of $\rho_U$ $|g_1|^2, |g_2|^2, \ldots, |g_N|^2$, and the corresponding eigenvectors $|u'_1\rangle, |u'_2\rangle, \ldots, |u'_N\rangle$.


(3) We then reexpress the above in terms of the $|u'\rangle$'s, i.e.,

$$|\Psi\rangle = \sum_n \sum_m c'_{nm}\, |u'_n\rangle |v_m\rangle.$$

(4) Now we construct a new orthonormal basis of the subsystem $V$ such that each new vector is a "clever" linear superposition of the old ones:

$$|v'_i\rangle = \sum_m \frac{c'_{im}}{g_i}\, |v_m\rangle.$$

The matrix given by the coefficients $c'_{im}/g_i$ is unitary, which is why the new basis is orthonormal.

(5) The Schmidt decomposition of $|\Psi\rangle$ is now given by

$$|\Psi\rangle = \sum_n g_n\, |u'_n\rangle |v'_n\rangle.$$
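As anticipated above, the five steps are exactly what the singular value decomposition of the coefficient matrix $c_{nm}$ accomplishes; a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random pure state of an N x M system, stored as the coefficient
# matrix c_{nm} of Eq. (11).
N, M = 2, 3
c = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
c /= np.linalg.norm(c)

# The SVD c = U diag(g) V bundles steps (1)-(5): columns of U and rows
# of V are the Schmidt bases, and g are the coefficients of Eq. (12).
U, g, V = np.linalg.svd(c, full_matrices=False)
print(g, np.sum(g ** 2))           # Schmidt coefficients; sum of squares = 1

# Both reduced density matrices have spectrum {|g_n|^2}:
rho_U = c @ c.conj().T
print(np.linalg.eigvalsh(rho_U))   # matches g**2, up to ordering
```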

There are two important observations to be made, which are fundamental to understanding the correlations between two subsystems in a joint pure state.

• The reduced density matrices of both subsystems, written in the Schmidt basis, are diagonal and have the same positive spectrum. In particular, the overall density matrix is given by

$$\rho = \sum_{nm} g_n g^*_m\, |u'_n\rangle\langle u'_m| \otimes |v'_n\rangle\langle v'_m|,$$

whereas the reduced ones are

$$\rho_U = \sum_m \langle v'_m|\rho|v'_m\rangle = \sum_n |g_n|^2\, |u'_n\rangle\langle u'_n|,$$

$$\rho_V = \sum_n \langle u'_n|\rho|u'_n\rangle = \sum_m |g_m|^2\, |v'_m\rangle\langle v'_m|.$$

• If a subsystem is $N$ dimensional, it can be entangled with no more than $N$ orthogonal states of another subsystem.

I should like to point out that the Schmidt decomposition is, in general, impossible for more than two entangled subsystems. To clarify this, consider three entangled subsystems as an example. Here our intention would be to write a general state such that by observing the state of one of the subsystems we could instantaneously and with certainty know the state of the other two. But this is impossible in general, for the presence of the third system makes the prediction uncertain. Loosely speaking, while we know the state of one of the subsystems, the other two might still be entangled and cannot have definite vectors associated with them [an exception to this general rule is, for example, a state of the Greenberger-Horne-Zeilinger type, $(1/\sqrt{2})(|{\uparrow}\rangle|{\uparrow}\rangle|{\uparrow}\rangle + |{\downarrow}\rangle|{\downarrow}\rangle|{\downarrow}\rangle)$]. Clearly, the involvement of even more subsystems complicates this analysis even further and produces, so to speak, an even greater mixture and uncertainty. The same reasoning applies to mixed states of two or more subsystems (i.e., states whose density operator is not idempotent, $\rho^2 \neq \rho$), for which we cannot have the Schmidt decomposition in general. This reason alone is responsible for the fact that the entanglement of two subsystems in a pure state is simple to understand and quantify, while for mixed states, or states consisting of more than two subsystems, the question is much more involved.

Let us now consider the way in which quantum systems evolve. An isolated system, of course, follows a unitary dynamics generated by the (nonrelativistic) Schrödinger equation. This evolution is fully reversible, manifesting itself in the fact that the quantum entropy does not increase during the process, as we shall see below. However, we know that most processes in Nature are irreversible (think of spontaneous emission and the nonexistence of its reverse, "spontaneous absorption"). These processes are nonunitary and arise from the interaction of the system with the environment; the system is then no longer closed. Mathematically, the evolution of a quantum state is then most generally of the form (Davies, 1976)

$$\rho' = \sum_\alpha A_\alpha \rho A^\dagger_\alpha, \qquad (13)$$

where, because of the conservation of probability, or, more precisely, trace preservation, $\sum_\alpha A^\dagger_\alpha A_\alpha = \mathbb{1}$. The above map is the most general completely positive (trace-preserving) linear map (CP map; Choi, 1975). Positivity means that density matrices are mapped into density matrices (strictly speaking, positive operators are mapped onto positive operators). To define "complete," we first need to introduce the idea of an extended state. By an extension of a state I mean any state on a larger Hilbert space that reduces to the original state when the extended part is traced out. In turn, completeness means that any extension of the density matrix is also mapped into a density matrix. To clarify this I shall present a few examples of CP maps.

• Projectors are Hermitian idempotent operators ($P^\dagger = P$ and $P^2 = P$), and the evolution of the form $\rho \rightarrow \sum_i P_i \rho P_i$ is a CP map.
• Addition of another system to $\rho$ is also a CP map, $\rho \rightarrow \rho \otimes \rho_1$.
• Let $E_i \geq 0$ and $\sum_i E_i = \mathbb{1}$. Then $\rho \rightarrow p_k := \mathrm{Tr}(\rho E_k)$ is a CP map that generates a probability distribution from a density matrix.
• Unitary evolution is a special case of a CP map, in which only one operator is present in the sum, i.e., $U\rho U^\dagger$.

I leave it for the reader to show that the above CP maps can indeed be written in the form of Eq. (13). We shall see other examples in the next subsection.

Remarkably, not all positive maps are completely positive, transposition being a well-known example. Positivity of transposition follows from the fact that for any state $\rho$, its transpose satisfies $\rho^T \geq 0$. However, a counterexample to completeness comes from, for example, a singlet state of two subsystems $A$ and $B$. Namely, if we transpose only $A$ (or $B$), then the resulting operator $\rho^{T_A}_{AB}$ is not positive (so that it is not a physical state). Confirmation of this is left as an exercise.

The reader might wonder what the physical implementation of the canonical form $\sum_\alpha A_\alpha \rho A^\dagger_\alpha$ is. I shall now introduce another kind of CP map that will explain its physical importance and will be crucial for the rest of the review. Loosely stated, any CP map can be represented as a unitary transformation on a higher Hilbert space (see Fig. 3). That is, from the Schmidt decomposition we know that a density matrix can be represented as a "reduction" of a state in an enlarged Hilbert space. Suppose that $\rho \in \mathcal{H}$ and that $\rho_E \in \mathcal{H} \otimes \mathcal{H}_a$ is an "extension" of the state $\rho$ such that $\mathrm{Tr}_a \rho_E = \rho$. Then a CP map $\sigma = \Phi(\rho)$ can be represented as

$$\rho \rightarrow \rho_E \rightarrow U\rho_E U^\dagger \rightarrow \mathrm{Tr}_a(U\rho_E U^\dagger) = \sigma. \qquad (14)$$

Here we have first "lifted" $\rho$ to $\rho_E$, then evolved $\rho_E$ unitarily into $U\rho_E U^\dagger$, which, after tracing over the Hilbert-space extension (i.e., lowering), yields the final state $\sigma$, as in Fig. 3. The fact that for any CP map there exists a unitary operator $U$ that will execute this map on some higher Hilbert space is guaranteed by a theorem proved independently by Kraus (1983) and Ozawa (1984); see Schumacher (1996) for a modern presentation. I shall now present only a plausibility argument for this correspondence. Let $\rho_E = \rho \otimes |0\rangle\langle 0|_a$, where $|0\rangle\langle 0|_a \in \mathcal{H}_a$. Then

$$\sigma = \mathrm{Tr}_a(U\, \rho \otimes |0\rangle\langle 0|_a\, U^\dagger) = \sum_i \langle i|_a\, U\, \rho \otimes |0\rangle\langle 0|_a\, U^\dagger\, |i\rangle_a = \sum_i \langle i|U|0\rangle\, \rho\, \langle 0|U^\dagger|i\rangle,$$

which has the same form as Eq. (13) provided that we define $A_i := \langle i|U|0\rangle$. Thus, given a unitary evolution on the extended Hilbert space, we can always find corresponding operators that describe the evolution of the original system. Note that the choice of the operators is not unique.
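This correspondence can be tested numerically: draw a random joint unitary, read off $A_i = \langle i|U|0\rangle$, and check trace preservation. The sketch below uses scipy's unitary_group for the random unitary; the dimensions and the test state are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import unitary_group

d, da = 2, 3                          # system and ancilla dimensions
U = unitary_group.rvs(d * da, random_state=42)   # random joint unitary
U = U.reshape(d, da, d, da)           # indices: out_sys, out_anc, in_sys, in_anc

# Kraus operators A_i = <i|U|0>, the ancilla starting in |0>:
A = [U[:, i, :, 0] for i in range(da)]

# Trace preservation: sum_i A_i^dagger A_i = identity.
print(np.allclose(sum(a.conj().T @ a for a in A), np.eye(d)))   # True

# The CP map of Eq. (13) applied to a test state:
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
sigma = sum(a @ rho @ a.conj().T for a in A)
print(np.isclose(np.trace(sigma).real, 1.0))    # trace preserved
```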

Finally, I should like to discuss another frequently used concept that is in some sense derived from the notion of the CP map. It can be loosely stated that the CP map represents the evolution of a quantum system when we do not update the knowledge of its state based on the particular measurement outcome. This is why we have a summation over all measurements in Eq. (13). If, on the other hand, we know that the outcome corresponding to the operator $A^\dagger_j A_j$ occurs, then the state of the system immediately afterwards is given by $A_j \rho A^\dagger_j / \mathrm{Tr}(A^\dagger_j A_j \rho)$. This type of measurement is the most general one and is commonly referred to as the positive operator valued measure (POVM). It is positive because operators of the form $A^\dagger A$ are always positive for any operator $A$, and taking the trace of such an operator together with any density matrix generates a positive number (i.e., a probability for that particular measurement outcome). For a more detailed overview of POVM's see Peres (1993). The concept of the POVM will play a significant role when defining the quantum relative entropy in the next subsection.

FIG. 3. Completely positive trace-preserving map. The most general evolution in quantum mechanics is represented by a completely positive trace-preserving map (CP map). This figure shows two equivalent forms for such a map: (a) the canonical form $\sum_\alpha A_\alpha(\cdot)A^\dagger_\alpha$; (b) the extension to a larger Hilbert space $\mathcal{H}_E$ and an appropriate unitary transformation therein. The connection is explained in the text.

E. Quantum relative entropy

When two subsystems become entangled, the composite state can be expressed as a superposition of products of the corresponding Schmidt basis vectors. From Eq. (12) it follows that the $i$th vector of either subsystem has a probability $|g_i|^2$ associated with it. We are, therefore, uncertain about the state of each subsystem, the uncertainty being larger if the probabilities are more evenly distributed. Since the uncertainty in a probability distribution is naturally described by the Shannon entropy, this classical measure can also be applied in quantum theory. In an entangled system this entropy is related to a single observable. The general state of a quantum system, as I have already remarked, is described by its density matrix $\rho$. If $A$ is an observable pertaining to the system described by $\rho$, then by the spectral decomposition theorem $A = \sum_i a_i P_i$, where $P_i$ is the projection onto the state with the eigenvalue $a_i$. The probability of obtaining the eigenvalue $a_j$ is given by $p_j = \mathrm{Tr}(\rho P_j) = \mathrm{Tr}(P_j \rho)$. The uncertainty in a given observable can now be expressed through the Shannon entropy. Let the observables $A$ and $B$, pertaining to the subsystems $U$ and $V$, respectively, have discrete, nondegenerate spectra, with corresponding probabilities $p(a_i)$ and $p(b_j)$ of observable $A$ taking the value $a_i$ and $B$ the value $b_j$. In addition, let the joint probability be $p(a_i, b_j)$. Then

$$S(A) = -\sum_i p(a_i)\ln p(a_i) \qquad (15)$$
$$\phantom{S(A)} = -\sum_{ij} p(a_i,b_j)\ln\sum_j p(a_i,b_j), \qquad (16)$$
$$S(B) = -\sum_j p(b_j)\ln p(b_j) \qquad (17)$$
$$\phantom{S(B)} = -\sum_{ij} p(a_i,b_j)\ln\sum_i p(a_i,b_j), \qquad (18)$$
$$S(A,B) = -\sum_{ij} p(a_i,b_j)\ln p(a_i,b_j), \qquad (19)$$


where I have used the fact that $\sum_j p(a_i,b_j) = p(a_i)$ and $\sum_i p(a_i,b_j) = p(b_j)$. We have seen that a signature of correlation is that the sum of the uncertainties in the individual subsystems is greater than the uncertainty in the total state. Hence the Shannon mutual information is a good indicator of how correlated two given observables are. However, this quantity, being inherently classical, describes the correlations between single observables only. The quantity that relates to the correlations in the overall state as a whole is the von Neumann mutual information. Since it is assigned to the state as a whole, it is of little surprise that it depends on the density matrix. First, however, I define the von Neumann entropy (von Neumann, 1932), which can be considered the proper quantum analog of the Shannon entropy (Wehrl, 1978; Ohya and Petz, 1993; Ingarden et al., 1997).

Definition. The von Neumann entropy of a quantum system described by a density matrix $\rho$ is defined as

$$S_N(\rho) := -\mathrm{Tr}(\rho\ln\rho) \qquad (20)$$

(I shall drop the subscript $N$ whenever there is no possibility of confusion.) The Shannon entropy is equal to the von Neumann entropy only when it describes the uncertainties in the values of observables that commute with the density matrix, i.e., the Schmidt observables. Otherwise

$$S(A) \ge S_N(\rho),$$

where $A$ is any observable of a system described by $\rho$. This means that there is more uncertainty in a single observable than in the whole of the state, a fact that entirely contradicts our expectations.

I now discuss a relation concerning the entropies of two subsystems. One part of it is somewhat analogous to its classical counterpart, but instead of referring to observables it relates the two states. This inequality is called the Araki-Lieb inequality (Araki and Lieb, 1970) and is one of the most important results in the quantum theory of correlations. Let $\rho_A$ and $\rho_B$ be the reduced density matrices of subsystems $A$ and $B$, respectively, and let $\rho$ be the density matrix of the composite system; then

$$S_N(\rho_A) + S_N(\rho_B) \ge S_N(\rho) \ge |S_N(\rho_A) - S_N(\rho_B)|.$$

Physically, the left-hand side implies that we have more information (less uncertainty) in an entangled state than if the two states are treated separately. This arises naturally, since by treating the subsystems separately we have neglected the correlations (entanglement). We note that if the composite system is in a pure state, then $S(\rho) = 0$, and from the right-hand side it follows that $S(\rho_A) = S(\rho_B)$ [cf. the Schmidt decomposition, Eq. (12)]. To appreciate the extent to which this is a counterintuitive result, consider the following example. Suppose a two-level atom is interacting with a single mode of an electromagnetic field, as in the Jaynes-Cummings model (Jaynes and Cummings, 1963). If the overall state is initially pure, and the whole system is isolated, then the entropies of the atom and the field are equal at all times. This is not expected, since the atom has only two degrees of freedom and the field infinitely many. However, it is so because, by the second observation, the atom, as a two-dimensional subsystem, is entangled with only two dimensions of the field.
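The Araki-Lieb inequality is easy to test numerically. The following sketch (my own illustration, using a randomly generated full-rank two-qubit state) computes the three entropies and checks both sides of the inequality.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy S(rho) = -Tr(rho ln rho) from the eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                      # 0 ln 0 = 0 by convention
    return float(-np.sum(w * np.log(w)))

def partial_trace(rho, keep):
    """Reduced state of one qubit of a 4x4 two-qubit density matrix."""
    r = rho.reshape(2, 2, 2, 2)           # indices (a, b, a', b')
    return np.trace(r, axis1=1, axis2=3) if keep == 0 else np.trace(r, axis1=0, axis2=2)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = X @ X.conj().T
rho /= np.trace(rho).real                 # random full-rank density matrix

sA, sB, sAB = entropy(partial_trace(rho, 0)), entropy(partial_trace(rho, 1)), entropy(rho)
print(sA + sB >= sAB >= abs(sA - sB))     # True for every state
```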

I present without proof two important properties of entropy that will be used in later sections (Wehrl, 1978). These are

(1) additivity: $S_N(\rho_A\otimes\rho_B) = S_N(\rho_A) + S_N(\rho_B)$; (21)

(2) concavity: $S_N\big(\sum_i \lambda_i\rho_i\big) \ge \sum_i \lambda_i S_N(\rho_i)$. (22)

The first property is the same as in classical information theory, namely, the entropies of independent systems add up. The concavity simply reflects the fact that mixing increases uncertainty.

Following the definition of the Shannon mutual information, I introduce the von Neumann mutual information, which refers to the correlation between whole subsystems rather than that relating only two observables.

Definition. The von Neumann mutual information between two subsystems $\rho_U$ and $\rho_V$ of a joint state $\rho_{UV}$ is defined as

$$I_N(\rho_U{:}\rho_V;\rho_{UV}) = S_N(\rho_U) + S_N(\rho_V) - S_N(\rho_{UV}). \qquad (23)$$

As in the case of the Shannon mutual information, this quantity can be interpreted as a distance between two quantum states. For this we first need to define the von Neumann relative entropy, in direct analogy with the Shannon relative entropy [in fact, this quantity was first considered by Umegaki (1962), but for consistency reasons I name it after von Neumann; I shall also refer to it as the quantum relative entropy].

Definition. The von Neumann relative entropy between the two states $\sigma$ and $\rho$ is defined as

$$S_N(\sigma\|\rho) = \mathrm{Tr}\,\sigma(\ln\sigma - \ln\rho). \qquad (24)$$
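As a quick numerical illustration (my own sketch, not the author's), the definition can be evaluated directly from eigendecompositions; the particular states $\sigma$ and $\rho$ below are arbitrary full-rank choices.

```python
import numpy as np

def mat_log(h):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(h)
    return (v * np.log(w)) @ v.conj().T

def rel_ent(sigma, rho):
    """S(sigma||rho) = Tr sigma (ln sigma - ln rho), in nats."""
    return float(np.real(np.trace(sigma @ (mat_log(sigma) - mat_log(rho)))))

sigma = np.array([[0.8, 0.2], [0.2, 0.2]])   # some mixed qubit state
rho = np.eye(2) / 2                          # maximally mixed state

print(rel_ent(sigma, rho))   # equals ln 2 - S(sigma); cf. Eq. (37) below
print(rel_ent(rho, rho))     # 0.0: a state is indistinguishable from itself
```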

This measure also has the same statistical interpretation as its classical analog: it tells us how difficult it is to distinguish the state $\sigma$ from the state $\rho$ (Hiai and Petz, 1991). To that end, suppose we have two states $\sigma$ and $\rho$. How can we distinguish them? We can choose a POVM $\sum_{i=1}^{M} A_i = 1$ that generates two distributions via

$$p_i = \mathrm{Tr}(A_i\sigma), \qquad (25)$$
$$q_i = \mathrm{Tr}(A_i\rho), \qquad (26)$$

and use classical reasoning to distinguish these two distributions. However, the choice of POVM is not unique. It is therefore best to choose the POVM that distinguishes the distributions most, i.e., for which the classical relative entropy is largest. Thus we arrive at the following quantity:

$$S_1(\sigma\|\rho) := \sup_{A}\Big\{\sum_i \mathrm{Tr}(A_i\sigma)\ln\mathrm{Tr}(A_i\sigma) - \mathrm{Tr}(A_i\sigma)\ln\mathrm{Tr}(A_i\rho)\Big\},$$

where the supremum is taken over all POVM's. The above is not the most general measurement that we can make, however. In general we have $N$ copies of $\sigma$ and $\rho$ in the states

$$\sigma^N = \underbrace{\sigma\otimes\cdots\otimes\sigma}_{N\ \mathrm{times}}, \qquad (27)$$
$$\rho^N = \underbrace{\rho\otimes\cdots\otimes\rho}_{N\ \mathrm{times}}. \qquad (28)$$

We may now apply a POVM $\sum_i A_i = 1$ acting on $\sigma^N$ and $\rho^N$. Consequently we define a new type of relative entropy,

$$S_N(\sigma\|\rho) := \sup_{A}\Big\{\frac{1}{N}\sum_i \mathrm{Tr}(A_i\sigma^N)\ln\mathrm{Tr}(A_i\sigma^N) - \mathrm{Tr}(A_i\sigma^N)\ln\mathrm{Tr}(A_i\rho^N)\Big\}. \qquad (29)$$

Now it can be shown that (Donald, 1986, 1987)

$$S(\sigma\|\rho) \ge S_N, \qquad (30)$$

where $S(\sigma\|\rho)$ is the quantum relative entropy. (This really is a consequence of the fact that the relative entropy does not increase under general CP maps, a fact that will be proven later in this subsection.) Equality is achieved in Eq. (30) if and only if $\sigma$ and $\rho$ commute (Fuchs, 1996). However, for any $\sigma$ and $\rho$ it is true that (Hiai and Petz, 1991)

$$S(\sigma\|\rho) = \lim_{N\to\infty} S_N.$$

In fact, this limit can be achieved by projective measurements that are independent of $\sigma$ (Hayashi, 2001). From these considerations it naturally follows that the probability of confusing two quantum states $\sigma$ and $\rho$ (after performing $N$ measurements on $\rho$) is, for large $N$,

$$P_N(\rho\to\sigma) = e^{-N S(\sigma\|\rho)}. \qquad (31)$$

We should like to stress here that classical statistical reasoning applied to distinguishing quantum states leads to the above formula. There are, however, other approaches. Some take Eq. (31) as their starting point and then derive the rest of the formalism from it (Hiai and Petz, 1991). Others assume a set of axioms that must be satisfied by the quantum analog of the relative entropy (for example, it should reduce to the classical relative entropy if the density operators commute, i.e., if they are classical) and then derive Eq. (31) as a consequence (Donald, 1986, 1987). In any case, as I have argued here, there is strong reason to believe that the quantum relative entropy $S(\sigma\|\rho)$ plays the same role in quantum statistics as the classical relative entropy plays in classical statistics (see also the review by Schumacher and Westmoreland, 2000).

The von Neumann mutual information can now be understood as the distance of the state $\rho_{UV}$ from the uncorrelated state $\rho_U\otimes\rho_V$,

$$I_N(\rho_U{:}\rho_V;\rho_{UV}) = S_N(\rho_{UV}\|\rho_U\otimes\rho_V).$$

The quantum relative entropy will be the most important quantity in classifying and quantifying quantum correlations. It will be seen that this quantity does not increase under CP maps, which are the quantum analogs of stochastic processes. I list three properties of the relative entropy without proof.

(F1) Unitary operations leave $S(\sigma\|\rho)$ invariant, i.e., $S(\sigma\|\rho) = S(U\sigma U^\dagger\|U\rho U^\dagger)$. Unitary transformations represent a change of basis (i.e., a change in our ‘‘perspective''), and the distance between two states should not (and does not in this case) change under such a transformation.

(F2) S(TrpsiTrpr)<S(sir), where Trp is a partialtrace. Tracing over a part of the system leads to a loss ofinformation. The less information we have about twostates, the harder they are to distinguish, which is whatthis inequality says. (This property is closely related tothe strong subadditivity of relative entropy as shown inLieb and Ruskai, 1973; see also a more recent proof byLesniewski and Ruskai, 1999.)

(F3) The relative entropy is additive: $S(\sigma_1\otimes\sigma_2\|\rho_1\otimes\rho_2) = S(\sigma_1\|\rho_1) + S(\sigma_2\|\rho_2)$. This equality is a consequence of the additivity of the entropy itself.

These properties of relative entropy have profound implications for the evolution of quantum systems, as I now show.

Quantum distinguishability never increases. For any completely positive, trace-preserving map $\Phi$, given by $\Phi\sigma = \sum_i V_i\sigma V_i^\dagger$ with $\sum_i V_i^\dagger V_i = 1$, we have $S(\Phi\sigma\|\Phi\rho) \le S(\sigma\|\rho)$.

I shall first present a physical argument as to why we should expect this theorem to hold. As I have discussed, a CP map can be represented as a unitary transformation on an extended Hilbert space. According to (F1), unitary transformations do not change the relative entropy between two states. However, after this we have to perform a partial tracing to go back to the original Hilbert space, which, according to (F2), decreases the relative entropy, as some information is invariably lost during this operation. Hence the relative entropy decreases under any CP map. I now formalize this proof.

Proof. I have discussed the fact that a CP map can always be represented as a unitary operation plus a partial tracing on an extended Hilbert space $\mathcal{H}\otimes\mathcal{H}_n$, where $\dim\mathcal{H}_n = n$ (Lindblad, 1974, 1975). Let $\{|i\rangle\}$ be an orthonormal basis in $\mathcal{H}_n$ and $|a\rangle$ be a unit vector. I define

$$W = \sum_i V_i\otimes|i\rangle\langle a|. \qquad (32)$$

Then $W^\dagger W = 1\otimes P_a$, where $P_a = |a\rangle\langle a|$, and there is a unitary operator $U$ in $\mathcal{H}\otimes\mathcal{H}_n$ such that $W = U(1\otimes P_a)$ (Reed and Simon, 1980). Consequently

$$U(A\otimes P_a)U^\dagger = \sum_{ij} V_i A V_j^\dagger\otimes|i\rangle\langle j|, \qquad (33)$$

so that

$$\mathrm{Tr}_2\{U(A\otimes P_a)U^\dagger\} = \sum_i V_i A V_i^\dagger.$$

This shows that the unitary and $\sum_i V_i\rho V_i^\dagger$ representations are equivalent. Now using properties (F2), then (F1), and finally (F3), we find


$$S[\mathrm{Tr}_2\{U(\sigma\otimes P_a)U^\dagger\}\,\|\,\mathrm{Tr}_2\{U(\rho\otimes P_a)U^\dagger\}] \le S[U(\sigma\otimes P_a)U^\dagger\,\|\,U(\rho\otimes P_a)U^\dagger] = S(\sigma\otimes P_a\|\rho\otimes P_a) = S(\sigma\|\rho). \qquad (34)$$

This proves the result. ∎

Corollary. Since, for a complete set of orthonormal projectors $P_i$, the operation $\sigma\mapsto\sum_i P_i\sigma P_i$ is a CP map, we have

$$\sum_i S(P_i\sigma P_i\|P_i\rho P_i) \le S(\sigma\|\rho). \qquad (35)$$

[The sum can be taken outside, as it is easily shown that $S(\sum_i P_i\sigma P_i\|\sum_i P_i\rho P_i) = \sum_i S(P_i\sigma P_i\|P_i\rho P_i)$.] Now from (F1), (F2), (F3), and Eq. (35) we have the following theorem.

Theorem 5. If $\sigma_i = V_i\sigma V_i^\dagger$, then $\sum_i S(\sigma_i\|\rho_i) \le S(\sigma\|\rho)$, where $\rho_i = V_i\rho V_i^\dagger/\mathrm{Tr}(V_i\rho V_i^\dagger)$.

Proof. Equations (32) and (33) are introduced as in the previous proof. From Eq. (33) we have

$$\mathrm{Tr}_2\{(1\otimes P_i)\,U(A\otimes P_a)U^\dagger\,(1\otimes P_i)\} = V_i A V_i^\dagger,$$

where $P_i = |i\rangle\langle i|$. Now, from (F2), the Corollary, and (F3) it follows that

$$\sum_i S[\mathrm{Tr}_2\{(1\otimes P_i)U(\sigma\otimes P_a)U^\dagger(1\otimes P_i)\}\,\|\,\mathrm{Tr}_2\{(1\otimes P_i)U(\rho\otimes P_a)U^\dagger(1\otimes P_i)\}]$$
$$\le \sum_i S[(1\otimes P_i)U(\sigma\otimes P_a)U^\dagger(1\otimes P_i)\,\|\,(1\otimes P_i)U(\rho\otimes P_a)U^\dagger(1\otimes P_i)]$$
$$\le S[U(\sigma\otimes P_a)U^\dagger\,\|\,U(\rho\otimes P_a)U^\dagger] = S(\sigma\otimes P_a\|\rho\otimes P_a) = S(\sigma\|\rho). \qquad (36)$$

This proves theorem 5. ∎

This theorem will be important in the next section. A simple consequence of the fact that the quantum relative entropy itself does not increase under CP maps is that correlations (as measured by the quantum mutual information) also cannot increase, but this is now true under local CP maps.
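The monotonicity just proven is easy to observe numerically. The sketch below (my own; the single-qubit depolarizing channel is an assumed example of a CP map, not one used in the text) shows the relative entropy between two fixed states shrinking as the channel output approaches the maximally mixed state.

```python
import numpy as np

def mat_log(h):
    w, v = np.linalg.eigh(h)
    return (v * np.log(w)) @ v.conj().T

def rel_ent(sigma, rho):
    return float(np.real(np.trace(sigma @ (mat_log(sigma) - mat_log(rho)))))

def depolarize(rho, p):
    """A CP trace-preserving map: rho -> (1-p) rho + p I/2."""
    return (1 - p) * rho + p * np.eye(2) / 2

sigma = np.array([[0.8, 0.2], [0.2, 0.2]])
rho = np.array([[0.4, 0.1], [0.1, 0.6]])

for p in (0.0, 0.3, 0.6, 0.9):
    print(p, rel_ent(depolarize(sigma, p), depolarize(rho, p)))
# The printed values decrease monotonically: S(Phi sigma || Phi rho) <= S(sigma || rho).
```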

Correlations cannot increase without interaction. Correlations, as measured by the von Neumann mutual information, do not increase during local complete measurements carried out on two entangled quantum systems.

The Shannon mutual information, although having this desired property, does not distinguish between quantum and classical correlations (rather, it measures total correlations). In order to distinguish between quantum and classical correlations, we shall have to introduce the possibility of classical communication between $A$ and $B$. This will allow classical correlations to increase while leaving quantum correlations intact, as will be seen in the following section. Now let us put the theory developed so far to practical use in communication.

Digression on the second law of thermodynamics. The second law of thermodynamics states that the entropy of an isolated system never decreases. This does not follow directly from the inability of the quantum relative entropy to increase under CP maps. Strictly speaking, an isolated system in quantum mechanics evolves unitarily, and therefore its entropy never changes. Under CP maps, however, the entropy can both increase and decrease. If, however, the state $\rho$ is maximally mixed, say $I/n$, then the quantum relative entropy is given by

$$S(\sigma\|I/n) = \ln n - S(\sigma). \qquad (37)$$

If in addition the evolution is such that $I/n$ is the equilibrium state, then the monotonic decrease in the quantum relative entropy implies a monotonic increase in $S(\sigma)$, just as in the second law of thermodynamics. Otherwise the entropy itself could both increase and decrease. A detailed discussion of the statistical foundations of the second law can be found in Tolman's classic work, The Principles of Statistical Mechanics (Tolman, 1938).

III. QUANTUM COMMUNICATION: CLASSICAL USE

The central objective of communication theory is to allow a person, often referred to as Alice, to communicate accurately with another person, called Bob, even in the presence of noise. Alice encodes her message into a number of different (distinguishable) states, with each state representing a different symbol in the message. For example, Alice encodes the bit value 1 into the excited state of a two-level atom and sends this atom to Bob. On its way to Bob the atom may transform into its ground state due to either stimulated or spontaneous emission, thereby giving Bob the impression that Alice transmitted 0. This unwanted state transition is a form of channel noise.

The key question is: what is the largest amount of information (per symbol) that Alice can send to Bob, i.e., what is the capacity of the communication channel, taking into account any possible noise? In classical information theory the capacity for communication is given by the mutual information between Alice's sent message and Bob's received message (Shannon and Weaver, 1949). This is intuitively clear, since the mutual information quantifies the correlations between sent and received messages and thus tells us how faithful the transmission is.


If we use quantum states to encode symbols, then the capacity is not given by the quantum mutual information introduced earlier. We derive a new quantity for this purpose, called the Holevo bound (Holevo, 1973; for the continuous case see Yuen and Ozawa, 1993). The benefit of performing the full quantum derivation is that this is a more fundamental approach to information processing. We can then deduce the classical capacity as a special case.

A. The Holevo bound

A quantum communication channel consists of a number $N$ of quantum systems prepared in states $\rho_1,\rho_2,\ldots,\rho_N$ and whatever physical medium is used to send the states from Alice to Bob. These states encode $N$ different symbols with certain a priori probabilities, $p_1,p_2,\ldots,p_N$. Bob then performs a set of measurements to determine the correct sequence of states comprising Alice's symbols, which he can then use to reconstruct the entire message (Ingarden, 1976). If the states suffer no error on the way to Bob, then the channel is called noiseless; otherwise it is called noisy. I consider only the capacity of a noiseless quantum communication channel, since the generalization to a noisy channel is straightforward.

Let $S(\rho) = -\mathrm{Tr}\,\rho\ln\rho$ be the standard von Neumann entropy of a density matrix $\rho$. The capacity of a quantum communication channel is then defined as

$$C := \max_{\{p\}} C(\{p\},\rho),$$

where

$$C(\{p\},\rho) = S\Big(\sum_i p_i\rho_i\Big) - \sum_i p_i S(\rho_i) \qquad (38)$$

is the Holevo bound. Note that the above can be expressed succinctly as

$$C(\{p\},\rho) = \sum_i p_i S(\rho_i\|\rho), \qquad (39)$$

where $S(\cdot\|\cdot)$ is the von Neumann relative entropy and $\rho = \sum_i p_i\rho_i$. When there is no possibility of confusion I write $C(\{p\},\rho) \equiv C(\{p\})$. The reader may ask why we need to maximize over the symbol probabilities in order to compute the capacity. This is because the channel can be used with different input probabilities, and the capacity represents the maximum that can be communicated using this channel.

To see the physical motivation behind this quantity, consider $N$ states $\rho_1,\ldots,\rho_N$ sent by Alice to Bob with probabilities $p_1,\ldots,p_N$, respectively. Bob now performs a complete set of measurements, a POVM $\sum_i E_i = I$ with $E_i \ge 0$, in order to determine which state was sent to him (a complete measurement is like a CP map, but one in which we record each of the outcomes). The information accessible to Bob is given by the mutual information between his measurement and the states $\rho_1,\ldots,\rho_N$ (Holevo, 1973; Davies, 1976). This quantity tells us how well Bob's measurement can distinguish between the message states and is given by

$$I(E{:}\rho) = \sum_i\Big\{-\mathrm{Tr}(\rho E_i)\ln[\mathrm{Tr}(\rho E_i)] + \sum_j p_j\,\mathrm{Tr}(\rho_j E_i)\ln[\mathrm{Tr}(\rho_j E_i)]\Big\}.$$

The rationale behind this expression is that the uncertainty in the message before any measurement is performed is given by the first term, while the second term represents the uncertainty after the measurement has identified (partially, in general) the message states. The Holevo bound is an upper bound on the above accessible information, i.e.,

$$S\Big(\sum_i p_i\rho_i\Big) - \sum_i p_i S(\rho_i) \ge \max_E I(E{:}\rho). \qquad (40)$$

The inequality is saturated if and only if $[\rho_i,\rho_j] = 0$ for all $i$ and $j$. Therefore, since the Holevo bound is an upper bound on the accessible information that Bob can gain about Alice's message, we identify its maximum over all possible initial probabilities with the classical capacity of a quantum channel.

The Holevo bound has an even more suggestive form: the uncertainty in the initial message is $S(\rho)$, but after the states are correctly identified the average uncertainty is $\sum_i p_i S(\rho_i)$. The difference between these two quantities, maximized over all $p_i$'s, is the classical communication capacity of a quantum channel. Note that one of the most profound implications of the Holevo bound is that a quantum bit cannot store more information than a classical bit. In spite of this limitation, quantum information processing is more efficient than its classical analog. This is due to the different nature of information encoding, which is reflected in the existence of superpositions of different states as well as entanglement between different qubits (see also the section on dense coding).
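As a worked illustration (mine, not from the text), the Holevo bound of Eq. (38) can be evaluated for a small ensemble. The two nonorthogonal pure states below are an arbitrary choice (they anticipate the compression example of Fig. 4); for pure states the bound reduces to the entropy of the average state.

```python
import numpy as np

def entropy_bits(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))     # von Neumann entropy in bits

theta = np.pi / 4
psi0 = np.array([np.cos(theta/2), np.sin(theta/2)])
psi1 = np.array([np.sin(theta/2), np.cos(theta/2)])
states = [np.outer(p, p) for p in (psi0, psi1)]
probs = [0.5, 0.5]

rho_avg = sum(p * s for p, s in zip(probs, states))
chi = entropy_bits(rho_avg) - sum(p * entropy_bits(s) for p, s in zip(probs, states))
print(chi)   # about 0.60 bits: nonorthogonal states cannot carry a full bit each
```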

Proof of the Holevo bound in Eq. (40). The Holevo bound is a direct consequence of the fact that the quantum relative entropy does not increase under CP maps, as in theorem 1. [Note that Holevo's original proof is much more complicated and does not involve using the quantum relative entropy. Here I follow Yuen and Ozawa (1993) in spirit; for an alternative proof see King and Ruskai, 2001.] One such map is

$$\tau(A) = \frac{1}{n}\,\mathrm{Tr}(A),$$

where $A$ is any $n\times n$ positive matrix. This leads to the Peierls-Bogoliubov inequality (Bhatia, 1997):

$$\tau(A)[\ln\tau(A) - \ln\tau(B)] \le \tau(A\ln A - A\ln B). \qquad (41)$$

To prove the Holevo bound I first use the fact that (theorem 5)

$$S(\rho_i\|\rho) \ge \sum_j S(A_j\rho_i A_j^\dagger\|A_j\rho A_j^\dagger).$$


The Peierls-Bogoliubov inequality now implies that

$$S(A_j\rho_i A_j^\dagger\|A_j\rho A_j^\dagger) \ge \mathrm{Tr}(A_j\rho_i A_j^\dagger)\{\ln[\mathrm{Tr}(A_j\rho_i A_j^\dagger)] - \ln[\mathrm{Tr}(A_j\rho A_j^\dagger)]\} = p(j|i)[\ln p(j|i) - \ln p(j)],$$

where $p(j|i) = \mathrm{Tr}\{A_j\rho_i A_j^\dagger\}$ is the conditional probability that the message $\rho_i$ will lead to the outcome $E_j = A_j^\dagger A_j$, and $p(j) = \sum_i p_i\,p(j|i)$. Thus we now have

$$S(\rho_i\|\rho) \ge \sum_j p(j|i)[\ln p(j|i) - \ln p(j)].$$

Multiplying both sides by the (positive) $p_i$ and summing over all $i$ leads to the Holevo bound. ∎

Since Holevo's result is one of the key results in quantum information theory, I present another simple way of understanding it, via the quantum mutual information. This, of course, is only an additional motivation for the Holevo bound and by no means proves its validity. Namely, if Alice encodes the symbol (sym) $i$ into the state (st) $\rho_i$, then the total state (sym+st) is

$$\rho_{\mathrm{sym+st}} = \sum_i p_i|i\rangle\langle i|\otimes\rho_i,$$

where the kets $|i\rangle$ are orthogonal (we can think of these as representing different states of consciousness of Alice!). Bob now wants to learn about the symbols by distinguishing the states $\rho_i$. He cannot learn more about the symbols than is already stored in the correlations between the symbols and the message states. This, as we know, is given by the quantum mutual information

$$I(\rho_{\mathrm{sym+st}}) = S(\mathrm{sym}) + S(\mathrm{st}) - S(\mathrm{sym+st}) = S\Big(\sum_i p_i\rho_i\Big) - \sum_i p_i S(\rho_i), \qquad (42)$$

which is the same as the Holevo bound.

I would now like to derive the capacity of a classical communication channel from the Holevo bound. I follow the reasoning of Gordon (1964), who was, in fact, the first person to conjecture the Holevo bound. As I mentioned earlier, the Holevo bound itself contains the classical capacity of a classical channel as a special case. This, as we might expect, happens when all the $\rho_i$'s are diagonal in the same basis, i.e., when they commute (classically all states and observables commute because they can be simultaneously specified and measured, in contrast with quantum mechanics). The density matrices then reduce to classical probability distributions. Let us call this basis the $B$ representation, with orthonormal eigenvectors $|b\rangle$. Then the probability that the measurement of the symbol represented by $\rho_i$ will yield the value $b$ is just $\langle b|\rho_i|b\rangle$. I call this the conditional probability, $p_i(b)$, that if $\rho_i$ was sent the result $b$ was obtained. Now the Holevo bound is

$$C = S(\rho) - \sum_i p_i S(\rho_i) = S(\rho) - S_B(\rho_i),$$

where $S_B(\rho_i)$ is the conditional entropy, given by

$$S_B(\rho_i) = -\sum_i p_i\sum_b \langle b|\rho_i|b\rangle\ln\langle b|\rho_i|b\rangle = \sum_i p_i S(\rho_i).$$

Thus the Holevo bound reduces to the Shannon mutual information between the commuting messages and the measurement in the $B$ representation.

The usual rule of thumb for obtaining quantum information-theoretic quantities from their classical counterparts is, by convention,

$$\sum \to \mathrm{Tr}, \qquad p(a) \to \rho_A,$$

so that, for example, the Shannon entropy $S[p(a)] = -\sum_i p(a_i)\ln p(a_i)$ becomes the von Neumann entropy $S(\rho_A) = -\mathrm{Tr}\,\rho_A\ln\rho_A$.

Example. As the first application of the Holevo bound I shall compute the channel capacity of a bosonic field, e.g., an electromagnetic field (for an excellent review, see Caves and Drummond, 1994). The message information will now be encoded into modes of frequency $\omega$ and average photon number $m(\omega)$. The signal power is assumed to be $S$. The noise in the channel is quantified by the average number of excitations $n(\omega)$ and is assumed to be independent of the signal (i.e., the powers of signal and noise are additive). We saw that when there is no noise in the channel, the Holevo bound is equal to the entropy of the average signal. In order to compute the capacity we need to maximize this entropy under the constraint that the total power (or energy) is fixed. It is well known that thermal states are those that maximize the entropy. We thus assume that both the noise and the signal plus noise are in thermal equilibrium and follow the usual Bose-Einstein statistics. The noise power is

$$N = \frac{\pi(kT)^2}{12\hbar}.$$

The power of the output of the channel (signal plus noise) is

$$P = S + N = \frac{\pi(kT_e)^2}{12\hbar},$$

where $T_e$ is the equilibrium temperature of the signal plus noise. Therefore it follows that

$$T_e = \Big(\frac{12\hbar S}{\pi k^2} + T^2\Big)^{1/2}.$$

The state of the noise in the mode $\omega$ is

$$\rho_N(\omega) = \sum_n (1 - e^{-\hbar\omega/kT})\,e^{-n\hbar\omega/kT}\,|n\rangle\langle n|,$$

while the state of the output is

$$\rho_{N+S}(\omega) = \sum_n (1 - e^{-\hbar\omega/kT_e})\,e^{-n\hbar\omega/kT_e}\,|n\rangle\langle n|.$$

The capacity of the channel is given by the Holevo bound, which is

$$C = \int_{-\infty}^{\infty}\{S[\rho_{S+N}(\omega)] - S[\rho_N(\omega)]\}\,d\omega = \frac{\pi kT}{6\hbar\ln 2}\Big\{\Big[\frac{12\hbar S}{\pi(kT)^2} + 1\Big]^{1/2} - 1\Big\}. \qquad (43)$$

The integration is there to take into account all the modes of the field. Let us look at the two extreme limits of this capacity. In the high-temperature limit we obtain the classical capacity

$$C_C = \frac{S}{kT\ln 2}, \qquad (44)$$

a result derived by Shannon and Weaver (1949). This states that in order to communicate one bit of information with this setup we need exactly $kT\ln 2$ of energy. In the low-temperature limit, on the other hand, quantum effects become important and the capacity becomes independent of $T$:

$$C_Q = \frac{\sqrt{\pi}}{\ln 2}\Big(\frac{S}{3\hbar}\Big)^{1/2}, \qquad (45)$$

a result derived by Stern (1960), Lebedev and Levitin (1963), Gordon (1964), and Yamamoto and Haus (1986), among others. Note also the appearance of Planck's constant, which is a key feature of quantum mechanics. If we wish to communicate one bit of information per unit time in this limit we need only $\sim 10^{-34}$ J of energy. This is significantly less than the corresponding energy in the classical limit. Let us now compare the classical and quantum capacity limits with the total energy of $N$ harmonic oscillators (bosons) in the same two limits. In the high-temperature limit the equipartition theorem is applicable and the total energy is $3NkT$ (i.e., it depends on the temperature). In the low-temperature limit all the harmonic oscillators settle into the ground state, so that the total energy becomes $N\hbar\omega/2$ (i.e., it is independent of the temperature, and we see the quantum dependence through Planck's constant $\hbar$).

B. Schumacher’s compression

The optimal communication through a noiseless channel using pure states is equivalent to data compression. We saw in Eq. (3) that the limit to classical data compression is given by the entropy of the data's probability distribution. We would thus guess that the limit to quantum data compression is given by the von Neumann entropy of the set of states being compressed. This, in fact, turns out to be a correct guess, as was first proven by Schumacher (1995). So, Alice now encodes the letters of her classical message into pure quantum states and sends these to Bob. For example, if $a\to|\psi_a\rangle$ and $b\to|\psi_b\rangle$, then Alice's message $aab$ will be sent to Bob as the sequence of pure quantum states $|\psi_a\rangle|\psi_a\rangle|\psi_b\rangle$.

The exact problem can be phrased in the following equivalent fashion: suppose a quantum source randomly prepares different qubit states $|\psi_i\rangle$ with corresponding probabilities $p_i$. A random sequence of $n$ such states is produced. By how much can this be compressed, i.e., how many qubits do we really need to encode the original sequence (in the limit of large $n$)? First of all, the total density matrix is

$$\rho = \sum_i p_i|\psi_i\rangle\langle\psi_i|.$$

This matrix can now be diagonalized as

$$\rho = \sum_i r_i|r_i\rangle\langle r_i|,$$

where the $r_i$ are the eigenvalues and the $|r_i\rangle$ the eigenvectors. This decomposition is, of course, indistinguishable from the original one (or any other decomposition, for that matter). Thus we can think about compression in this new basis, which is easier as it behaves completely classically (since $\langle r_i|r_j\rangle = \delta_{ij}$). We can therefore invoke the results from the previous section on classical typical sequences and conclude that the limit to compression is $n(-\sum_i r_i\ln r_i)$, i.e., $n$ qubits can be encoded into $nS(\rho)$ qubits. No matter how the states are generated, as long as the total state is described by the same density matrix $\rho$, its compression limit is its von Neumann entropy. This protocol and result will be very important when we discuss entanglement measures in the following section.

Example. Suppose that Alice encodes her bit into the states $|\psi_0\rangle = \cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle$ and $|\psi_1\rangle = \sin(\theta/2)|0\rangle + \cos(\theta/2)|1\rangle$ with $p_0 = p_1 = 1/2$ (see Fig. 4). Classically it is not possible to compress a source that generates 0 and 1 with equal probability. Quantum mechanically, however, compression can be achieved, not only through the nature of the probability distribution but also due to the nonorthogonality of the states encoding the symbols of the message. In our example the overlap between the two states is $\langle\psi_0|\psi_1\rangle = \sin\theta$, and they are orthogonal only when $\theta = \pi$, in which case no compression is possible. Otherwise, the achievable compression grows with the overlap between the states.

FIG. 4. Two nonorthogonal states on the Bloch sphere that are used to encode a message. The overlap between them is $\sin\theta$; the larger the overlap, the more the total message can be compressed. In terms of information, the less distinguishable the states (i.e., the larger the overlap), the less information they carry.

Suppose Alice's messages are only three qubits long. Then there are eight different possibilities, $|\psi_0\psi_0\psi_0\rangle,\ldots,|\psi_1\psi_1\psi_1\rangle$, which are all equally likely, with probability 1/8. In general these states will lie with high probability within a subspace of the eight-dimensional Hilbert space. Let us call this likely subspace a ‘‘typical'' subspace. Its orthogonal complement will be unlikely and hence called an ‘‘atypical'' subspace. In order to find the typical and atypical subspaces we need to diagonalize the average signal,

$$\rho = \tfrac{1}{2}(|\psi_0\rangle\langle\psi_0| + |\psi_1\rangle\langle\psi_1|).$$

Its diagonal form is

$$\rho = \tfrac{1}{2}(1+\sin\theta)|+\rangle\langle +| + \tfrac{1}{2}(1-\sin\theta)|-\rangle\langle -|,$$

where $|\pm\rangle = (|0\rangle\pm|1\rangle)/\sqrt{2}$. Now we look at the probabilities that each of the eight messages will lie along the new orthogonal basis $|+++\rangle,\ldots,|---\rangle$ of the Hilbert space of three qubits:

$$|\langle +++|\psi^{\otimes 3}\rangle|^2 = \tfrac{1}{8}[\cos(\theta/2)+\sin(\theta/2)]^6,$$
$$|\langle ++-|\psi^{\otimes 3}\rangle|^2 = \tfrac{1}{8}[\cos(\theta/2)+\sin(\theta/2)]^4[\cos(\theta/2)-\sin(\theta/2)]^2,$$
$$|\langle +--|\psi^{\otimes 3}\rangle|^2 = \tfrac{1}{8}[\cos(\theta/2)+\sin(\theta/2)]^2[\cos(\theta/2)-\sin(\theta/2)]^4,$$
$$|\langle ---|\psi^{\otimes 3}\rangle|^2 = \tfrac{1}{8}[\cos(\theta/2)-\sin(\theta/2)]^6,$$

where $|\psi^{\otimes 3}\rangle$ represents any three-qubit sequence of $|\psi_0\rangle$ and $|\psi_1\rangle$. In addition, all the probabilities for $|++-\rangle, |+-+\rangle, |-++\rangle$ are equal, and so are the probabilities for $|+--\rangle, |--+\rangle, |-+-\rangle$. Thus the above equations contain 64 probabilities in total (eight messages, each projected onto the eight basis vectors). Suppose now that $\cos(\theta/2)\approx\sin(\theta/2)$. Then we see that the states containing two or more $+$'s become much more likely. This means that the message states are much more likely to be in this particular subspace. Therefore the compression would be as follows. First the source generates three qubits in some state. Then we project this message onto the typical subspace. If we are successful, the message will lie in that four-dimensional typical subspace, for which we need only two qubits rather than three. Otherwise, our projection will fail and the message will end up in the atypical subspace, in which case Alice does not compress it. The probability of ending up in the atypical subspace goes to zero asymptotically (the law of large numbers). Therefore in this example the limit to compression is given by $-\tfrac{1}{2}(1+\sin\theta)\ln[\tfrac{1}{2}(1+\sin\theta)] - \tfrac{1}{2}(1-\sin\theta)\ln[\tfrac{1}{2}(1-\sin\theta)]$, which is of course the von Neumann entropy of $\rho$. More generally, the dimension of the typical subspace of the total Hilbert space is equal to $e^{nS(\rho)}$.
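The compression limit of this example is easy to evaluate numerically. The following sketch (my own illustration) sweeps the overlap $\sin\theta$ and prints $S(\rho)$, the number of qubits needed per signal state.

```python
import numpy as np

def entropy_bits(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

for theta in (np.pi/2, np.pi/3, np.pi/6, np.pi):
    psi0 = np.array([np.cos(theta/2), np.sin(theta/2)])
    psi1 = np.array([np.sin(theta/2), np.cos(theta/2)])
    rho = 0.5 * (np.outer(psi0, psi0) + np.outer(psi1, psi1))
    # Eigenvalues of rho are (1 +/- sin theta)/2
    print(f"overlap {np.sin(theta):.2f}: {entropy_bits(rho):.3f} qubits per state")
# Orthogonal states (overlap 0) give 1 qubit per state: no compression at all.
```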

Interestingly, if instead of pure states a quantum source generates mixed states $\rho_i$ with probabilities $p_i$, then the best compression limit is in general unknown. We can, of course, use the above protocol to compress the sequence to the von Neumann entropy of the average signal, $S(\sum_i p_i\rho_i)$. However, in some cases it is known that a better compression can be achieved. The lower bound to compression is the Holevo bound, $S(\sum_i p_i\rho_i) - \sum_i p_i S(\rho_i)$, but it is not known whether this bound can in general be attained (see Horodecki, 1998).

Next we look at a protocol for classical communication that involves entanglement. At first sight this protocol seems to violate the Holevo bound on classical communication, i.e., the rule that it is possible to communicate only one bit per qubit. However, a closer inspection will show that this is not the case.

C. Dense coding

Now let us consider the case of dense coding, which was introduced by Bennett and Wiesner (1992). In this protocol entanglement plays a crucial role, and this will give us a first indication that entanglement can be quantified like any other resource, such as energy, for example. Alice and Bob initially share an entangled pair of qubits in some state $W_0$, which may be mixed. Alice then performs local unitary operations on her qubit to put this shared pair of qubits into any of the states $W_0$, $W_1$, $W_2$, or $W_3$. In general, Alice may use a completely arbitrary set of unitary operations to generate these states:

$$W_i = (U_i\otimes I)\,W_0\,(U_i^\dagger\otimes I), \qquad (46)$$

and the number of generated states is completely arbitrary. In the above equation, $U_i$ acts on Alice's qubit and $I$ acts on Bob's qubit. By sending her encoded qubit to Bob, Alice is essentially communicating with Bob using the states $W_0$, $W_1$, $W_2$, and $W_3$ as separate letters. The number of bits she can communicate to Bob using this procedure is thus bounded by the Holevo bound. Moreover, if some block coding is done on a large enough collection of qubits in addition to the dense coding, then the number of bits of information communicated is equal to the Holevo function. We shall thus take

$$C = S(\rho) - \sum_i p_i S(\rho_i), \qquad (47)$$

assuming that any additional necessary block coding will automatically be performed to supplement the dense coding. This coding is essential in order to achieve the capacity given by the Holevo bound in the asymptotic limit. [The fact that the bound is achievable follows from a complicated argument and cannot really be derived using the arguments presented in this review. Hausladen et al. (1996) proved this for pure states, and Schumacher and Westmoreland (1997) and, independently, Holevo (1998) proved it for mixed states.] Exactly the same assumption was used by Hausladen et al. (1996) to calculate the capacity for dense coding in the case of pure letter states. Equations (46) and (47) define the most general version of dense coding, and I shall refer to this as completely general dense coding.


A simpler example of dense coding is the case in which the letter states are generated from the initial shared state $W_0$ by

$$W_0 = (I\otimes I)\,W_0\,(I\otimes I), \qquad (48)$$
$$W_1 = (\sigma_1\otimes I)\,W_0\,(\sigma_1\otimes I), \qquad (49)$$
$$W_2 = (\sigma_2\otimes I)\,W_0\,(\sigma_2\otimes I), \qquad (50)$$
$$W_3 = (\sigma_3\otimes I)\,W_0\,(\sigma_3\otimes I). \qquad (51)$$

In the above set of equations, the first operator of the combination $\sigma_i\otimes I$ acts on Alice's qubit and the second operator acts on Bob's qubit. I shall refer to this case [i.e., in which the letter states are generated by Eqs. (48)–(51)] as simply general dense coding. The generality present in general dense coding is that Alice is allowed to prepare the different letter states with unequal probabilities.

In the more special case in which Alice not only generates the four letter states according to Eqs. (48)–(51) but also does so with equal probability, the ensemble is given by

$$W = \frac{1}{4}\sum_{i=0}^{3} W_i \qquad (52)$$

and the capacity becomes

$$C = \frac{1}{4}\sum_{i=0}^{3} S(W_i\|W). \qquad (53)$$

I shall call this simplest case special dense coding. Among all the possible ways of doing general dense coding, special dense coding is the optimal way to communicate when $W_0$ is a pure state (Bose, Plenio, and Vedral, 2000) or a Bell diagonal state.

Now I derive the most general bound on completely general dense coding (Bowen, 2001). Furthermore, this bound can be attained by the same protocol as special dense coding (Bowen, 2001). The proof is achieved by first finding an upper bound to the capacity for completely general dense coding and then showing that special dense coding actually saturates this bound. Suppose that the initial state of Alice and Bob is $\rho_{AB}$. Then we have

$$C = \max\Big[S\Big(\sum_k p_k(U_k\otimes I)\rho_{AB}(U_k^\dagger\otimes I)\Big) - \sum_k p_k S\{(U_k\otimes I)\rho_{AB}(U_k^\dagger\otimes I)\}\Big]$$
$$= \max S(\rho_{AB}') - S(\rho_{AB}) \le S(\rho_A') + S(\rho_B') - S(\rho_{AB}) \le 1 + S(\rho_B) - S(\rho_{AB}). \qquad (54)$$

Since this bound is achievable, as shown by Bowen (2001), the capacity for completely general dense coding is given by Eq. (54).

I shall now restrict my attention to a calculation of $C$ for pure letter states. Consider the initial shared pure state $W_0$ to be

$$|\psi_0\rangle = a|00\rangle + b|11\rangle. \qquad (55)$$

Then, according to Eqs. (48)–(51), the other letter states are given by

$$|\psi_1\rangle = a|10\rangle + b|01\rangle, \qquad (56)$$
$$|\psi_2\rangle = -i(a|10\rangle - b|01\rangle), \qquad (57)$$
$$|\psi_3\rangle = a|00\rangle - b|11\rangle, \qquad (58)$$

from which we obtain $W_i = |\psi_i\rangle\langle\psi_i|$. As all the $W_i$ are pure states we have

$$S(W_i) = 0. \qquad (59)$$

Thus we have

$$C = S(W). \qquad (60)$$

I shall consider only the case of special dense coding, as it is optimal. The ensemble used is thus obtained from Eq. (52) to be

$$W = \frac{|a|^2}{2}|00\rangle\langle 00| + \frac{|b|^2}{2}|01\rangle\langle 01| + \frac{|a|^2}{2}|10\rangle\langle 10| + \frac{|b|^2}{2}|11\rangle\langle 11|.$$

Thus from Eq. (60) for the capacity $C$ we get

$$C = -2\Big(\frac{|a|^2}{2}\log\frac{|a|^2}{2} + \frac{|b|^2}{2}\log\frac{|b|^2}{2}\Big) = 1 - (|a|^2\log|a|^2 + |b|^2\log|b|^2). \qquad (61)$$

[Note that this agrees with Eq. (54), since for pure states the total entropy is zero.] This implies that a good measure of entanglement for a pure state of a system composed of two subsystems $A$ and $B$ is the von Neumann entropy of the state of either of the subsystems. Let us call this measure the von Neumann entropy of entanglement and label it $E_v$ (Bennett, Bernstein, et al., 1996; Popescu and Rohrlich, 1997). Thus

$$E_v(|\psi\rangle\langle\psi|_{A+B}) = S[\mathrm{Tr}_A(|\psi\rangle\langle\psi|_{A+B})],$$

where $\mathrm{Tr}_A$ stands for the partial trace over the states of system $A$. Therefore, for all the states $W_i$,

$$E_v(W_i) = -(|a|^2\log|a|^2 + |b|^2\log|b|^2).$$

Thus

$$C = 1 + E_v(W_i)$$

(see Fig. 5). We can prove that for pure states, special dense coding (using all alphabet states with equal a priori probability) is the optimal way to communicate among all possible ways of doing general dense coding [i.e., when the letter states are generated by Eqs. (48)–(51); Bose, Plenio, and Vedral, 2000]. It is important to understand that the amount of entanglement determines exactly how much information Alice can convey to Bob. Note that if there is no entanglement shared between them, then the amount of information is exactly one bit per Alice's qubit (which is what can be achieved classically, after all). At the other extreme, when they share a maximally entangled state, the amount of information is two bits per Alice's qubit. This is an amount that no purely classical communication can achieve. However, while the von Neumann entropy is a good measure of entanglement for pure states (in fact, there are arguments that it is the unique measure for pure states; Popescu and Rohrlich, 1997), it fails when we try to apply it to mixed states. One possibility is to follow the logic of pure-state dense coding and call $S(\rho_B) - S(\rho_{AB})$ a measure of entanglement for mixed states, as in Eq. (54). This quantity has been called the ‘‘coherent information'' and is used to describe information transmission through a noisy quantum channel (Barnum et al., 1998). But is this measure consistent with other natural requirements for quantifying entanglement? This question will be addressed in the next section. Before that, we show that in order to delete a certain amount of correlation we need to increase the entropy of the environment by at least this amount. This is known as Landauer's erasure principle (Landauer, 1961; Toffoli, 1981; Bennett, 1988) and is linked directly to the relative entropy.

FIG. 5. The dependence of the capacity for dense coding with pure states $a|00\rangle + b|11\rangle$ on the Schmidt coefficient $x = |a|^2$. When the state is disentangled, i.e., when either $a = 0$ or $b = 0$, the capacity is 1 bit per qubit, the same as the classical capacity.
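The relation $C = 1 + E_v$ is simple to tabulate. The short sketch below (my own, not from the paper) sweeps the Schmidt coefficient $x = |a|^2$ exactly as in Fig. 5.

```python
import numpy as np

def binary_entropy_bits(x):
    """Entropy of the distribution (x, 1-x) in bits."""
    return -sum(t * np.log2(t) for t in (x, 1 - x) if t > 0)

for x in (0.0, 0.1, 0.25, 0.5):              # x = |a|^2
    Ev = binary_entropy_bits(x)              # reduced entropy of a|00> + b|11>
    print(f"|a|^2 = {x:4.2f}: C = {1 + Ev:.3f} bits per qubit sent")
# x = 0.5 (maximally entangled) gives 2 bits; x = 0 (product state) gives 1 bit.
```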

D. Relative entropy, thermodynamics, and information erasure

We have seen that communication essentially creates correlations between the sender and the receiver. Creating correlations is therefore very important in order to be able to convey any information. However, I should now like to talk about the opposite process: deleting correlations. Why would one want to do this? The reason is that one might want to correlate one system with another and might need to delete all of its previous correlations in order to be able to store new ones. I should like to give a more physical statement of information erasure and link it to the notion of measurement. I shall therefore introduce two correlated parties, a system and an apparatus. The apparatus interacts with the system, thereby gaining a certain amount of information about it (the full quantum description of this process will be presented in Sec. V). Suppose that the apparatus then needs to measure another system. We first need to delete the information about the previous system before we can make another measurement. The most general way of conducting the erasure (resetting) of the apparatus is by employing a reservoir in thermal equilibrium at a certain temperature $T$. To erase the state of the apparatus we just throw it into the reservoir and introduce a new pure state. The entropy increase of this operation consists of two parts. First, the state of the apparatus evolves to the state of the reservoir, and this entropy is added to the reservoir's entropy. Second, the rest of the reservoir changes its entropy due to this interaction, by an amount set by the difference in the apparatus's internal energy before and after the resetting (no work is done in this process). This quantum approach to equilibrium was also studied by Partovi (1989). A good model is obtained by imagining that the reservoir consists of a great number of systems (of the same ‘‘size'' as the apparatus), all in the same quantum equilibrium state $\omega$. The apparatus, which is in some state $\rho$, then interacts with these reservoir systems one at a time. With each interaction the state of the apparatus approaches more closely the state of the reservoir, while that single reservoir system changes its state away from equilibrium. However, the systems in the bath are numerous, so that after a certain number of collisions the state of the apparatus will approach the state of the reservoir, while the reservoir will not change much, since it is very large (this is equivalent to the Born-Markov approximation, which leads to the irreversible dynamics of the apparatus described here).

Bearing all this in mind, we now reset the apparatus by plunging it into a reservoir in thermal equilibrium (a Gibbs state) at temperature $T$. Let the state of the reservoir be

$$\omega = \frac{e^{-\beta H}}{Z} = \sum_j q_j|\varepsilon_j\rangle\langle\varepsilon_j|,$$

where $H = \sum_i \varepsilon_i|\varepsilon_i\rangle\langle\varepsilon_i|$ is the Hamiltonian of the reservoir, $Z = \mathrm{Tr}(e^{-\beta H})$ is the partition function, and $\beta^{-1} = kT$, where $k$ is the Boltzmann constant. Now suppose that due to the measurement the entropy of the apparatus is $S(\rho)$ [and an amount $S(\rho)$ of information has been gained], where $\rho = \sum_i r_i|r_i\rangle\langle r_i|$ is the eigenexpansion of the apparatus state. The total increase of entropy in the erasure (there are two parts, as I argued above: the change in the entropy of the apparatus and the change in the entropy of the reservoir) is

$$\Delta S_{\mathrm{er}} = \Delta S_{\mathrm{app}} + \Delta S_{\mathrm{res}}.$$

We immediately know that $\Delta S_{\mathrm{app}} = S(\omega)$, since the state of the apparatus (no matter what state it was in before) is now erased to become the same as that of the reservoir. The entropy change in the reservoir, however, is the average over all states $|r_i\rangle$ of the heat received by the reservoir, divided by the temperature. This is minus the heat received by the apparatus divided by the temperature; the heat received by the apparatus is the internal energy after the resetting minus the initial internal energy $\langle r_i|H|r_i\rangle$. Thus

$$\Delta S_{\mathrm{res}} = -\sum_k r_k\,\frac{\mathrm{Tr}(\omega H) - \langle r_k|H|r_k\rangle}{T} = \sum_j\Big(\sum_k r_k|\langle r_k|\varepsilon_j\rangle|^2 - q_j\Big)(-\log q_j - \log Z) = -\mathrm{Tr}(\rho-\omega)(\log\omega - \log Z) = \mathrm{Tr}(\omega-\rho)\log\omega.$$

Altogether we have an exact expression for the increase in entropy due to deletion:

$$\Delta S_{\mathrm{er}} = -\mathrm{Tr}(\rho\log\omega).$$

This result (Vedral, 2000) generalizes Lubkin's result, which applies only when $[\rho,\omega] = 0$. In general, however, the information gain is equal to $S(\rho)$, the entropy increase in the apparatus. This entropy increase is a maximum; the mutual information between the system and the apparatus is usually smaller, as in Eq. (42). Thus we see that

$$\Delta S_{\mathrm{er}} = -\mathrm{Tr}(\rho\log\omega) \ge S(\rho) = I,$$

and Landauer's principle is confirmed [the inequality follows from the fact that the quantum relative entropy $S(\rho\|\omega) = -\mathrm{Tr}(\rho\log\omega) - S(\rho)$ is non-negative]. The erasure is thus least wasteful when $\omega = \rho$, in which case the entropy of erasure is equal to $S(\rho)$, the information gain. This is when the reservoir is in the same state as the state of the apparatus we are trying to erase. In this case we just have a state swap between the new pure state of the apparatus and the old state $\rho$ that it replaces. Curiously enough, creating correlations is not costly in terms of the entropy of the environment (such as when Alice and Bob communicate).

Landauer's erasure principle is a statement equivalent to the second law of thermodynamics. If we could delete information without increasing entropy, then we could construct a machine that completely converts heat into work with no other effect, which contradicts the second law. The opposite is also true. Namely, if we could convert heat into work with no other effect, then we could use this energy to delete information with no entropy increase (Landauer, 1961; Penrose, 1973; Toffoli, 1981; Bennett, 1988). Thus the relative entropy provides an interesting link between thermodynamics, information theory, and quantum mechanics [see also Brillouin's excellent book (Brillouin, 1956)].

I shall now show how Landauer's principle can be used to derive a limit to quantum data compression. The free energy lost in deleting the information stored in a string of $n$ qubits, all in the state $\rho$, is $n\beta^{-1}S(\rho)$. However, we could first compress this string and then delete the resulting information. The free-energy loss after compression is $m\beta^{-1}\log 2$ (one bit per qubit), where the string has been compressed to $m$ qubits. The two free energies before and after compression should be equal if no information is lost during compression, i.e., if we wish to have maximal efficiency, and therefore $m/n = S(\rho)$, with the entropy measured in bits, as shown previously (see Feynman, 1996). The equality is, of course, only achieved asymptotically.

So far we have seen that entropy plays a pivotal role in communication theory and data compression as a limit to both communication capacity and compression. It also quantifies the amount of entanglement in a pure bipartite state. Finally, it plays a thermodynamical role in characterizing the mixedness of a given quantum state. This last role was first introduced by von Neumann. Now we go beyond the classical use of quantum states and address the question of how we can achieve quantum communication of quantum states.

IV. QUANTUM COMMUNICATION: QUANTUM USE

In this section the problem of entanglement quantification is analyzed. Previously we saw that the reduced von Neumann entropy is a good measure of entanglement for two subsystems in a joint pure state (see also Bennett, Bernstein, et al., 1996). This is a consequence of the Schmidt decomposition procedure introduced earlier and was exemplified by dense coding. However, for mixed states of two subsystems, or for more than two subsystems, this procedure does not exist in general. Therefore it is not immediately clear how to understand and quantify correlations for these states. Initially we might think that Bell's inequalities (Clauser et al., 1969; Bell, 1987; Redhead, 1987) would provide a good criterion for separating quantum correlations (entanglement) from classical correlations in a given quantum state. States that violate Bell's inequalities would be entangled, and other states would be disentangled. However, while it is true that a violation of Bell's inequalities is a signature of quantum correlations, not all entangled states violate Bell's inequalities (Gisin, 1996). Therefore, in order to completely separate quantum from classical correlations, we need a different criterion.

I shall present here an approach that has proven to be very fruitful in understanding entanglement in general. It begins with a set of conditions that any reasonable measure of entanglement has to satisfy. I then discuss possible candidates based on these criteria.

A. Quantifying entanglement

In this section I shall mainly focus on understanding the entanglement of bipartite systems, i.e., systems containing only two subsystems. The term entanglement, or Verschränkung as it was originally called, was introduced by Schrödinger (1935) to emphasize the bizarre implications of quantum mechanics. The reason for studying bipartite entanglement is that it is the simplest and most basic kind of entanglement and is well understood at present. Starting from bipartite entanglement we can construct a theory that can be generalized to any number of systems.

To determine the basic properties that every ‘‘good'' entanglement measure should satisfy (Vedral, Plenio, Rippin, et al., 1997; Vedral and Plenio, 1998), we have to discuss what we actually mean when we say that something is ‘‘disentangled.'' By definition, a bipartite state is disentangled if it can be written in the separable form $\rho_{AB} = \sum_i p_i\rho_i^A\otimes\rho_i^B$ (Werner, 1989). It is clear why we choose to define disentangled states in this manner:

these are the most general states that Alice and Bob can create by local operations and classical communication (LOCC). Thus these states contain no entanglement, as entanglement can be created only through global operations. All other states will be entangled to some degree. In addition, note that the set of all disentangled states is convex: a convex combination (mixture) of any two disentangled states is itself disentangled. This fact will be important when we quantify entanglement later.

The first question to answer is the following: When can a given density matrix be written in separable form? The necessary and sufficient condition is known in general in terms of positive (but not necessarily completely positive) maps (Horodecki et al., 1996; Peres, 1996). Suppose that $\Lambda$ is any positive map; then

$$I_A\otimes\Lambda_B\Big(\sum_i p_i\rho_i^A\otimes\rho_i^B\Big) = \sum_i p_i\rho_i^A\otimes\Lambda_B(\rho_i^B) \qquad (62)$$

is always a positive operator. Remarkably, the converse is also true. If, for all positive maps $\Lambda$, the state $I_A\otimes\Lambda_B(\rho_{AB})$ is positive, then $\rho_{AB}$ is separable (disentangled). Therefore, if we want to know whether a given state $\rho_{AB}$ is entangled, we need to find a positive map whose action on $B$ will result in a negative operator and hence not a physical state (Horodecki, 2001a). This condition is still not operational, since there is an infinite number of positive maps to search. In fact, no operational condition exists in general; one is known only in some special cases. For example, for two qubits, or for a qubit and a qutrit (a three-level system), the condition simplifies to the following (Horodecki et al., 1996; Peres, 1996): such a state is entangled if and only if the partial transposition of subsystem $B$ results in an operator that is not positive, i.e., $\rho_{AB}^{T_B}$ has a negative eigenvalue. The relationship between positive maps and entanglement is a very active field of research, and I refer the interested reader to some papers investigating this issue: Bennett, DiVincenzo, Mor, et al. (1999); DiVincenzo et al. (2000); Kraus et al. (2000); Lewenstein et al. (2000). With this in mind, let us turn to quantifying entanglement.
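For two qubits the partial-transpose criterion just described is fully operational. The following sketch (mine; the Werner-state family is a standard textbook example, not one discussed above) flags entanglement through a negative eigenvalue of $\rho_{AB}^{T_B}$.

```python
import numpy as np

def partial_transpose_B(rho):
    """Transpose the second qubit of a 4x4 two-qubit density matrix."""
    r = rho.reshape(2, 2, 2, 2)                     # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(4, 4)    # swap b <-> b'

psi_minus = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)   # singlet state
proj = np.outer(psi_minus, psi_minus)

for p in (0.2, 1/3, 0.5, 1.0):
    rho = p * proj + (1 - p) * np.eye(4) / 4        # Werner state
    min_eig = np.linalg.eigvalsh(partial_transpose_B(rho)).min()
    print(f"p = {p:.2f}: min eigenvalue of rho^T_B = {min_eig:+.3f}")
# A negative value flags entanglement; here this happens exactly for p > 1/3.
```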

The first property we need from an entanglement measure is that a disentangled state should not have any quantum correlations. This gives rise to our first condition.

(E1) For any separable state $\sigma$ the measure of entanglement should be zero, i.e.,

$$E(\sigma) = 0. \qquad (63)$$

Note that we do not ask for the converse, i.e., that if $E(\sigma) = 0$, then $\sigma$ is separable. The reason for this will become clear below.

The next condition concerns the behavior of entanglement under simple local unitary transformations. A local unitary transformation simply represents a change of the basis in which we consider the given entangled state. But a change of basis should not change the amount of entanglement that is accessible to us, because at any time we could just reverse the basis change (unitary transformations are fully reversible).

(E2) For any state $\sigma$ and any local unitary transformation, i.e., a unitary transformation of the form $U_A\otimes U_B$, the entanglement remains unchanged. Therefore

$$E(\sigma) = E(U_A\otimes U_B\,\sigma\,U_A^\dagger\otimes U_B^\dagger). \qquad (64)$$

The third condition is the one that really restricts the class of possible entanglement measures. Unfortunately it is also the property that is usually the most difficult to prove for potential entanglement measures. No measure of correlation between two subsystems should increase under local operations performed on the subsystems separately. Quantum entanglement, however, is even more restrictive in that the total amount of entanglement cannot increase locally even with the aid of classical communication. Classical correlations, on the other hand, can be increased by the use of local operations and classical communication.

Example. Suppose that Alice and Bob share $n$ uncorrelated pairs of qubits, for example, all in the state $|0\rangle$. Alice's computer then interacts with each of her qubits such that it randomly flips each qubit with probability 1/2. However, whenever a qubit is flipped, Alice's computer (classically) calls Bob's computer and informs it to do likewise. After this action on all the qubits, Alice and Bob end up sharing $n$ (maximally) correlated pairs of qubits in the state $\frac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|)$, i.e., whenever Alice's qubit is zero so is Bob's, and whenever Alice's qubit is one so is Bob's. The state of each pair is mixed because Alice and Bob do not know whether their computers flipped their respective qubits or not.

We can always calculate the total amount of entanglement by summing up the entanglement of all the systems after we have applied our local operations and classical communication.

(E3) Local operations, classical communication, and subselection cannot increase the expected entanglement, i.e., if we start with an ensemble in state $\sigma$ and end up with probability $p_i$ in subensembles in state $\sigma_i$, then we shall have

$$E(\sigma) \ge \sum_i p_i E(\sigma_i), \qquad (65)$$

where $\sigma_i = A_i\otimes B_i\,\sigma\,A_i^\dagger\otimes B_i^\dagger/p_i$ and $p_i = \mathrm{Tr}(A_i\otimes B_i\,\sigma\,A_i^\dagger\otimes B_i^\dagger)$. The form $A\otimes B$ shows that Alice and Bob perform their operations locally (i.e., Alice cannot affect Bob's system and vice versa). However, Alice's and Bob's operations can be correlated, as is manifested in the fact that they carry the same index. It should be pointed out that although all local operations and classical communication can be cast in the above product form, the opposite is not true: not all operations of the product form can be executed locally (Bennett, DiVincenzo, Fuchs, et al., 1999). This means that the above condition is more restrictive than necessary, but this does not have any significant consequences as far as I am aware. An example of an (E3) operation is the local addition of particles on Alice's and Bob's sides. Note also that (E2) operations are a subset (special case) of (E3) operations.

The last condition is there to make sure that our measure is consistent with pure states.

(E4) The entanglement of a pure state is equal to the reduced von Neumann entropy.

The above conditions are natural and easy to understand physically. However, they can be reduced to simpler and more elementary conditions, which I now briefly discuss. Suppose that we ask that the measure of entanglement be

(1) weakly additive, i.e., $E(\rho \otimes \rho) = 2E(\rho)$;
(2) continuous, i.e., if $\rho$ is close to $\sigma$, then $E(\rho)$ is close to $E(\sigma)$.

Then it can be shown (Popescu and Rohrlich, 1997; Vidal, 2000) that (E4) is a consequence of weak additivity and continuity (provided we assume that the entanglement of a maximally entangled state is normalized to log 2). In addition, in (E3) we use the most general local POVMs, but we know that these can be implemented by adding ancillas locally, performing a unitary transformation on the system and ancilla locally, and then tracing out the ancillas. So (E2)–(E4) can be presented in a more elementary way, as was done by Vidal (2000). However, I chose to introduce entanglement measures via (E1)–(E4), as I think that they are more intuitive and capture the main ideas. Readers interested in further analysis of these conditions are advised to read Vidal (2000) and Horodecki et al. (2000).

Before I introduce different entanglement measures, I should like to discuss the following question: What do we mean by saying that a state $\sigma$ can be converted into another state $\rho$ by local operations and classical communication? Strictly speaking, we mean that there exists an LOCC procedure that, given a sufficiently large number of copies $n$ of $\sigma$, will convert them arbitrarily close to $m$ copies of the state $\rho$; i.e.,

$(\forall \epsilon > 0)(\forall m \in \mathbb{N})(\exists n \in \mathbb{N})(\exists \Phi \in \mathrm{LOCC}): \; \|\Phi(\sigma^{\otimes n}) - \rho^{\otimes m}\| < \epsilon$, (66)

where $\|\sigma - \rho\|$ is some measure of distance (metric) on the set of density matrices. Now, if $\sigma$ is more entangled than $\rho$, we expect that there is an LOCC procedure with $m > n$; otherwise, we expect that we can only have $m \le n$. Measuring entanglement now reduces to finding an appropriate function on the set of states to order them according to their local convertibility. This is usually achieved by letting either $\sigma$ or $\rho$ be a maximally entangled state.

Entanglement of formation. I now introduce three different measures of entanglement, all of which obey (E1)–(E4). First I discuss the entanglement of formation of Bennett et al. (Bennett, DiVincenzo, et al., 1996). We define the entanglement of formation of a state $\rho$ by

$E_F(\rho) := \min \sum_i p_i S(\rho_A^i)$, (67)

where $S(\rho_A) = -\mathrm{Tr}\,\rho_A \ln \rho_A$ is the von Neumann entropy and the minimum is taken over all possible realizations of the state, $\rho_{AB} = \sum_j p_j |\psi^j\rangle\langle\psi^j|$, with $\rho_A^i = \mathrm{Tr}_B(|\psi^i\rangle\langle\psi^i|)$. This measure satisfies (E1)–(E4). The basis of formation is that Alice and Bob would like to create an ensemble of $n$ copies of the nonmaximally entangled state $\rho_{AB}$ using only local operations, classical communication, and a number $m$ of maximally entangled pairs (see Fig. 6). The entanglement of formation is the asymptotic conversion ratio $m/n$ in the limit of infinitely many copies. The form of this measure given in Eq. (67) will be more transparent after the next subsection, and the relationship between the entanglement of formation and other proposed measures will be analyzed in more detail below. It is worth mentioning that a closed form for this measure exists for two qubits (Wootters, 1998).
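For concreteness, here is a sketch of that two-qubit closed form, Wootters's concurrence construction, for an arbitrary two-qubit density matrix (plain numpy assumed; the function names are mine):

```python
import numpy as np

def binary_entropy(x):
    """h(x) = -x log2 x - (1-x) log2(1-x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def entanglement_of_formation(rho):
    """Wootters's closed form for a two-qubit density matrix (in bits)."""
    sy = np.array([[0, -1j], [1j, 0]])
    flip = np.kron(sy, sy)
    rho_tilde = flip @ rho.conj() @ flip      # spin-flipped state
    # Square roots of eigenvalues of rho * rho_tilde, decreasing order
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(rho @ rho_tilde))))[::-1]
    c = max(0.0, lam[0] - lam[1] - lam[2] - lam[3])   # concurrence
    return binary_entropy((1 + np.sqrt(1 - c**2)) / 2)

# A maximally entangled state has one full bit of entanglement:
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(entanglement_of_formation(np.outer(bell, bell)))   # ~1.0
```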

Related to this measure is the entanglement of distillation, also introduced by Bennett, DiVincenzo, et al. (1996).

Entanglement of distillation. This measure defines the amount of entanglement of a state $\sigma$ as the asymptotic proportion of singlets that can be distilled using a purification procedure (for a rigorous definition see Rains, 1999a, 1999b). This is the opposite process to that leading to the entanglement of formation (Fig. 6), although its value is generally smaller, implying that the formation of states is in some sense irreversible. The reason for this irreversibility will be explained in the next subsection. This measure fails to satisfy the converse of (E1): for all disentangled states the entanglement of distillation is zero, but the converse is not true, since there exist states that are entangled but from which no entanglement can be distilled. For this reason they are called bound entangled states (Horodecki et al., 1998; see also DiVincenzo et al., 2000). This is why condition (E1) is not stated to be both necessary and sufficient.

FIG. 6. The formation of entangled states: a certain number of maximally entangled pairs is manipulated by local operations and classical communication and converted into pairs in some state $\rho$. The asymptotic conversion ratio is known as the entanglement of formation. The converse of formation is the distillation of entanglement; the asymptotic rate of converting pairs in state $\rho$ into maximally entangled states is known as the entanglement of distillation. The two measures of entanglement are in general different, distillation being smaller than or equal to formation. This surprising irreversibility of entanglement conversion is explained in the text as a consequence of the loss of classical information about the decomposition of $\rho$.

Relative entropy of entanglement. I now introduce the final measure of entanglement, which was first proposed by Vedral, Plenio, Rippin, and Knight (1997). This measure is intimately related to the entanglement of distillation, for which it provides an upper bound. If $\mathcal{D}$ is the set of all disentangled states, the measure of entanglement for a state $\sigma$ is defined as

$E(\sigma) := \min_{\rho \in \mathcal{D}} S(\sigma \| \rho)$, (68)

where $S(\sigma\|\rho)$ is the quantum relative entropy. This measure, which I shall call the relative entropy of entanglement, tells us that the amount of entanglement in $\sigma$ is its distance from the disentangled set of states. In statistical terms, as introduced in Sec. II, the more entangled a state is, the more distinguishable it is from a disentangled state (Vedral, Plenio, Jacobs, and Knight, 1997). To better understand all three measures of entanglement we need to introduce another quantum protocol that relies fundamentally on entanglement.
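The quantity being minimized is straightforward to evaluate numerically; a minimal sketch follows (numpy assumed; the minimization over the disentangled set $\mathcal{D}$ itself is a nontrivial convex optimization and is not attempted here):

```python
import numpy as np

def vn_log(rho, eps=1e-12):
    """Matrix log2 of rho restricted to its support (0 log 0 := 0)."""
    vals, vecs = np.linalg.eigh(rho)
    logvals = np.where(vals > eps, np.log2(np.maximum(vals, eps)), 0.0)
    return (vecs * logvals) @ vecs.conj().T

def relative_entropy(sigma, rho):
    """S(sigma||rho) = Tr sigma (log sigma - log rho), in bits.

    Finite only when the support of sigma lies inside that of rho;
    this sketch does not check for the divergent case."""
    return np.real(np.trace(sigma @ (vn_log(sigma) - vn_log(rho))))

# A Bell state versus its Schmidt-dephased (disentangled) version:
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
sigma = np.outer(bell, bell)
rho = np.diag([0.5, 0.0, 0.0, 0.5])
print(relative_entropy(sigma, rho))   # 1.0 bit, the expected E for a Bell pair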

Another condition that might be considered intuitive for a measure of entanglement is convexity. That is, we might require that

$E\!\left(\sum_i p_i \sigma_i\right) \le \sum_i p_i E(\sigma_i)$.

This states that mixing cannot increase entanglement. For example, an equal mixture of the two maximally entangled states $|00\rangle + |11\rangle$ and $|00\rangle - |11\rangle$ is a separable state and consequently contains no entanglement. I did not include convexity as a separate requirement for an entanglement measure, as it is not completely independent of (E3). This is because (E3) and strong additivity, $E(\rho \otimes \sigma) = E(\rho) + E(\sigma)$, imply convexity:

$n \sum_i p_i E(\rho_i) = E(\rho_1^{\otimes p_1 n} \otimes \rho_2^{\otimes p_2 n} \otimes \cdots \otimes \rho_N^{\otimes p_N n}) \ge E\!\left[\left(\sum_i p_i \rho_i\right)^{\otimes n}\right] = n E\!\left(\sum_i p_i \rho_i\right)$,

where the equalities follow from the strong-additivity assumption and the inequality is a consequence of (E3). The symbol $\rho^{\otimes m}$ means that we have $m$ copies of the state $\rho$. Nevertheless, it is interesting to point out that any convex measure that satisfies continuity and weak additivity has to be bounded from below by the entanglement of distillation and from above by the entanglement of formation (Horodecki et al., 2000). We shall see that most entanglement measures can in fact be generated using the quantum relative entropy.

It is interesting to note that the relative entropy of entanglement does in fact satisfy both convexity and continuity (Donald and Horodecki, 1999), although not additivity (Vollbrecht and Werner, 2001). Furthermore, we can easily show that it is an upper bound to the entanglement of distillation. For any pure state $|\psi\rangle$, $\min_{\omega\in\mathcal{D}} S(\psi^{\otimes n}\|\omega) = \min_{\omega\in\mathcal{D}} -\langle\psi^{\otimes n}|\log\omega|\psi^{\otimes n}\rangle$. But the logarithmic function is concave, so that

$\min_{\omega\in\mathcal{D}} -\langle\psi^{\otimes n}|\log\omega|\psi^{\otimes n}\rangle \ge \min_{\omega\in\mathcal{D}} -\log\langle\psi^{\otimes n}|\omega|\psi^{\otimes n}\rangle$.

However, according to a result of Horodecki (Horodecki et al., 1996), since $\omega$ is a disentangled state, its fidelity with a maximally entangled state cannot be larger than the inverse of the dimension of either of that state's halves, so that for maximally entangled $|\psi\rangle$ we have $\langle\psi^{\otimes n}|\omega|\psi^{\otimes n}\rangle \le 1/2^n$. Thus

$\min_{\omega\in\mathcal{D}} S(\psi^{\otimes n}\|\omega) \ge -\log(1/2^n) = n$. (69)

But we know that this minimum is achievable by the state $\omega = \rho^{\otimes n}$, where $\rho$ is obtained from $\psi$ by removing the off-diagonal elements in the Schmidt basis. Consequently, if we are starting with $n$ copies of state $\sigma$ and obtaining $m$ copies of $\psi$ by local operations and classical communication, then

$D = \frac{m}{n} = \frac{1}{n}\min_{\omega\in\mathcal{D}} S(\psi^{\otimes m}\|\omega) \le \frac{1}{n}\min_{\omega\in\mathcal{D}} S(\sigma^{\otimes n}\|\omega)$,

where the equality follows from Eq. (69) and the inequality from the fact that the relative entropy is nonincreasing under LOCC [strictly speaking, $D = \lim_{n\to\infty}(m/n)$ and, of course, $m$ is a function of $n$, $m = m(n)$]. Thus the distillable entanglement is bounded from above by the relative entropy of entanglement.

A similar argument can be given to show that the relative entropy of entanglement is bounded from above by the entanglement of formation (Vedral and Plenio, 1998). Since most of the measures of entanglement can be derived from the relative entropy, they will possess similar properties. In order to see this, we first need to introduce quantum teleportation.

B. Teleportation

Let us begin by describing quantum teleportation in the form originally proposed by Bennett et al. (1993). Suppose that Alice and Bob, who are distant from each other, wish to implement a teleportation procedure. Initially they need to share a maximally entangled pair of qubits. This means that if Alice and Bob each have one qubit, then the joint state may, for example, be

$|\Psi_{AB}\rangle = (|0_A\rangle|0_B\rangle + |1_A\rangle|1_B\rangle)/\sqrt{2}$, (70)

where the first ket (with subscript A) belongs to Alice and the second (with subscript B) to Bob. Note that this state is maximally entangled and is different from a statistical mixture $(|00\rangle\langle 00| + |11\rangle\langle 11|)/2$, which is the most correlated state allowed by classical physics.

Now suppose that Alice receives a qubit in an unknown state $|\Phi\rangle = a|0\rangle + b|1\rangle$ and she wants to teleport it to Bob. The state has to be unknown to her, because otherwise she could just phone Bob up and tell him all the details of the state, and he could then recreate it on a particle that he possesses. Given that Alice does not know the state, she cannot measure it to obtain all the necessary information to specify it. If she could, this would lead to a violation of the uncertainty principle. Therefore she has to resort to using the state $|\Psi_{AB}\rangle$ that she shares with Bob to transfer her state to him without actually learning this state. This procedure is what we mean by quantum teleportation.

I first write out the total state of all three qubits,

$|\Phi_{AB}\rangle := |\Phi\rangle|\Psi_{AB}\rangle = (a|0\rangle + b|1\rangle)(|00\rangle + |11\rangle)/\sqrt{2}$.

However, the above state can be conveniently written in a different basis,

$|\Phi_{AB}\rangle = (a|000\rangle + a|011\rangle + b|100\rangle + b|111\rangle)/\sqrt{2}$
$\quad = \frac{1}{2}[\,|\Phi^+\rangle(a|0\rangle + b|1\rangle) + |\Phi^-\rangle(a|0\rangle - b|1\rangle) + |\Psi^+\rangle(a|1\rangle + b|0\rangle) + |\Psi^-\rangle(a|1\rangle - b|0\rangle)\,]$,

where

$|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$, (71)
$|\Phi^-\rangle = (|00\rangle - |11\rangle)/\sqrt{2}$, (72)
$|\Psi^+\rangle = (|01\rangle + |10\rangle)/\sqrt{2}$, (73)
$|\Psi^-\rangle = (|01\rangle - |10\rangle)/\sqrt{2}$ (74)

form an orthonormal basis of Alice's two qubits (remember that the first two qubits belong to Alice and the last qubit belongs to Bob). The above basis is frequently called the Bell basis. This is a very useful way of writing the state of Alice's two qubits and Bob's single qubit, because it displays a high degree of correlation between Alice's and Bob's parts: for every state of Alice's two qubits (i.e., $|\Phi^+\rangle, |\Phi^-\rangle, |\Psi^+\rangle, |\Psi^-\rangle$) there is a corresponding state of Bob's qubit. In addition, the state of Bob's qubit in all four cases looks very much like the original qubit that Alice has to teleport to Bob. It is now straightforward to see how to proceed with the teleportation protocol (Bennett et al., 1993).

(1) Upon receiving the unknown qubit in state $|\Phi\rangle$, Alice performs a projective measurement on her two qubits in the Bell basis. This means that she will obtain one of the four Bell states randomly and with equal probability.

(2) Suppose Alice obtains the state $|\Psi^+\rangle$. Then the state of all three qubits (Alice + Bob) collapses to the following state:

$|\Psi^+\rangle(a|1\rangle + b|0\rangle)$

(the last qubit belongs to Bob as usual). Alice now has to communicate the result of her measurement to Bob (over the phone, for example). The point of this communication is to inform Bob how the state of his qubit now differs from the state of the qubit Alice was holding before the Bell measurement.

(3) Now Bob has to apply a unitary transformation on his qubit that implements a logical NOT operation: $|0\rangle \to |1\rangle$ and $|1\rangle \to |0\rangle$. He thereby transforms the state of his qubit into the state $a|0\rangle + b|1\rangle$, which is precisely the state that Alice had to teleport to him initially. This completes the protocol. It is easy to see that if Alice obtained some other Bell state, then Bob would have to apply some other simple operation to complete the teleportation. These operations can be represented by the Pauli spin matrices.
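The whole protocol can be checked with a small state-vector simulation; the sketch below (numpy; the qubit ordering and helper names are my own choices) draws Alice's Bell outcome at random and applies Bob's Pauli fix-up:

```python
import numpy as np

rng = np.random.default_rng(1)

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# Bell basis |Phi+>, |Phi->, |Psi+>, |Psi-> as rows
bell = np.array([[1, 0, 0, 1],
                 [1, 0, 0, -1],
                 [0, 1, 1, 0],
                 [0, 1, -1, 0]]) / np.sqrt(2)
corrections = [I2, Z, X, Z @ X]   # Bob's fix-up for each outcome

def teleport(a, b):
    """Teleport the unknown qubit a|0> + b|1> through a shared |Phi+>."""
    phi = np.array([a, b])                        # Alice's unknown qubit
    psi_ab = np.array([1, 0, 0, 1]) / np.sqrt(2)  # shared Bell pair
    state = np.kron(phi, psi_ab)                  # 3-qubit state, Bob last

    # Amplitudes grouped by (Alice's Bell outcome, Bob's basis state)
    amps = (np.kron(bell, I2) @ state).reshape(4, 2)
    probs = np.sum(np.abs(amps)**2, axis=1)       # each is 1/4

    k = rng.choice(4, p=probs)                    # Alice's classical result
    bob = amps[k] / np.sqrt(probs[k])             # Bob's collapsed qubit
    return corrections[k] @ bob                   # after Bob's fix-up

print(teleport(0.6, 0.8))   # [0.6, 0.8] regardless of Alice's outcome
```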

An important fact to observe in the above protocol is that all the operations (Alice's measurements and Bob's unitary transformations) are local in nature. This means that there is never any need to perform a (global) transformation or measurement on all three qubits simultaneously, which is what allows us to call the above protocol a genuine teleportation. It is also important that the operations that Bob performs are independent of the state that Alice tries to teleport to him. Note also that the classical communication from Alice to Bob in step (2) above is crucial, because otherwise the protocol would be impossible to execute. (There is a deeper reason for this: if we could perform teleportation without classical communication, then Alice could send messages to Bob faster than the speed of light; see, for example, Vedral, Rippin, and Plenio, 1997.)

It is important to observe that the initial state to be teleported is destroyed immediately after Alice's measurement, i.e., it becomes maximally mixed, of the form $(|0\rangle\langle 0| + |1\rangle\langle 1|)/2$. This has to happen, since otherwise Alice and Bob would end up with two qubits in the same state, effectively cloning an unknown quantum state, which is impossible by the laws of quantum mechanics. This is the no-cloning theorem of Wootters and Zurek (1982), which is a simple consequence of the linearity of quantum dynamical laws. We also see that at the end of the protocol the quantum entanglement of $|\Psi_{AB}\rangle$ is completely destroyed. Does this have to be the case in general, or might we save that state at the end (perhaps by performing a different teleportation protocol)? The answer is that the state must be destroyed (Plenio and Vedral, 1998), because if this were not the case, then entanglement could increase under local operations and classical communication, which as we have seen is prohibited by definition.

Teleportation has been experimentally performed in three different setups (Bouwmeester et al., 1997; Boschi et al., 1998; Furusawa et al., 1998). It will now be used to link the three measures of entanglement. I shall show that all the different measures of entanglement can be understood as special cases of the relative entropy of entanglement (Henderson and Vedral, 2000). This unification relies on adding an ancilla, which I shall call a memory system and which will help us keep track of the various decompositions of a given bipartite density matrix. How much access is available to this memory determines which measure of entanglement is used.

C. Measures of entanglement from relative entropy

Suppose that Alice and Bob share a state described by the density matrix $\rho_{AB}$. The state $\rho_{AB}$ has an infinite number of different decompositions $\varepsilon = \{|\psi_{AB}^i\rangle\langle\psi_{AB}^i|, p_i\}$ into pure states $|\psi_{AB}^i\rangle$ with probabilities $p_i$. We denote the mixed state $\rho_{AB}$ written in decomposition $\varepsilon$ by

$\rho_{AB}^{\varepsilon} = \sum_i p_i |\psi_{AB}^i\rangle\langle\psi_{AB}^i|$. (75)

As we have seen, measures of entanglement are associated with the formation and distillation of pure and mixed entangled states. The known relationships between the different measures of entanglement for mixed states are $E_D(\rho_{AB}) \le E_{RE}(\rho_{AB}) \le E_F(\rho_{AB})$ (Vedral and Plenio, 1998). Equality holds for pure states, for which all the measures reduce to the von Neumann entropy, $S(\rho_A) = S(\rho_B)$.

Formation of an ensemble of $n$ nonmaximally entangled pure states, $\rho_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}|$, is achieved by the following protocol. Alice first prepares the states she would like to share with Bob locally. She then uses Schumacher compression (Jozsa and Schumacher, 1994; Schumacher, 1995) to compress subsystem B into $nS(\rho_B)$ states. Subsystem B is then teleported to Bob using $nS(\rho_B)$ maximally entangled pairs. Bob decompresses the states he receives and so ends up sharing $n$ copies of $\rho_{AB}$ with Alice. The entanglement of formation is therefore $E_F(\rho_{AB}) = S(\rho_B)$. For pure states, this process requires no classical communication in the asymptotic limit (Lo and Popescu, 1999). The reverse process of distillation is accomplished using the Schmidt projection method (Bennett, Bernstein, et al., 1996), which allows $nS(\rho_B)$ maximally entangled pairs to be distilled in the limit as $n$ becomes very large. No classical communication between the separated parties is required. Therefore pure states are fully interconvertible in the asymptotic limit.

FIG. 7. Formation of a state by local operations and classical communication and with the help of teleportation. First, Alice creates the joint state of subsystems A and B locally. Then she performs quantum data compression on subsystem B and teleports the compressed state to Bob. Finally, Bob decompresses the received state. Hence Alice and Bob end up sharing the joint state of A and B initially prepared by Alice.

The situation for mixed states is more complex. When any mixed state, denoted by Eq. (75), is created, it may be imagined to be part of an extended system whose state is pure. The pure states $|\psi_{AB}^i\rangle$ in the mixture may be regarded as correlated to orthogonal states $|m_i\rangle$ of a memory $M$. The extended system is in the pure state $|\psi_{MAB}\rangle = \sum_i \sqrt{p_i}\,|m_i\rangle|\psi_{AB}^i\rangle$. If we have no access to the memory system, we trace over it to obtain the mixed state in Eq. (75). In fact, the lack of access to the memory is of a completely general nature. It may be due to interaction with another inaccessible system, or it may be due to an intrinsic loss of information. The results I shall present are universally valid and do not depend on the nature of the information loss. We shall see that the amount of entanglement involved in the different entanglement manipulations of mixed states depends on the accessibility of the information in the memory at different stages. Note that a unitary operation on $|\psi_{MAB}\rangle$ will convert it into another pure state $|\phi_{MAB}\rangle$ with the same entanglement, and tracing over the memory yields a different decomposition of the mixed state. Reduction of the pure state to the mixed state may be regarded as due to a projection-valued measurement on the memory with operators $\{E_i = |m_i\rangle\langle m_i|\}$.

Consider first the protocol of formation by means of which Alice and Bob come to share an ensemble of $n$ mixed states $\rho_{AB}$, as in Fig. 7. Alice first creates the mixed states locally by preparing a collection of $n$ states in a particular decomposition, $\varepsilon = \{|\psi_{AB}^i\rangle\langle\psi_{AB}^i|, p_i\}$, by making $np_i$ copies of each pure state $|\psi_{AB}^i\rangle$. At the same time we may imagine a memory system, entangled with the pure states to be generated, which keeps track of the identity of each member of the ensemble. I consider first the case in which the states of subsystems A and B together with the memory are pure. Later I shall consider the situation in which Alice's memory is decohered. There are then three ways for her to share these states with Bob. First of all, she may simply compress subsystem B to $nS(\rho_B)$ states and teleport these to Bob using $nS(\rho_B)$ maximally entangled pairs. The choice of which subsystem to teleport is made so as to minimize the amount of entanglement required, so that $S(\rho_B) \le S(\rho_A)$. The teleportation in this case would require no classical communication in the asymptotic limit, just as for pure states (Lo and Popescu, 1999). The whole system that is created by this process is an ensemble of pure states $|\psi_{MAB}\rangle$, where subsystems M and A are on Alice's side and subsystem B is on Bob's side. In terms of entanglement resources, however, this process is not the most efficient way for Alice to send the states to Bob. She may do it more efficiently by using the memory system of $|\psi_{MAB}\rangle$ to identify blocks of $np_i$ members in each pure state $|\psi_{AB}^i\rangle$ and applying compression to each block to give $np_i S(\rho_B^i)$ states. The total number of maximally entangled pairs required to teleport these states to Bob is then $n\sum_i p_i S(\rho_B^i)$, which is clearly less than $nS(\rho_B)$, by concavity of the entropy.

The amount of entanglement required clearly depends on the decomposition of the mixed state $\rho_{AB}$. In order to decompress these states, Bob must also be able to identify which members of the ensemble are in which state. Therefore Alice must also send him the memory system. She now has two options. She may either teleport the memory to Bob, which would use more entanglement resources, or she may communicate the information in the memory classically, with no further use of entanglement. When Alice uses the minimum-entanglement decomposition, $\varepsilon = \{|\psi_{AB}^i\rangle\langle\psi_{AB}^i|, p_i\}$, this process, originally introduced by Bennett, DiVincenzo, et al. (1996), makes the most efficient use of entanglement, consuming only the entanglement of formation of the mixed state, $E_F(\rho_{AB}) = \sum_i p_i S(\rho_B^i)$.

We may think of the classical communication between Alice and Bob in one of two equivalent ways. Alice may either measure the memory locally to decohere it and then send the result to Bob classically, or she may send the memory through a completely decohering quantum channel. Since Alice and Bob have no access to the channel, the state of the whole system created by this process is the mixed state

$\rho_{ABM}^{\varepsilon} = \sum_i p_i |\psi_{AB}^i\rangle\langle\psi_{AB}^i| \otimes |m_i\rangle\langle m_i|$, (76)

where Bob is classically correlated to the AB subsystem. Bob is then able to decompress his states using the memory to identify members of the ensemble.

Once the collection of $n$ pairs is shared between Alice and Bob, it is converted into an ensemble of $n$ mixed states $\rho_{AB}$ by destroying access to the memory, which contains the information about the state of any particular member of the ensemble. It is the loss of this information that is responsible for the fact that the entanglement of distillation is lower than the entanglement of formation, since it is not available to the parties carrying out the distillation. If Alice and Bob, who do have access to the memory, were to carry out the distillation, they could obtain as much entanglement from the ensemble as was required to form it. In the case in which Alice and Bob share an ensemble of the pure state $|\psi_{MAB}\rangle$, they would simply apply the Schmidt projection method (Bennett, Bernstein, et al., 1996). The relative entropy of entanglement gives the upper bound to the distillable entanglement, $E_{RE}(|\psi_{(MA):B}\rangle\langle\psi_{(MA):B}|) = S(\rho_B)$, which is the same as the amount of entanglement required to create the ensemble of pure states, as described above. Here M, A, and B are spatially separated subsystems on which joint operations may not be performed. In my notation, I use a colon to separate the local subsystems.

On the other hand, if Alice uses the least entanglement for producing an ensemble of the mixed state $\rho_{AB}$, together with classical communication, the state of the whole system is an ensemble of the mixed state $\rho_{ABM}^{\varepsilon}$, and the process is still reversible. Because of the classical correlation to the states $|\psi_{AB}^i\rangle$, Alice and Bob may identify blocks of members in each pure state $|\psi_{AB}^i\rangle$ and apply the Schmidt projection method to them, giving $np_i S(\rho_B^i)$ maximally entangled pairs, and hence a total entanglement of distillation of $\sum_i p_i S(\rho_B^i)$. The relative entropy of entanglement again quantifies the amount of distillable entanglement from the state $\rho_{ABM}^{\varepsilon}$ and is given by $E_{RE}(\rho_{A:(BM)}^{\varepsilon}) = \min_{\sigma_{ABM}\in\mathcal{D}} S(\rho_{ABM}^{\varepsilon}\|\sigma_{ABM})$. The disentangled state that minimizes the relative entropy is $\sigma_{ABM} = \sum_i p_i \sigma_{AB}^i \otimes |m_i\rangle\langle m_i|$, where $\sigma_{AB}^i$ is obtained from $|\psi_{AB}^i\rangle\langle\psi_{AB}^i|$ by deleting the off-diagonal elements in the Schmidt basis. This is the minimum because the state $\rho_{MAB}$ is a mixture of the orthogonal states $|m_i\rangle|\psi_{AB}^i\rangle$, and for a pure state $|\psi_{AB}^i\rangle$ the disentangled state that minimizes the relative entropy is $\sigma_{AB}^i$. The minimum relative entropy of the extended system is then

$S(\rho_{ABM}^{\varepsilon}\|\sigma_{ABM}) = \sum_i p_i S(\rho_B^i)$.

This relative entropy, $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$, has previously been called the entanglement of projection (Garisto and Hardy, 1999), because the measurement on the memory projects the pure state of the full system into a particular decomposition. The minimum of $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$ over all decompositions is equal to the entanglement of formation of $\rho_{AB}$. However, Alice and Bob may choose to create the state $\rho_{AB}$ by using a decomposition with higher entanglement than the entanglement of formation. The maximum of $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$ over all possible decompositions is called the entanglement of assistance of $\rho_{AB}$ (DiVincenzo et al., 1999). Because $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$ is a relative entropy, it is invariant under local operations and nonincreasing under general operations, properties that are conditions for a good measure of entanglement (Vedral and Plenio, 1998). However, unlike $E_{RE}(\rho_{AB})$ and $E_F(\rho_{AB})$, it is not zero for completely disentangled states. In this sense, the relative entropy of entanglement $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$ defines a class of entanglement measures interpolating between the entanglement of formation and the entanglement of assistance. Note that an upper bound for the entanglement of assistance, $E_A$, can be shown using concavity (DiVincenzo et al., 1998) to be $E_A(\rho_{AB}) \le \min[S(\rho_A), S(\rho_B)]$. This bound can also be obtained from the fact that the distillable entanglement of any decomposition, $E_{RE}(\rho_{A:(BM)}^{\varepsilon}) \le E_A(\rho_{AB})$, cannot be greater than the entanglement of the original pure state.

Note that here we are really creating a state $\rho^{\otimes n} = \rho \otimes \rho \otimes \cdots \otimes \rho$. The entanglement of formation of such a state is, strictly speaking, given by $E_F(\rho^{\otimes n})$, so the entanglement of formation per single pair is $E_F(\rho^{\otimes n})/n$. It is at present not clear whether this is the same as $E_F(\rho)$ in general, i.e., whether the entanglement of formation is additive. Bearing this in mind, we continue our discussion, whose conclusions will not depend on the validity of the additivity assumption for the entanglement of formation (for more on this issue see, for example, Hayden et al., 2001).


We may also derive relative entropy measures that interpolate between the relative entropy of entanglement and the entanglement of formation (Horodecki et al., 2000) by considering nonorthogonal measurements on the memory. First of all, the fact that the entanglement of formation is in general greater than the upper bound for the entanglement of distillation emerges as a property of the relative entropy, namely, that it cannot increase under the local operation of tracing out one subsystem [this is property (F2) of the quantum relative entropy given in Sec. II; Lindblad, 1974]:

$E_F(\rho_{AB}) = \min_{\sigma_{ABM}\in\mathcal{D}} S(\rho_{ABM}\|\sigma_{ABM}) \ge \min_{\sigma_{AB}\in\mathcal{D}} S(\rho_{AB}\|\sigma_{AB})$. (77)

In general, the loss of the information in the memory may be regarded as the result of an imperfect classical channel. This is equivalent to Alice's making a nonorthogonal measurement on the memory and sending the result to Bob. In the most general case, $\{E_i = A_i^{\dagger} A_i\}$ is a POVM [loosely speaking, this is a CP map as in Eq. (13), where all the individual outcomes are recorded] performed on the memory. The decomposition corresponding to this measurement is composed of mixed states, $\xi = \{q_i, \mathrm{Tr}_M(A_i \rho_{MAB} A_i^{\dagger})\}$, where $q_i = \mathrm{Tr}(A_i \rho_{MAB} A_i^{\dagger})$. The relative entropy of entanglement of the state $\rho_{MAB}^{\xi}$, when $\xi$ is a decomposition of $\rho_{AB}$ resulting from a nonorthogonal measurement on $M$, defines a class of entanglement measures interpolating between the relative entropy of entanglement and the entanglement of formation of the state $\rho_{AB}$. In the extreme case where the measurement gives no information about the state $\rho_{AB}$, $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$ becomes the relative entropy of entanglement of the state $\rho_{AB}$ itself. In between, the measurement gives partial information. So far, I have shown that the measures interpolating between the entanglement of assistance and the entanglement of formation result from making orthogonal measurements on preparations of the pure state $|\psi_{MAB}\rangle$ in different bases. I note that they may equally be achieved by using the preparation associated with the entanglement of assistance and making increasingly nonorthogonal measurements.

D. Classical information and quantum correlations

The loss of entanglement may be related to the loss of information in the memory. There are two stages at which distillable entanglement is lost. The first is in the conversion of the pure state $|\psi_{MAB}\rangle$ into the mixed state $\rho_{ABM}$. This happens because Alice uses a classical channel to communicate the memory to Bob. The second is due to the loss of the memory $M$, taking the state $\rho_{ABM}$ to $\rho_{AB}$. The amount of information lost may be quantified as the difference in mutual information between the respective states. Mutual information is a measure of correlations between the memory $M$ and the system AB, giving the amount of information about AB that may be obtained from a measurement on $M$. The quantum mutual information between $M$ and AB is defined as $I_Q(\rho_{M:(AB)}) = S(\rho_M) + S(\rho_{AB}) - S(\rho_{MAB})$. The mutual-information loss in going from the pure state $|\psi_{MAB}\rangle$ to the mixed state in Eq. (76) is $\Delta I_Q = S(\rho_{AB})$. There is a corresponding reduction in the relative entropy of entanglement, from the entanglement of the original pure state, $E_{RE}(|\psi_{(MA):B}\rangle\langle\psi_{(MA):B}|)$, to the entanglement of the mixed state $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$, for all decompositions $\varepsilon$ arising as the result of an orthogonal measurement on the memory. It is possible to prove, using the nonincrease of relative entropy under local operations, that when the mutual-information loss is added to the relative entropy of entanglement of the mixed state, $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$, the result is greater than the relative entropy of entanglement of the original pure state, $E_{RE}(|\psi_{(MA):B}\rangle\langle\psi_{(MA):B}|)$ (Henderson and Vedral, 2000). The strongest case, which occurs when $E_{RE}(\rho_{A:(BM)}^{\varepsilon}) = E_F(\rho_{AB})$, is

$E_{RE}(|\psi_{(MA):B}\rangle\langle\psi_{(MA):B}|) \le E_F(\rho_{AB}) + S(\rho_{AB})$. (78)

A similar result may be proved for the second loss, due to the loss of the memory (Henderson and Vedral, 2000). Again the mutual-information loss is $\Delta I_Q = S(\rho_{AB})$. The relative entropy of entanglement is reduced from $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$, for any decomposition $\varepsilon$ resulting from an orthogonal measurement on the memory, to $E_{RE}(\rho_{AB})$, the relative entropy of entanglement of the state $\rho_{AB}$ with no memory. When the mutual-information loss is added to $E_{RE}(\rho_{AB})$, the result is greater than $E_{RE}(\rho_{A:(BM)}^{\varepsilon})$. In this case, the result is strongest for $E_{RE}(\rho_{A:(BM)}^{\varepsilon}) = E_A(\rho_{AB})$:

$E_A(\rho_{AB}) \le E_{RE}(\rho_{AB}) + S(\rho_{AB})$. (79)

Notice that if $\rho_{AB}$ is a pure state, then $S(\rho_{AB}) = 0$, and equality holds. Inequalities (78) and (79) provide lower bounds for $E_F(\rho_{AB})$ and $E_{RE}(\rho_{AB})$, respectively. They are of a form typical of irreversible processes, in that restoring the information in $M$ is not sufficient to restore the original correlation between $M$ and AB. In particular, they express the fact that the loss of entanglement between Alice and Bob at each stage must be accompanied by an even greater reduction in the mutual information between the memory and the subsystems AB. The general result can be derived from Donald's equality (Donald, 1986, 1987). We have in general that for any $\sigma$ and $\rho = \sum_i p_i \rho_i$ the following is true:

$S(\rho\|\sigma) + \sum_i p_i S(\rho_i\|\rho) = \sum_i p_i S(\rho_i\|\sigma)$.

Suppose that $E(\rho) = S(\rho\|\sigma)$. Then, since $E(\rho_i) \le S(\rho_i\|\sigma)$, we have the inequality

$E(\rho) + \sum_i p_i S(\rho_i\|\rho) \ge \sum_i p_i E(\rho_i)$.

Thus the loss of entanglement in $\{p_i, \rho_i\} \to \rho$ is bounded from above by the Holevo information:

$\sum_i p_i E(\rho_i) - E(\rho) \le \sum_i p_i S(\rho_i\|\rho)$. (80)


This is a physically pleasing property of entanglement. It says that the amount of information lost always exceeds the lost entanglement, which indicates that entanglement stores only a part of the information; the rest, of course, is stored in classical correlations (see also Eisert et al., 2000, who consider a similar problem, although not in the full generality of the above analysis).

In summary, the relative entropy of entanglement of the state $\rho_{AB}$ depends only on the density matrix $\rho_{AB}$ and gives an upper bound to the entanglement of distillation. The other measures of entanglement, which are given by relative entropies of an extended system, all depend on how the information in the memory is used, or on how the density matrix is decomposed. There are numerous decompositions of any bipartite mixed state into a set of states $\rho_i$ with probabilities $p_i$. The average entanglement of the states in each decomposition is given by the relative entropy of entanglement of the system extended by a memory whose orthogonal states are classically correlated to the states of the decomposition. This correlation records which state $\rho_i$ any member of an ensemble of mixed states $\rho_{AB}^{\otimes n}$ is in. It is available to the parties involved in the formation of the mixed state, but is not accessible to the parties carrying out distillation. When the classical information is fully available, different decompositions give rise to different amounts of distillable entanglement, the highest being the entanglement of assistance and the lowest the entanglement of formation. If access to the classical record is reduced, the amount of distillable entanglement is reduced. In the limit where no information is available, the upper bound to the distillable entanglement is given by the relative entropy of entanglement of the state $\rho_{AB}$ itself, without the extension of the classical memory.

I close this section by discussing generalizations to more than two subsystems. First of all, it is not at all clear how to perform this generalization in the case of the entanglement of formation and distillation. The former just does not have a natural generalization and, for the latter, it is not clear which states we should be distilling when we have three or more parties. The relative entropy of entanglement, on the other hand, does not suffer from this problem (Vedral, Plenio, Jacobs, and Knight, 1997; Vedral and Plenio, 1998). Its definition for $N$ parties would be $E_{RE}(\sigma) := \min_{\rho\in\mathcal{D}} S(\sigma\|\rho)$, where $\rho = \sum_i p_i \rho_1^i \otimes \rho_2^i \otimes \cdots \otimes \rho_N^i$.

I shall now use the knowledge we have gained of classical and quantum correlations to describe quantum computation. It will be seen, perhaps somewhat surprisingly, that classical correlations play a more prominent role than quantum correlations in the speedup of certain quantum algorithms.

V. QUANTUM COMPUTATION

A quantum computer is a physical system that can accept input states representing a coherent superposition of many different possible basis states and subsequently evolve them into a corresponding superposition of outputs. Computation, i.e., a sequence of unitary transformations, affects each element of the superposition simultaneously, generating massively parallel data processing, albeit within one piece of quantum hardware. In this way quantum computers can efficiently solve some problems that are believed to be intractable on classical computers (Deutsch and Jozsa, 1992; Bernstein and Vazirani, 1993; Simon, 1994). The best example is Shor's factorization algorithm (Shor, 1996); for an overview of this algorithm see Ekert and Jozsa (1996). The advantage of a quantum computer therefore lies in the exploitation of the phenomenon of superposition. The great importance of the quantum theory of computation lies in the fact that it reveals the fundamental connections between the laws of physics and the nature of computation (Deutsch, 1998).

In order to understand the efficiency of computer algorithms, we have to discuss the theory of computational complexity. I shall only mention the basics; a more detailed account can be found in the article by Papadimitriou (1995). Computational complexity concerns the difficulty of solving certain problems, such as the multiplication of two numbers, finding the minimum of a given function, and so on. Complexity theory divides problems into two basic categories:

(1) Easy problems: the time of computation $T$ is a polynomial function of the size of the input $l$, for example, $T = c_n l^n + \cdots + c_1 l + c_0$, where the coefficients $c$ are determined by the problem.

(2) Hard problems: the time of computation is an exponential function of the size of the input (for example, $T = 2^{cl}$, where $c$ is problem dependent).

The size of the input is always measured in bits (qubits). For example, if we are to store the number 15, then we need 4 bits. In general, to store a number $N$ we need about $l = \log N$ bits, where the base of the logarithm is 2.

The division of problems into ``easy'' and ``hard'' is, of course, very rough. First of all, in computation, apart from time, there are other resources that might matter, such as space, energy, and so on. If time grows polynomially but we require exponentially increasing energy, then the problem is clearly difficult. In addition, suppose that the time complexity of one problem is $10^{10} n$ and that of another is $10^{-10} 2^n$. Then for small $n$ (say, $n = 10$), the second algorithm, in spite of being exponential, is clearly more efficient. These two issues exemplify that the division into hard and easy problems is not without its own problems. However, this classification system is very simple to put into practice and does illuminate many different aspects of computational problems, which is why it is so widely used. I refer the reader to the book by Garey and Johnson (1979), which presents an introduction to hard problems and their detailed classification.

FIG. 8. The Mach-Zehnder interferometer. A photon is split at a beamsplitter and can take two different paths. In each of the paths a different phase is introduced to the photon state, so that, after it encounters the second beamsplitter, the probabilities of detection in the two branches have a sinusoidal dependence on the phase difference. In terms of quantum computation, the beamsplitter implements the Hadamard transform, and the whole interferometer can be seen as implementing Deutsch's algorithm (see text for explanation).

There is a great simplification in understanding quantum computation: a quantum computer is formally equivalent to a multiparticle Mach-Zehnder-like interferometer (Cleve et al., 1997). I first present the simplest kind of interferometer in terms of its function as a simple computer. We see from Fig. 8 that the path of the photon is in fact a quantum bit, in the sense that the photon can be in a superposition of the two paths. The first beam splitter acts as the unitary evolution $|0\rangle \to |0\rangle + |1\rangle$, which is known as the Hadamard gate. Next is the phase shift, which has the following effect:

$|0\rangle \to e^{i\phi(0)}|0\rangle$,
$|1\rangle \to e^{i\phi(1)}|1\rangle$.

At the end we have another beamsplitter and two detectors measuring contributions to the states $|0\rangle$ and $|1\rangle$. The corresponding probabilities of detection are

$P_0 = \cos^2\frac{\phi(0) - \phi(1)}{2}$,

$P_1 = \sin^2\frac{\phi(0) - \phi(1)}{2}$.

If, for example, $\phi(0) = \phi(1)$, then only detector 0 will register counts. If, however, $\phi(0) = \phi(1) \pm \pi$, then only detector 1 will register counts. These two situations are basically identical to what is known as Deutsch's algorithm (Deutsch, 1985), the first algorithm to give an indication that quantum computers are more powerful than their classical counterparts. This algorithm has also been implemented experimentally in nuclear magnetic resonance (NMR) (Jones and Mosca, 1998).
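The detection probabilities follow from sandwiching the phase shift between two Hadamards; a minimal numerical check (numpy):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # beamsplitter / Hadamard

def detector_probs(phi0, phi1):
    """Detection probabilities for the Mach-Zehnder interferometer."""
    phase = np.diag([np.exp(1j * phi0), np.exp(1j * phi1)])
    out = H @ phase @ H @ np.array([1, 0])     # photon enters in |0>
    return np.abs(out)**2

print(detector_probs(0.0, 0.0))      # [1, 0]: all counts in detector 0
print(detector_probs(0.0, np.pi))    # [0, 1]: all counts in detector 1
```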

A. Deutsch’s algorithm

Deutsch's algorithm (Deutsch, 1985; see also Deutsch, 1998) is the simplest possible example that illustrates the advantages of quantum computation. The problem is the following. Suppose that we are given a binary function of a binary variable, $f: \{0,1\} \to \{0,1\}$. Thus $f(0)$ can be either 0 or 1, and $f(1)$ likewise can be either 0 or 1, giving altogether four possibilities. However, suppose that we are not interested in the particular values of the function at 0 and 1, but need to know whether the function is constant [i.e., $f(0) = f(1)$] or varying [i.e., $f(0) \neq f(1)$]. Deutsch poses the following task: by computing $f$ only once, determine whether it is constant or varying. This kind of problem is generally referred to as a promise problem, because one property out of a certain number of properties is initially promised to hold, and our task is to determine computationally which one holds (see also Deutsch and Jozsa, 1992; Bernstein and Vazirani, 1993; and Simon, 1994, for other similar types of promise problems).

First of all, classically, finding out in one step whether a function is constant or varying is clearly impossible. We need to compute $f(0)$ and then compute $f(1)$ in order to compare them. There is no way out of this double evaluation. Quantum mechanically, however, there is a simple method for performing this task by computing $f$ only once. Two qubits are needed for the computation. In reality only one qubit is really needed, but the second qubit is there to implement the necessary transformation. We can imagine that the first qubit is the input to the quantum computer, whose internal (hardware) part is represented by the second qubit. The computer itself will implement the following transformation on the two qubits (we perform this fully quantum mechanically, i.e., we are now not using ``classical'' devices such as beamsplitters):

$|x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle$, (81)

where $x$ is the input qubit and $y$ is the hardware, as depicted in Fig. 9. Note that this transformation is reversible, and thus there is a unitary transformation to implement it (but we shall not pay any attention to that at the moment, as we are interested here only in the basic principle). Note also that $f$ has been used only once. The trick is to prepare the input in such a state that we make use of quantum superpositions. Let us have at the input

$|x\rangle|y\rangle = (|0\rangle + |1\rangle)(|0\rangle - |1\rangle)$, (82)

where $|x\rangle$ is the actual input and $|y\rangle$ is part of the computer hardware. Thus, before the transformation is implemented, the state of the computer is an equal superposition of all four basis states, which we obtain by simply expanding the state in Eq. (82),

$|\Psi_{\mathrm{in}}\rangle = |00\rangle - |01\rangle + |10\rangle - |11\rangle$.

Note that there are negative phase factors in front of the second and fourth terms. When this state now undergoes the transformation in Eq. (81), we have the following output state:

$|\Psi_{\mathrm{out}}\rangle = |0 f(0)\rangle - |0 \bar{f}(0)\rangle + |1 f(1)\rangle - |1 \bar{f}(1)\rangle$
$\quad = |0\rangle(|f(0)\rangle - |\bar{f}(0)\rangle) + |1\rangle(|f(1)\rangle - |\bar{f}(1)\rangle)$,

where the overbar indicates the opposite of that value, so that, for example, $\bar{0} = 1$. Now we see where the power of quantum computers is fully realized: each of the components in the superposition of $|\Psi_{\mathrm{in}}\rangle$ underwent the same evolution of Eq. (81) ``simultaneously,'' leading to the powerful ``quantum parallelism'' (Deutsch, 1985). This feature is true of quantum computation in general. Let us now look at the two possibilities.

FIG. 9. A network that implements a phase-flip operation given the black box computing the function $f(x)$. The unitary transformation $U$ implements $|0\rangle \to -|0\rangle$ conditionally on the value of $f(x)$.

(1) If $f$ is constant, then

$|\Psi_{\mathrm{out}}\rangle = (|0\rangle + |1\rangle)(|f(0)\rangle - |\bar{f}(0)\rangle)$.

(2) If $f$ is varying, then

$|\Psi_{\mathrm{out}}\rangle = (|0\rangle - |1\rangle)(|f(0)\rangle - |\bar{f}(0)\rangle)$.

Note that the output qubit (the first qubit) emerges in two different orthogonal states, depending on the type of $f$. These two states can be distinguished with 100% efficiency. This is easy to see if we first perform a Hadamard transformation on this qubit, leading to the state $|0\rangle$ if the function is constant and to the state $|1\rangle$ if the function is varying. Now a single projective measurement in the 0,1 basis determines the type of the function. Therefore, unlike their classical counterparts, quantum computers can solve Deutsch's problem with a single evaluation of $f$.

Let us now rephrase this in terms of phase shifts to emphasize its underlying identity with the above-mentioned Mach-Zehnder interferometer. The transformation of the two registers is the following:

$|x\rangle|-\rangle \Rightarrow e^{i\pi f(x)}|x\rangle|-\rangle$,

where $x = 0, 1$ and $|-\rangle = |0\rangle - |1\rangle$. Thus the first qubit is like a photon in the interferometer, receiving a conditional phase shift depending on its state (0 or 1). It is left to the reader to show that this transformation is formally identical to the above analysis. The second qubit is there just to implement the phase shift quantum mechanically. It should be emphasized that this quantum computation, although extremely simple, contains all the main features of successful quantum algorithms: it can be shown that all quantum computations are just more complicated variations of Deutsch's problem (Cleve et al., 1997). We shall use the introduction of a phase shift as a basic element of a quantum computer and relate this to the notion of distinguishability and relative entropy.

Note one important aspect: the input could also be of the form $|-\rangle|-\rangle$. A constant function would then lead to the state $|-\rangle|-\rangle$, and a varying function would lead to $|+\rangle|-\rangle$. So $|+\rangle$ and $|-\rangle$ are equally good as input states of the first qubit, and both lead to a quantum speedup. Their equal mixture, on the other hand, is not: the output would be the equal mixture $(|+\rangle\langle +| + |-\rangle\langle -|)/2$ no matter whether $f(0) = f(1)$ or $f(0) \neq f(1)$, i.e., the two possibilities would be indistinguishable. Thus, for the quantum algorithm to work well, we need the first register to be highly correlated to the two different types of functions. If the output state of the first qubit is $\rho_1$ when we have a constant function and $\rho_2$ when we have a varying function, then the efficiency of Deutsch's algorithm depends on how well we can distinguish the two states $\rho_1$ and $\rho_2$. This is given by the Holevo bound,

$H = S(\rho) - \frac{1}{2}[S(\rho_1) + S(\rho_2)]$,

where $\rho = \frac{1}{2}(\rho_1 + \rho_2)$. Thus, if $\rho_1 = \rho_2$, then $H = 0$ and the quantum algorithm has no speedup over the classical one. At the other extreme, if $\rho_1$ and $\rho_2$ are pure and orthogonal, then $H = 1$ and the computation gives the right result in one step. In between these two extremes lie all other computations, with varying degrees of efficiency as quantified by the Holevo bound. Note that these are purely classical correlations and that there is no entanglement between the first and the second qubit. In fact, the Holevo bound is the same as the formula I suggested for classical correlations in the previous section. The key to understanding the efficiency of Deutsch's algorithm is therefore the mixedness of the first register. If the initial state has entropy $S_0$, then the final Holevo bound is

$S(\rho) - S_0$.

So the more mixed the first qubit, the less efficient the computation. Note that the quantum mutual information between the first two qubits is zero throughout the entire computation (so there are neither classical nor quantum correlations between them).
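The two extremes just described are easy to reproduce numerically; a sketch assuming numpy:

```python
import numpy as np

def vn_entropy(rho, eps=1e-12):
    """von Neumann entropy in bits."""
    v = np.linalg.eigvalsh(rho)
    v = v[v > eps]
    return float(-(v * np.log2(v)).sum())

def holevo(rho1, rho2):
    """H = S((rho1+rho2)/2) - [S(rho1)+S(rho2)]/2 for equal priors."""
    rho = (rho1 + rho2) / 2
    return vn_entropy(rho) - (vn_entropy(rho1) + vn_entropy(rho2)) / 2

plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

# Pure |+> input: outputs |+> vs |-> are orthogonal, H = 1 bit
print(holevo(np.outer(plus, plus), np.outer(minus, minus)))   # 1.0

# Maximally mixed input: both outputs are I/2, H = 0, no speedup
print(holevo(np.eye(2) / 2, np.eye(2) / 2))                   # 0.0
```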

B. Computation: Communication in time

Can we extend the above entropic analysis to other algorithms as well? The answer is yes, and this is exactly what I shall describe next (Bose, Rallan, and Vedral, 2000). To explain why this is so, I first need to introduce a few definitions and a communication model of quantum computation. We have two programmers, the sender and the receiver, and two registers, the memory ($M$) register and the computational ($C$) register. The sender prepares the memory register in a certain quantum state $|i\rangle_M$, which encodes the problem to be solved. For example, in the case of factorization (Shor, 1996), this register will store the number to be factored. In the case of a search (Grover, 1996), this register will store the state of the list to be searched. The number $N$ of possible states $|i\rangle_M$ will, of course, be limited by the greatest number that the given computer could factor or the largest list that it could search. The receiver then prepares the computational register in some initial state $\rho_C^0$. Both the sender and the receiver feed the registers (prepared by them) to the quantum computer. The quantum computer implements the following general transformation on the registers:

$(|i\rangle\langle i|)_M \otimes \rho_C^0 \to (|i\rangle\langle i|)_M \otimes U_i \rho_C^0 U_i^{\dagger}$. (83)

The resulting state $\rho_C(i) = U_i \rho_C^0 U_i^{\dagger}$ of the computational register contains the answer to the computation and is measured by the receiver. As the quantum computation should work for any $|i\rangle_M$, it should also work for any mixture $\sum_{i}^{N} p_i (|i\rangle\langle i|)_M$, where the $p_i$ are probabilities. For the sender to use the above computation as a communication protocol, he has to prepare any one of the states $|i\rangle_M$ with an a priori probability $p_i$. The entire input ensemble is thus $\sum_{i}^{N} p_i (|i\rangle\langle i|)_M \otimes \rho_C^0$. Due to the quantum computation, this becomes

$\sum_{i}^{N} p_i (|i\rangle\langle i|)_M \otimes \rho_C^0 \to \sum_{i}^{N} p_i (|i\rangle\langle i|)_M \otimes \rho_C(i)$. (84)

Whereas before the quantum computation the two registers were completely uncorrelated (the mutual information is zero), at the end the mutual information becomes

$I_{MC} := S(\rho_M) + S(\rho_C) - S(\rho_{MC}) = S(\rho_C) - \sum_{i}^{N} p_i S[\rho_C(i)]$, (85)

where $\rho_M$ and $\rho_C$ are the reduced density operators of the two registers, $\rho_{MC}$ is the density operator of the entire $M+C$ system, and $S(\rho) = -\mathrm{Tr}\,\rho\log\rho$ is the von Neumann entropy (for conventional reasons I shall use $\log_2$ in all calculations). Notice that the value of the mutual information (i.e., the correlations) is equal to the Holevo bound $H = S(\rho_C) - \sum_{i}^{N} p_i S[\rho_C(i)]$ for the classical capacity of a quantum communication channel (Holevo, 1973). Note also that $\rho_C = \sum_{i}^{N} p_i \rho_C(i)$. This tells us how much information the receiver can obtain about the choice $|i\rangle_M$ made by the sender by measuring the computational register. The maximum value of $H$ is obtained when the states $\rho_C(i)$ are pure and orthogonal. Moreover, the sender conveys the maximum information when all the message states have equal a priori probability (which also maximizes the channel capacity). In that case the mutual information (channel capacity) at the end of the computation is $\log N$. Thus the communication capacity $I_{MC}$ [given by Eq. (85)] gives an index of the efficiency of a quantum computation. A necessary goal of a quantum computation is to achieve the maximum possible communication capacity consistent with the given initial states of the quantum computer. We cannot give a sufficiency criterion from our general approach, as this depends on the specifics of an algorithm. If one breaks down the general unitary transformation $U_i$ of a quantum algorithm into a number of successive unitary blocks, then the maximum capacity may be achieved only after a number of applications of the block. In each of the smaller unitary blocks, the mutual information between the $M$ and the $C$ registers (i.e., the communication capacity) increases by a certain amount. When its total value reaches the maximum possible value consistent with a given initial state of the quantum computer, the computation is regarded as complete.


C. Black-box complexity

Any general quantum algorithm has to make a certain number of queries into the memory register (Bennett et al., 1997; Beals et al., 1998; Ambainis, 2000). This is necessitated by the fact that the transformation on the computational register has to depend on the problem at hand, encoded in $|i\rangle_M$. These queries can be considered to be implemented by a black box into which the states of both the memory and the computational registers are fed. The number of such queries needed in a certain quantum algorithm gives the black-box complexity of that algorithm (Bennett et al., 1997; Beals et al., 1998; Ambainis, 2000) and is a lower bound on the complexity of the whole algorithm. The black-box approach is a simplification for looking at the complexity of an algorithm. A black box allows us to perform a certain computation without having its exact details. It is possible that physical implementations of a particular black box may prove to be difficult. Therefore, when we estimate the complexity of an algorithm by counting the number of applications of a black box, we have to bear in mind that there might be an additional complexity component arising in the physical implementation.

In general we have a function $f: \{0,1\}^n \to \{0,1\}$ (so the function maps $n$-bit values to either 0 or 1). Quantum algorithms, such as a database search, can be expressed in this form (in the case of a database search, all the values of $f$ are 0 apart from one that is equal to 1; the task is to find this value). The black box is assumed to be able to perform the transformation $|x\rangle|y\rangle \to |x\rangle|f(x) \oplus y\rangle$, just as in Deutsch's algorithm. We have the freedom to represent this black-box transformation as a phase flip, which is equivalent in power (up to a constant factor, as seen in Fig. 9),

$|x\rangle|y\rangle \to (-1)^{f(x)\oplus y}|x\rangle|y\rangle$.

Recently, Ambainis (2000) showed in a very elegant paper that if the memory register is prepared initially in the superposition $\sum_{i}^{N}|i\rangle_M$, then, in a search algorithm, $O(\sqrt{N})$ queries are needed to completely entangle it with the computational register. This gives a lower bound on the number of queries in a search algorithm. In a manner analogous to his, I shall calculate the change in mutual information between the memory and the computational registers [from Eq. (85)] in one query step. The number of queries needed to increase the mutual information to $\log N$ (for perfect communication between the sender and the receiver) is then a lower bound on the complexity of the algorithm.

D. Database search

Any search algorithm, whether quantum or classical, regardless of its explicit form, will have to find a match for the state $|i\rangle_M$ of the $M$ register among the states $|j\rangle_C$ of the $C$ register and associate a marker to the state that matches (here $|j\rangle_C$ is a complete orthonormal basis for the $C$ register). The most general way of performing such a query in the quantum case is the black-box unitary transformation (Ambainis, 2000)

$U_B |i\rangle_M |j\rangle_C = (-1)^{\delta_{ij}} |i\rangle_M |j\rangle_C$. (86)

Any other unitary transformation performing a query matching the states of the $M$ and the $C$ registers could be constructed from the above type of query. Note that the black box is able to recognize whether a value in the $C$ register is the same as the solution, but is unable to explicitly provide that solution for us. For example, imagine that Socrates goes to visit the all-knowing ancient Greek oracle (the black box), who is able to answer with only ``yes'' or ``no.'' Suppose further that Socrates wants to know who is the wisest person in the world. He would then have to ask something like ``Is Plato the wisest person in the world?'' and would not be able to ask directly ``Who is the wisest person in the world?'' This ``yes-no'' approach is typical of any black-box analysis. The advantage of using this black box quantum mechanically is that we can query all the individual elements of a superposition simultaneously. Although we can identify the solution in one step quantum mechanically, further computations are required to amplify the right solution so that a subsequent measurement is more likely to reveal it.
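That amplification step is precisely Grover's iteration, discussed further below; the following sketch (numpy; function and parameter names are mine) shows that on the order of $\sqrt{N}$ phase queries of the form of Eq. (86) suffice:

```python
import numpy as np

def grover_success(N, target, steps):
    """Probability of measuring `target` after `steps` Grover iterations
    on an N-element search space."""
    psi = np.ones(N) / np.sqrt(N)          # uniform superposition
    for _ in range(steps):
        psi[target] *= -1                  # black-box phase query, Eq. (86)
        psi = 2 * psi.mean() - psi         # inversion about the mean
    return abs(psi[target])**2

N = 256
k = int(np.round(np.pi / 4 * np.sqrt(N)))        # ~ O(sqrt(N)) queries
print(k, grover_success(N, target=7, steps=k))   # ~13 steps, prob near 1
```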

I would like to put a bound on the change of the mutual information in one such black-box step. Let the memory states $|i\rangle_M$ be available to the sender with equal a priori probability so that the communication capacity is a maximum. His initial ensemble is then $(1/N)\sum_i^N (|i\rangle\langle i|)_M$. Let the receiver prepare the C register in an initial pure state $|\psi_0\rangle$ [in fact, the power of quantum computation stems from the ability of the receiver to prepare pure-state superpositions of the form $(1/\sqrt{N})\sum_j^N |j\rangle_C$]. This is an equal-weight superposition of all $|j\rangle_C$, as there is no a priori information about the right $|j\rangle_C$. This can be done by performing a Hadamard transformation on each qubit of the C register. In general, there will be many black-box steps on the initial ensemble before a perfect correlation is set up between the M and the C registers. After the kth black-box step, let the state of the system be

$$\rho_k = \frac{1}{N}\sum_i^N (|i\rangle\langle i|)_M \otimes \big[|\psi_k(i)\rangle\langle\psi_k(i)|\big]_C, \qquad (87)$$

where

$$|\psi_k(i)\rangle_C = \sum_j a_{ij}^k\,|j\rangle_C. \qquad (88)$$

The (k+1)th black-box step changes this state to $\rho_{k+1} = (1/N)\sum_i^N (|i\rangle\langle i|)_M \otimes [|\psi_{k+1}(i)\rangle\langle\psi_{k+1}(i)|]_C$ with

$$|\psi_{k+1}(i)\rangle = \sum_j^N a_{ij}^k\,(-1)^{\delta_{ij}}\,|j\rangle_C. \qquad (89)$$

Thus we only have to evaluate the difference of mutual information between the M and the C registers for these states. This difference of mutual information [when computed from Eq. (85)] can be shown to be the difference $|S(\rho_C^{k+1}) - S(\rho_C^k)|$ (Henderson and Vedral, 2000). This quantity is bounded from above by (Fannes, 1973)

$$|S(\rho_C^{k+1}) - S(\rho_C^k)| \le d_B(\rho_C^k,\rho_C^{k+1})\log N - d_B(\rho_C^k,\rho_C^{k+1})\log d_B(\rho_C^k,\rho_C^{k+1}), \qquad (90)$$

where $d_B(\sigma,\rho) = \sqrt{1 - F^2(\sigma,\rho)}$ is the Bures metric (Bures, 1969; Uhlmann, 1976, 1986) and $F(\sigma,\rho) = \mathrm{Tr}\big(\sqrt{\rho}\,\sigma\sqrt{\rho}\big)^{1/2}$ is the fidelity. Using methods similar to those of Ambainis (2000), it can be shown that $F(\rho_C^0,\rho_C^1) \ge (N-2)/N$, from which it follows that the change in the first step is

$$|S(\rho_C^0) - S(\rho_C^1)| \le \frac{3}{\sqrt{N}}\,\log N. \qquad (91)$$

The change $|S(\rho_C^k) - S(\rho_C^{k+1})|$ in the subsequent steps has to be less than or equal to the change in the first step. This is because the Bures metric does not increase under general completely positive maps (which is what the query represents when we trace out the M register). Any other operations performed only on the C register in between two queries can only reduce the mutual information between the C and the M registers. This means that at least $O(\sqrt{N})$ steps are needed to produce full correlations (maximum mutual information of value log N) between the two registers. This gives the black-box lower bound on the complexity of any quantum search algorithm. Of course, we know that there also exists an algorithm achieving this bound, due to Grover (1996), and this has been proven to be optimal (Bennett, Bernstein, et al., 1996; Zalka, 1999; Ambainis, 2000). However, the proof presented here is the most general, as it holds even when any type of completely positive map is allowed between queries [in Zalka (1999) a heuristic argument was made for the optimality of Grover's algorithm under general operations]. Grover's algorithm has also been implemented experimentally (Chuang et al., 1998; Jones, Mosca, and Hansen, 1998).
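The first-step change is easy to check numerically. The sketch below (my own, not from the review) builds $\rho_C^0$ and $\rho_C^1$ for N = 16 from Eqs. (87)-(89) and confirms the bounds (90) and (91); entropies are in bits, matching log base 2:

import numpy as np

def entropy(rho):
    # von Neumann entropy in bits
    v = np.linalg.eigvalsh(rho)
    v = v[v > 1e-12]
    return float(-(v * np.log2(v)).sum())

N = 16
s = np.ones(N) / np.sqrt(N)        # uniform superposition of the |j>_C
rho0 = np.outer(s, s)              # pure initial state of the C register

rho1 = np.zeros((N, N))            # C state after one query, from Eq. (89)
for i in range(N):
    psi = s.copy()
    psi[i] = -psi[i]               # the (-1)^{delta_ij} amplitude flip
    rho1 += np.outer(psi, psi) / N

dS = abs(entropy(rho1) - entropy(rho0))
F = np.sqrt(s @ rho1 @ s)          # fidelity to the pure rho0; equals (N-2)/N here
dB = np.sqrt(1 - F ** 2)           # Bures metric
print(dS <= dB * np.log2(N) - dB * np.log2(dB))  # Eq. (90): True
print(dS <= 3 / np.sqrt(N) * np.log2(N))         # Eq. (91): True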

I now use Grover's algorithm to show how the mutual information varies with time in a quantum search. The general sequence described by Cleve et al. (1997) for Grover's algorithm will be used here. The algorithm consists of repeated blocks, each consisting of a Hadamard transform on each qubit of the C register, followed by $U_B$ (our black-box transformation), followed by another Hadamard transform on each qubit of the C register, and finally a phase flip $f_0$ of the $|00\cdots 0\rangle_C$ state of the C register (see Fig. 10). This block can then be repeated as many times as is necessary to bring the mutual information to its maximum value of log N, which, as shown in Eq. (91), is $O(\sqrt{N})$. Note that the only transformation correlating the M and C registers is the black-box transformation $U_B$, and that all the other transformations are done only on the C register and therefore do not change the mutual information between the two registers. In Fig. 11 I have plotted the variation of mutual information between the M and the C registers (i.e., the communication capacity of the quantum computation) with the number of iterations of the block in Grover's algorithm. It can be seen that the mutual information oscillates with the number of iterations. Figure 11 is plotted for a four-qubit computational register that can search a database of 16 entries. It reveals that the period is roughly 6, which means that the number of steps needed to achieve maximum mutual information is roughly 3. This is well above our bound for the minimum number of steps, which is 4/3 in this case.

FIG. 10. The circuit for Grover's algorithm. C is the computational register and M is the memory register. $U_B$ is the black-box query transformation, H is a Hadamard transformation on every qubit of the C register, and $f_0$ is a phase flip in front of the $|00\cdots 0\rangle_C$ state. The block consisting of H, $U_B$, H, and $f_0$ is repeated a number of times.
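The curves of Fig. 11 can be reproduced with a short simulation. The sketch below (mine, not the author's code) iterates the block of Fig. 10 for every possible memory value i, with each qubit of the C register initially in $p|0\rangle\langle 0| + (1-p)|1\rangle\langle 1|$ as in the figure caption. For the classically correlated state of Eq. (87), with mixed conditional states, $I(M{:}C) = S(\bar{\rho}_C) - (1/N)\sum_i S(\rho_C^{(i)})$, and each conditional entropy keeps its initial value because the block acts unitarily:

import numpy as np
from functools import reduce

def entropy(rho):
    v = np.linalg.eigvalsh(rho)
    v = v[v > 1e-12]
    return float(-(v * np.log2(v)).sum())

n = 4
N = 2 ** n                                   # 16-entry database, as in Fig. 11
H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H = reduce(np.kron, [H1] * n)                # Hadamard on every qubit of C
F0 = np.eye(N)
F0[0, 0] = -1                                # phase flip in front of |00...0>_C

def block(i):
    O = np.eye(N)
    O[i, i] = -1                             # U_B for memory value i, Eq. (86)
    return F0 @ H @ O @ H                    # the repeated block of Fig. 10

p = 1.0                                      # try 0.95 and 0.7 for curves (b), (c)
rho0 = reduce(np.kron, [np.diag([p, 1 - p])] * n)

rhos = [rho0] * N                            # C-register state conditioned on i
for k in range(13):
    avg = sum(rhos) / N
    print(k, round(entropy(avg) - entropy(rho0), 3))   # I(M:C) after k blocks
    rhos = [block(i) @ r @ block(i).T for i, r in enumerate(rhos)]

With p = 1 the mutual information climbs to nearly log N = 4 bits after three blocks and then oscillates, as in curve (a); lowering p caps the attainable value, as in curves (b) and (c).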

The fact that the mutual information oscillates periodically (or, more precisely, quasiperiodically) follows from the quantum Poincaré recurrence theorem (Hogg and Huberman, 1983), which states that if the system has a discrete spectrum and is "driven" by a periodic potential (as in Grover's case, where we repeat the same operation time and again), then its wave function $\psi(t)$ will undergo a quasiperiodic motion, i.e., for any $\epsilon > 0$ there exists a relatively dense set $\{T_\epsilon\}$ such that

$$\|\psi(t + T_\epsilon) - \psi(t)\| < \epsilon$$

for all times t and for each $T_\epsilon$ in the set. This is exactly the behavior seen in Fig. 11. The distance between the two states $|\psi\rangle = \sum_i a_i|i\rangle$ and $|\phi\rangle = \sum_j b_j|j\rangle$ is defined in the usual way:

$$\|\psi - \phi\| := \sum_i |a_i - b_i|.$$

FIG. 11. Dependence of the mutual information between the M and the C registers on the number of times the block in Grover's algorithm is iterated, for various values of the initial mixedness of the C register. Each qubit of the C register is initially in the state $p|0\rangle\langle 0| + (1-p)|1\rangle\langle 1|$: (a) $p=1$; (b) $p=0.95$; (c) $p=0.7$. The computations (a) and (b) achieve higher mutual information than classically allowed in on the order of $\sqrt{N}$ steps, while (c) does not.

The three graphs (a), (b), and (c) in Fig. 11 are for different values of the initial mixedness of the C register. We find that the mutual information fails to rise to the maximum value of log N when the state of the computational register is mixed. Our formalism thus allows us to calculate the performance of a quantum computation as a function of the mixedness (quantified by the von Neumann entropy) of the computational register. We can put a bound on the entropy of the second register beyond which the quantum search becomes as inefficient as the classical search. If the initial entropy $S(\rho_C^0)$ of the C register exceeds $\frac{1}{2}\log N$, then the change in mutual information between the M and the C registers in the course of the entire quantum computation would be at most $\log\sqrt{N}$. This can be achieved by a classical database search in $\sqrt{N}$ steps, so there is no advantage in using quantum evolution when the initial state is too mixed. Note that our condition

$$S(\rho_C^0) \ge \frac{1}{2}\log N$$

is only a sufficient and not a necessary condition for the loss of the quantum enhancement in efficiency.
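As a worked check (my arithmetic, not in the original): for N = 16, each qubit in curve (c) has p = 0.7, so $S(\rho_C^0) = 4\,h(0.7) \approx 4 \times 0.881 \approx 3.5$ bits, well above $\frac{1}{2}\log 16 = 2$ bits, consistent with curve (c) never reaching log N; for curve (b), $4\,h(0.95) \approx 4 \times 0.286 \approx 1.1 < 2$ bits, and the quantum advantage survives [here $h(p)$ denotes the binary entropy of a single qubit].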

I also point out that the states of the M register need not be a mixture, but could be an arbitrary superposition of states $|i\rangle_M$ [such a state was used by Ambainis (2000) in his argument]. All the above arguments still hold in that case, and the M and the C registers become quantum-mechanically entangled and not just classically correlated. Thus our analysis implies that any quantum computation is mathematically identical to a measurement process (Everett, 1973). The system being measured is the M register, and the apparatus is the C register of the quantum computer. As time progresses the apparatus (register C) becomes more and more correlated (or entangled) with the system (register M). This means that the states of register C become more and more distinguishable, which allows us to extract more information about the M register by measuring the C register. The analysis in the last paragraph, in which I showed the limitations imposed on the efficiency of quantum computation by the mixedness of the C register, also applies to the efficiency of a quantum measurement when the apparatus is in a mixed state. Mixedness of an apparatus has, to the best of our knowledge, never been considered in the analysis of quantum measurement. In general practice, any apparatus, however macroscopic, is considered to be in a pure quantum state before the measurement. Our approach, highlighting the formal analogy between measurement and computation, offers a way to analyze measurement in a much more general context.

Finally, I should like to discuss what would happen if we decided to change the nature of the black box. Suppose that, instead of merely being able to recognize the right solution, the black box is much more powerful and can determine whether individual bit values coincide with the bit values of the solution. So, for all k,

$$|i_0 i_1 \cdots i_k \cdots i_n\rangle |j_0 j_1 \cdots j_k \cdots j_n\rangle \to (-1)^{\delta_{i_k j_k}}\,|i_0 i_1 \cdots i_k \cdots i_n\rangle |j_0 j_1 \cdots j_k \cdots j_n\rangle, \qquad (92)$$

where $i = i_0 i_1 \cdots i_k \cdots i_n$ and $j = j_0 j_1 \cdots j_k \cdots j_n$ are the binary representations of i and j, respectively. It can then easily be checked that this gate has the power to correlate the C and the M registers by the amount of log 2 per query. Therefore the search algorithm would take log N steps (instead of $\sqrt{N}$), i.e., the number of queries would be polynomial instead of exponential in the number of bits. There is, of course, a hidden complexity here, which lies in the construction of the new black box from the original black box. It can be shown that this requires an exponential increase in time (or space, which can always be traded for time), and this then compensates for the exponential decrease in the number of applications of the new black box. In fact, this new black box would be equivalent to the ancient Greek oracle's being able to answer the question posed by Socrates: "Who is the wisest person in the world?"
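A purely classical sketch (my illustration, not from the review) makes the point directly: with an oracle answering bitwise match queries, $\log_2 N$ queries recover the marked item outright.

def make_bit_oracle(solution):
    # does bit k of the trial value j agree with bit k of the solution?
    return lambda k, j: ((solution >> k) & 1) == ((j >> k) & 1)

n = 10                      # N = 2**n database entries
solution = 733              # hypothetical marked item
oracle = make_bit_oracle(solution)

found = 0
for k in range(n):          # one query per bit: log2(N) queries in total
    if not oracle(k, 0):    # the trial value 0 has bit k equal to 0...
        found |= 1 << k     # ...so a mismatch means bit k of the solution is 1
assert found == solution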

Can we use entropic measures of the above form to quantify the complexity of other quantum algorithms? The answer is unclear at present. The only algorithm that currently achieves an exponential speedup over its classical counterpart, Shor's factorization algorithm (Shor, 1996), cannot be usefully rephrased in terms of black-box operations (more precisely, its black-box formulation is rather trivial, as it requires only one black-box operation). However, this does not prevent us from deriving fundamental bounds on information storage and the speed of its processing based on the uncertainty principle. In the last subsection, I show the ultimate limits of processing power no matter what model of computation is used, so long as it uses quantum systems (particles and fields alike).

E. Quantum computation and quantum measurement

I now show that quantum computation is formally identical to a quantum measurement as described by von Neumann (1955). The analysis will be performed in the most general, continuous case. Suppose that we have a system S (described by a continuous variable x) and an apparatus A (described by a continuous variable y) interacting via the Hamiltonian $H = xp$, where p is the momentum of A (we shall assume that $\hbar = 1$). Suppose in addition that the initial state of the total system is

$$|\Psi(0)\rangle = \int_x \phi(x)|x\rangle\,dx \otimes \int_y \eta(y)|y\rangle\,dy,$$


which is an uncorrelated state. The action of the above Hamiltonian then transforms the state into an entangled state. In order to calculate this transformation it will be beneficial to introduce the (continuous) Fourier transform

$$F_y : |y\rangle \to \int e^{-iyp}\,|p\rangle\,dp,$$

which takes us from the position space of A into the momentum space of A. This is important because we know the effect of the Hamiltonian in the momentum basis. Now, the action of the unitary transformation generated by H is

$$|\Psi(t)\rangle = e^{-ixpt}\,|\Psi(0)\rangle = F_y^{-1}\,e^{-ixpt}\,F_y\,|\Psi(0)\rangle = \int_x \int_y \phi(x)\,\eta(y - xt)\,|x\rangle|y\rangle\,dx\,dy,$$

and we see that S and A are now correlated in x and y. This means that by measuring A we can obtain some information about the state of S. The mutual information $I_{AS} = H(x) + H(y) - H(x,y)$ can be shown to satisfy (Everett, 1973)

$$I_{AS} \ge \ln t,$$

i.e., it grows at least as fast as the logarithm of the time elapsed during the measurement. This gives us a lower bound on exactly how quickly correlations can be established between the system and the apparatus. This is analogous to the way in which I derived the upper bound on the efficiency of quantum search algorithms in Sec. V.
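To see the logarithmic growth in a concrete case (an illustration of my own, not part of Everett's argument), take Gaussian wave packets $\phi$ and $\eta$ with position variances $\sigma_x^2$ and $\sigma_y^2$. The correlated state derived above then makes $(x,y)$ jointly Gaussian, for which the mutual information has a closed form:

$$|\Psi(x,y,t)|^2 = |\phi(x)|^2\,|\eta(y - xt)|^2 \;\Longrightarrow\; I_{AS} = \tfrac{1}{2}\ln\!\left(1 + \frac{t^2\sigma_x^2}{\sigma_y^2}\right) \approx \ln t + \ln\frac{\sigma_x}{\sigma_y} \quad (t \gg \sigma_y/\sigma_x),$$

i.e., the correlations indeed grow as $\ln t$ plus a constant.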

Let us now calculate in greater detail the effect of the measurement Hamiltonian. We define

$$\xi(p) := F_y\{\eta(y)\}.$$

The evolution then proceeds as follows:

$$
\begin{aligned}
|\Psi(t)\rangle &= e^{-ixpt} \int_x \phi(x)|x\rangle\,dx \otimes \int_y \eta(y)|y\rangle\,dy \\
&= e^{-ixpt} \int_x \phi(x) \int_p \Big\{ \int_y \eta(y)\,e^{-iyp}\,dy \Big\} |x\rangle|p\rangle\,dx\,dp \\
&= e^{-ixpt} \int_x \int_p \phi(x)\,\xi(p)\,|x\rangle|p\rangle\,dx\,dp \\
&= \int_x \phi(x) \int_y \Big\{ \int_p \xi(p)\,e^{-ixpt}\,e^{iyp}\,dp \Big\} |x\rangle|y\rangle\,dx\,dy \\
&= \int_x \int_y \phi(x)\,\eta(y - xt)\,|x\rangle|y\rangle\,dx\,dy.
\end{aligned}
$$

This result has the same formal structure as the quantum algorithms presented earlier: a Fourier transform, followed by a conditional phase shift, followed by another Fourier transform (cf. Deutsch's and Grover's algorithms). Therefore we can see that how efficiently we can measure something is the same as how efficiently we can compute, both of which depend on how quickly we can establish correlations.
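This FT-phase-FT structure is easy to exhibit numerically. The sketch below (mine, not from the review) discretizes the pointer variable y, implements $F_y$ with an FFT, applies the conditional phase $e^{-ixpt}$ for one fixed system value x, and checks that the pointer packet ends up at $\eta(y - xt)$:

import numpy as np

M = 1024
L = 40.0
y = np.linspace(-L / 2, L / 2, M, endpoint=False)
dy = y[1] - y[0]
p = 2 * np.pi * np.fft.fftfreq(M, d=dy)    # momentum grid conjugate to y

eta = np.exp(-y ** 2 / 2) / np.pi ** 0.25  # Gaussian pointer state eta(y)
x, t = 3.0, 1.5                            # hypothetical system value and time

xi = np.fft.fft(eta)                             # F_y : eta(y) -> xi(p)
out = np.fft.ifft(np.exp(-1j * x * p * t) * xi)  # conditional phase, then inverse F_y

# The pointer has been translated by x*t, i.e., out ~ eta(y - x*t).
assert np.allclose(out, np.exp(-(y - x * t) ** 2 / 2) / np.pi ** 0.25, atol=1e-6)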

F. Ultimate limits of computation: The Bekenstein bound

Given a computer enclosed in a sphere of radius R and having available a total amount of energy E, what is the amount of information that it can store, and how quickly can this information be processed? The Holevo bound gives us the ultimate answer. The amount of information that can be written into this volume is bounded from above by the entropy, i.e., by the number of distinguishable states that this volume can support. I shall now use a simple, informal argument to obtain this ultimate bound (Tipler, 1994); the rigorous derivation can be found in Bekenstein (1981). The bound on energy implies a bound on momentum, and the total number of states in the phase space is

$$N = \frac{PR}{\Delta P\,\Delta R} \le \frac{PR}{\hbar},$$

where the inequality follows from the Heisenberg uncertainty relation $\Delta P\,\Delta R \ge \hbar$, which limits the size of the smallest volume in phase space to $\hbar$ in each of the three spatial directions. From relativity we have that for any particle $p \le E/c$, so that

$$I \le \ln N \le N \le \frac{E}{c}\,\frac{R}{\hbar} = \frac{ER}{\hbar c},$$

which is known as the Bekenstein bound. In reality this inequality will most likely be a huge overestimate, but it is important to know that, no matter how we encode information, we cannot perform better than is allowed by our most accurate present theory, quantum mechanics. As an example, consider the nucleus of a hydrogen atom. According to the above result it can encode about 100 bits of information (I assumed that $E = mc^2$ and that $R = 10^{-15}$ m). At present, NMR quantum computation achieves "only" one bit per nucleus (and not per nucleon), with spin up and spin down being the two states.

From the Bekenstein bound we can derive a bound on the efficiency of information processing. Again my derivation will be loose, and a much more careful calculation confirms what I shall present (Bekenstein, 1984). All the bits in the volume $V = \frac{4}{3}\pi R^3$ cannot be processed faster than the time it takes light to travel across the volume, which is $2R/c$. This gives

$$\frac{dI}{dt} \le \frac{E}{2\hbar}.$$

Again, a hydrogen nucleus can process about $10^{24}$ bits per second, which is also in sharp contrast with NMR quantum computation, where a NOT gate takes roughly a few milliseconds, leading to a maximum of $10^3$ bits per second.
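These figures follow from elementary arithmetic. A rough sketch (my own numbers, using standard physical constants and following the text's informal estimate):

import math

hbar = 1.054571817e-34    # J s
c = 2.99792458e8          # m/s
m = 1.67262192e-27        # kg (proton mass), so E = m c^2
R = 1.0e-15               # m, the nuclear radius assumed in the text

E = m * c ** 2
storage_nats = E * R / (hbar * c)             # I <= ER/(hbar c), in nats
print(storage_nats / math.log(2), "bits")     # ~7 bits from this prefactor-free
# estimate; the rigorous bound carries an extra factor of 2*pi, moving it to
# the tens-of-bits scale behind the text's order-of-magnitude figure.
print(E / (2 * hbar), "per second")           # ~7e23, i.e., of order 10^24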

The Bekenstein bound shows that there is a potentially great number of underused degrees of freedom in any physical system. This provides hope that quantum computation will be an experimentally realizable goal. At present, there are a number of different practical implementations of quantum computation, but none of them can store and manipulate more than ten qubits at a time [five was the largest number (Vandersypen et al., 2000) that had been manipulated in a genuine quantum-computation process at the time this review was finished, in the summer of 2000]. The above calculation, however, takes into account neither the environmental influence on computation nor the experimental precision. I have not touched at all on the practical possibility of building a quantum computer. This is partly for reasons of space, partly because it would spoil the flow of the exposition, and partly because there are already a number of excellent reviews of this subject (Steane, 1997; Cory et al., 2000). It is generally acknowledged that the difficulties in building a quantum computer are only of a practical nature and that there are no fundamental limits prohibiting such a device. I hope that this section offers convincing arguments that building a quantum computer is very much a worthwhile adventure, from both the technological and the fundamental perspective. In any case, we see that there is a great deal of currently unused potential in physical systems for storing and encoding information. As our level of technology improves, we shall find more and more ways of getting close to the Bekenstein bound.

VI. CONCLUSIONS

We have seen how the distinguishability of different physical states is at the heart of information processing, which we quantified using the relative entropy. The relative entropy told us about the possibility of confusing two probability distributions or, in the quantum case, two density matrices. We have seen that relative entropy never increases under any general quantum evolution, meaning that states can only become less distinguishable as time progresses. The most important consequence of this was shown to be the Holevo bound, which is the bound on the capacity for classical communication using quantum states. This basically told us that n qubits cannot store more than n classical bits of information. While this appears to be a severe limitation on quantum information processing, with the aid of dense coding quantum communication is in some sense more efficient than its classical counterpart. Dense coding involves the use of entangled states, and I therefore showed how the quantum relative entropy can be used to quantify entanglement. Moreover, I used the Holevo bound to put limits on the efficiency of quantum computation by treating it as a communication protocol. Quantum algorithms were shown to be considerably more efficient for some problems than classical algorithms. In particular, I have shown in a new way that the quantum database search has a square-root enhancement in efficiency over the classical database search. The efficiency of quantum computation stems from the tradeoff between two opposite effects: on the one hand, superpositions allow us to compute in parallel, while on the other hand, the Holevo bound limits the amount of information we can extract from a quantum state. I also emphasized links between black-box quantum computation and quantum measurement, and I showed that there is a fundamental limit to deleting information, leading to Landauer's principle that each erased bit increases the entropy of the environment by $k_B \ln 2$.

With every new physical theory comes a new understanding of the world we live in. Through Newtonian physics we understood the universe as a clockwork mechanism. With the subsequent development of thermodynamics, the universe became a big Carnot engine, slowly evolving towards its final equilibrium state, after which no useful work could be obtained: the heat death. At present we see the universe as an information-processing machine, a computer. Limits to the amount of information it can contain and process are given by the most accurate theory we have, quantum mechanics, giving rise to quantum information theory.

If there is a single moral to be drawn from the relationship between information and physics, it is that, as we dig deeper into the fundamental laws of physics, we also push back the boundaries of information processing. It will not be surprising if all the results presented in this review are superseded by higher-level generalizations of which they become approximations, in the same way in which classical information theory today approximates quantum information theory.

ACKNOWLEDGMENTS

I would like to thank S. Bose, D. Deutsch, D. P. DiVincenzo, M. J. Donald, A. Ekert, P. Hayden, L. Henderson, J. A. Jones, E. Kashefi, P. L. Knight, G. Lindblad, M. Murao, M. Ozawa, M. B. Plenio, L. Rallan, B. Schumacher, and G. Vidal for many stimulating discussions on the subject of quantum information. In particular, I thank L. Rallan for a very thorough reading of this manuscript and for helping me to draw some of the figures. Work for this review has been supported by the Knight Trust and Overseas Research Scheme Award, Elsag-Bailey S.p.A., the Hewlett-Packard Company, the Engineering and Physical Sciences Research Council, and the European Union project EQUIP (contract IST-1999-11053).

REFERENCES

Ambainis, A., 2000, in Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing, May 2000 (Association for Computing Machinery, New York), p. 636.
Araki, H., and E. H. Lieb, 1970, Commun. Math. Phys. 18, 160.
Barnum, H., M. A. Nielsen, and B. Schumacher, 1998, Phys. Rev. A 57, 4153.
Beals, R., H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf, 1998, in Proceedings of the 39th Annual Symposium on Foundations of Computer Science (IEEE Computer Society, Los Alamitos, CA), p. 352; also e-print quant-ph/9802049.
Bekenstein, J. D., 1981, Phys. Rev. D 23, 287.
Bekenstein, J. D., 1984, Phys. Rev. D 30, 1669.
Bell, J., 1987, Speakable and Unspeakable in Quantum Mechanics (Cambridge University, Cambridge).
Bennett, C. H., 1988, IBM J. Res. Dev. 32, 16.
Bennett, C. H., E. Bernstein, G. Brassard, and U. Vazirani, 1997, SIAM J. Comput. 26, 1510.
Bennett, C. H., H. J. Bernstein, S. Popescu, and B. Schumacher, 1996, Phys. Rev. A 53, 2046.
Bennett, C. H., G. Brassard, C. Crepeau, R. Jozsa, A. Peres, and W. K. Wootters, 1993, Phys. Rev. Lett. 70, 1895.
Bennett, C. H., D. P. DiVincenzo, C. A. Fuchs, T. Mor, E. Rains, P. W. Shor, J. A. Smolin, and W. K. Wootters, 1999, Phys. Rev. A 59, 1070.
Bennett, C. H., D. P. DiVincenzo, T. Mor, P. W. Shor, J. A. Smolin, and B. M. Terhal, 1999, Phys. Rev. Lett. 82, 5385.
Bennett, C. H., D. P. DiVincenzo, J. A. Smolin, and W. K. Wootters, 1996, Phys. Rev. A 54, 3824.
Bennett, C. H., and S. Wiesner, 1992, Phys. Rev. Lett. 69, 2881.
Bernstein, E., and U. Vazirani, 1993, in Proceedings of the 25th Annual ACM Symposium on the Theory of Computing (ACM, New York), p. 11.
Bhatia, R., 1997, Matrix Analysis (Springer-Verlag, Berlin).
Boschi, D., S. Branca, F. DeMartini, L. Hardy, and S. Popescu, 1998, Phys. Rev. Lett. 80, 1121.
Bose, S., M. B. Plenio, and V. Vedral, 2000, J. Mod. Opt. 47, 291.
Bose, S., L. Rallan, and V. Vedral, 2000, Phys. Rev. Lett. 85, 5448.
Bouwmeester, D., J. W. Pan, K. Mattle, M. Eibl, H. Weinfurter, and A. Zeilinger, 1997, Nature (London) 390, 575.
Bowen, G., 2001, Phys. Rev. A 63, 022302.
Brillouin, L., 1956, Science and Information Theory (Academic, New York).
Bures, D., 1969, Trans. Am. Math. Soc. 135, 199.
Caves, C. M., and P. D. Drummond, 1994, Rev. Mod. Phys. 66, 481.
Choi, M. D., 1975, Linear Algebr. Appl. 10, 285.
Chuang, I. L., N. Gershenfeld, and M. Kubinec, 1998, Phys. Rev. Lett. 80, 3408.
Clauser, J. F., M. A. Horne, A. Shimony, and R. A. Holt, 1969, Phys. Rev. Lett. 23, 880.
Cleve, R., A. Ekert, C. Macchiavello, and M. Mosca, 1997, Philos. Trans. R. Soc. London, Ser. A 454, 339.
Cory, D. G., R. Laflamme, E. Knill, L. Viola, T. F. Havel, N. Boulant, G. Boutis, E. Fortunato, S. Lloyd, R. Martinez, C. Negrevergne, M. Pravia, Y. Sharf, G. Teklemariam, Y. S. Weinstein, and W. H. Zurek, 2000, Fortschr. Phys. 48, 875.
Cover, T. M., and J. A. Thomas, 1991, Elements of Information Theory (Wiley, New York).
Csiszar, I., and J. Korner, 1981, Coding Theorems for Discrete Memoryless Systems (Academic, New York).
Davies, E. B., 1976, Quantum Theory of Open Systems (Academic, London).
Deutsch, D., 1985, Proc. R. Soc. London, Ser. A 400, 97.
Deutsch, D., 1989, Proc. R. Soc. London, Ser. A 425, 73.
Deutsch, D., 1998, The Fabric of Reality (Viking-Penguin, London).
Deutsch, D., and R. Jozsa, 1992, Proc. R. Soc. London, Ser. A 439, 553.
DiVincenzo, D. P., C. A. Fuchs, H. Mabuchi, J. A. Smolin, A. Thapliyal, and A. Uhlmann, 1999, in Quantum Computing and Communications, Lecture Notes in Computer Science Vol. 1505 (Springer, Berlin), p. 247.
DiVincenzo, D. P., P. W. Shor, J. A. Smolin, B. M. Terhal, and A. V. Thapliyal, 2000, Phys. Rev. A 61, 062312.
Donald, M. J., 1986, Commun. Math. Phys. 105, 13.
Donald, M. J., 1987, Math. Proc. Cambridge Philos. Soc. 101, 363.
Donald, M. J., 1992, Found. Phys. 22, 1111.
Donald, M. J., and M. Horodecki, 1999, Phys. Lett. A 264, 257.
Einstein, A., B. Podolsky, and N. Rosen, 1935, Phys. Rev. 47, 777.
Eisert, J., T. Felbinger, P. Papadopoulos, M. B. Plenio, and M. Wilkens, 2000, Phys. Rev. Lett. 84, 1611.
Ekert, A., and R. Jozsa, 1996, Rev. Mod. Phys. 68, 733.
Everett, H., III, 1957, Rev. Mod. Phys. 29, 454.
Everett, H., III, 1973, in The Many-Worlds Interpretation of Quantum Mechanics, edited by B. S. DeWitt and N. Graham (Princeton University, Princeton, NJ), p. 3.
Fannes, M., 1973, Commun. Math. Phys. 31, 291.
Feynman, R. P., 1996, Feynman Lectures on Computation, edited by A. J. G. Hey and R. W. Allen (Addison-Wesley, Reading, MA).
Fuchs, C. A., 1996, Distinguishability and Accessible Information in Quantum Theory, Ph.D. thesis (University of New Mexico); also e-print quant-ph/9601020.
Furusawa, A., J. L. Sorensen, S. L. Braunstein, C. A. Fuchs, H. J. Kimble, and E. S. Polzik, 1998, Science 282, 706.
Garey, M., and D. Johnson, 1979, Computers and Intractability: A Guide to the Theory of NP-completeness (Freeman, San Francisco).
Garisto, R., and L. Hardy, 1999, Phys. Rev. A 60, 827.
Gisin, N., 1996, Phys. Lett. A 210, 151, and references therein.
Gordon, J. P., 1964, in Quantum Electronics and Coherent Light, Proceedings of the International School of Physics "Enrico Fermi," Course 31, edited by P. A. Miles (Academic, New York), p. 156.
Grover, L. K., 1996, in Proceedings of the 28th Annual ACM Symposium on the Theory of Computing (ACM, New York), p. 212; also e-print quant-ph/9605043.
Hausladen, P., R. Jozsa, B. Schumacher, M. Westmoreland, and W. K. Wootters, 1996, Phys. Rev. A 54, 1869.
Hayashi, M., 2001, J. Phys. A 34, 3413.
Hayden, P. M., M. Horodecki, and B. M. Terhal, 2001, J. Phys. A 34, 3413.
Henderson, L., and V. Vedral, 2000, Phys. Rev. Lett. 84, 2263.
Hiai, F., and D. Petz, 1991, Commun. Math. Phys. 143, 99.
Hogg, T., and B. A. Huberman, 1983, Phys. Rev. A 28, 22.
Holevo, A. S., 1973, Probl. Peredachi Inf. 9, 3 [Probl. Inf. Transm. 9, 177 (1973)].
Holevo, A. S., 1982, Probabilistic and Statistical Aspects of Quantum Theory (North-Holland, Amsterdam).
Holevo, A. S., 1998, IEEE Trans. Inf. Theory 44, 269.
Horodecki, M., 1998, Phys. Rev. A 57, 3364.
Horodecki, M., P. Horodecki, and R. Horodecki, 1996, Phys. Lett. A 223, 1.
Horodecki, M., P. Horodecki, and R. Horodecki, 1998, Phys. Rev. Lett. 80, 5239.
Horodecki, M., P. Horodecki, and R. Horodecki, 2000, Phys. Rev. Lett. 84, 2014.
Horodecki, M., P. Horodecki, and R. Horodecki, 2001, Phys. Lett. A 283, 1.
Ingarden, R. S., 1976, Rep. Math. Phys. 10, 43.
Ingarden, R. S., A. Kossakowski, and M. Ohya, 1997, Information Dynamics and Open Systems: Classical and Quantum Approach (Kluwer Academic, Dordrecht).
Jaynes, E. T., and F. W. Cummings, 1963, Proc. IEEE 51, 89.
Jones, J. A., and M. Mosca, 1998, J. Chem. Phys. 109, 1648.
Jones, J. A., M. Mosca, and R. H. Hansen, 1998, Nature (London) 393, 344.
Jozsa, R., and B. Schumacher, 1994, J. Mod. Opt. 41, 2343.
King, C., and M. B. Ruskai, 2001, J. Math. Phys. 42, 87.
Kolmogorov, A. N., 1950, Foundations of the Probability Theory (Chelsea, New York).
Kraus, B., J. I. Cirac, S. Karnas, and M. Lewenstein, 2000, Phys. Rev. A 61, 062302.
Kraus, K., 1983, States, Effects and Operations: Fundamental Notions of Quantum Theory, Lecture Notes in Physics No. 180 (Springer, Berlin).
Kullback, S., and R. A. Leibler, 1951, Ann. Math. Stat. 22, 79.
Landauer, R., 1961, IBM J. Res. Dev. 5, 183.
Lebedev, D. S., and L. B. Levitin, 1963, Sov. Phys. Dokl. 8, 377.
Lesniewski, A., and M. B. Ruskai, 1999, J. Math. Phys. 40, 5702.
Lewenstein, M., D. Bruss, J. I. Cirac, B. Kraus, M. Kus, J. Samsonowicz, A. Sanpera, and R. Tarrach, 2000, J. Mod. Opt. 47, 2481.
Lieb, E. H., and M. B. Ruskai, 1973, J. Math. Phys. 14, 1938.
Lindblad, G., 1974, Commun. Math. Phys. 39, 111.
Lindblad, G., 1975, Commun. Math. Phys. 40, 147.
Lo, H., and S. Popescu, 1999, Phys. Rev. Lett. 83, 1459.
Mackey, G. W., 1963, Mathematical Foundations of Quantum Mechanics (W. A. Benjamin, New York).
Nielsen, M., 1999, Phys. Rev. A 61, 064301.
Ohya, M., and D. Petz, 1993, Quantum Entropy and Its Use, Texts and Monographs in Physics (Springer-Verlag, Berlin).
Ozawa, M., 1984, J. Math. Phys. 25, 79.
Papadimitriou, C. H., 1995, Computational Complexity (Addison-Wesley, New York).
Partovi, M. H., 1989, Phys. Lett. A 137, 445, and references therein.
Penrose, O., 1973, Foundations of Statistical Mechanics (Oxford University, Oxford).
Peres, A., 1993, Quantum Theory: Concepts and Methods (Kluwer Academic, Dordrecht).
Peres, A., 1996, Phys. Rev. Lett. 77, 1413.
Plenio, M. B., and V. Vedral, 1998, Contemp. Phys. 38, 431.
Popescu, S., and D. Rohrlich, 1997, Phys. Rev. A 56, R3319.
Rains, E. M., 1999a, Phys. Rev. A 60, 173.
Rains, E. M., 1999b, Phys. Rev. A 60, 179.
Redhead, M., 1987, Incompleteness, Nonlocality and Realism (Clarendon, Oxford).
Reed, M., and B. Simon, 1980, Methods of Modern Mathematical Physics: Functional Analysis (Academic, New York).
Sanov, I. N., 1957, Mat. Sb. 42, 11.
Schmidt, E., 1907, Math. Ann. 63, 433.
Schrödinger, E., 1935, Naturwissenschaften 23, 807, 823, 844.
Schumacher, B., 1995, Phys. Rev. A 51, 2738.
Schumacher, B., 1996, Phys. Rev. A 54, 2614.
Schumacher, B., and M. D. Westmoreland, 1997, Phys. Rev. A 56, 131.
Schumacher, B. W., and M. D. Westmoreland, 2000, e-print quant-ph/0004045.
Shannon, C. E., and W. Weaver, 1949, The Mathematical Theory of Communication (University of Illinois, Urbana, IL).
Shor, P. W., 1996, in Proceedings of the 35th Annual Symposium on the Foundations of Computer Science, November 1994, edited by S. Goldwasser (IEEE Computer Society, Los Alamitos, CA), p. 124.
Simon, D. S., 1994, in Proceedings of the 35th Annual Symposium on the Foundations of Computer Science, edited by S. Goldwasser (IEEE Computer Society, Los Alamitos, CA), p. 16.
Steane, A., 1997, Appl. Phys. B: Lasers Opt. B64, 623.
Stern, T. E., 1960, IEEE Trans. Inf. Theory 6, 435.
Tipler, F. J., 1994, The Physics of Immortality (Bantam Doubleday Dell, New York).
Toffoli, T., 1981, Math. Syst. Theory 14, 13.
Tolman, R. C., 1938, The Principles of Statistical Mechanics (Oxford University, Oxford).
Uhlmann, A., 1976, Rep. Math. Phys. 9, 273.
Uhlmann, A., 1986, Rep. Math. Phys. 24, 229.
Umegaki, H., 1962, Kodai Math. Sem. Rep. 14, 59.
Vandersypen, L. M. K., M. Steffen, G. Breyta, C. S. Yannoni, R. Cleve, and I. L. Chuang, 2000, Phys. Rev. Lett. 85, 5452.
Vedral, V., 2000, Proc. R. Soc. London, Ser. A 456, 969.
Vedral, V., and M. B. Plenio, 1998, Phys. Rev. A 57, 1619.
Vedral, V., M. B. Plenio, K. Jacobs, and P. L. Knight, 1997, Phys. Rev. A 56, 4452.
Vedral, V., M. B. Plenio, M. A. Rippin, and P. L. Knight, 1997, Phys. Rev. Lett. 78, 2275.
Vedral, V., M. A. Rippin, and M. B. Plenio, 1997, J. Mod. Opt. 44, 2185.
Vidal, G., 2000, J. Mod. Opt. 47, 355.
Vollbrecht, K. G. H., and R. F. Werner, 2001, Phys. Rev. A 64, 062307.
von Neumann, J., 1955, Mathematical Foundations of Quantum Mechanics, translated from the German edition by R. T. Beyer (Princeton University, Princeton).
Wehrl, A., 1978, Rev. Mod. Phys. 50, 221.
Werner, R. F., 1989, Phys. Rev. A 40, 4277.
Wootters, W. K., 1998, Phys. Rev. Lett. 80, 2245.
Wootters, W. K., and W. H. Zurek, 1982, Nature (London) 299, 802.
Yamamoto, Y., and H. A. Haus, 1986, Rev. Mod. Phys. 58, 1001.
Yuen, H. P., and M. Ozawa, 1993, Phys. Rev. Lett. 70, 363.
Zalka, C., 1999, Phys. Rev. A 60, 2746.