A Quantitative Landauer's Principle

16
A Quantitative Landauer’s Principle Philippe Faist, 1 Fr´ ed´ eric Dupuis, 1 Jonathan Oppenheim, 2 and Renato Renner 1 1 Institute for Theoretical Physics, ETH Zurich, 8093 Switzerland 2 Department for Physics and Astronomy, University College of London, WC1E 6BT London, U.K. (Dated: November 7, 2012) Landauer’s Principle states that the work cost of erasure of one bit of information has a funda- mental lower bound of kT ln(2). Here we prove a quantitative Landauer’s principle for arbitrary processes, providing a general lower bound on their work cost. This bound is given by the min- imum amount of (information theoretical) entropy that has to be dumped into the environment, as measured by the conditional max-entropy. The bound is tight up to a logarithmic term in the failure probability. Our result shows that the minimum amount of work required to carry out a given process depends on how much correlation we wish to retain between the input and the output systems, and that this dependence disappears only if we average the cost over many independent copies of the input state. Our proof is valid in a general framework that specifies the set of possible physical operations compatible with the second law of thermodynamics. We employ the technical toolbox of matrix majorization, which we extend and generalize to a new kind of majorization, called lambda-majorization. This allows us to formulate the problem as a semidefinite program and provide an optimal solution. Introduction.—Landauer’s Principle [1, 2], and more generally the relation between the second law of ther- modynamics and information theory, has received much attention in the past decades. Studies have notably fo- cused on fundamental limits on heat generated by com- putation [2], the exorcism of Maxwell’s demon via infor- mation theory (see eg. [3]), and generalizations to quan- tum settings such as the characterization of entanglement through thermodynamical considerations [4], or the de- termination of the work cost of information erasure with the help of quantum side information [5]. Landauer’s Principle can be stated in the following way. Consider the erasure process of an unknown bit, i.e. the logical operation that resets the bit to a reference state (e.g. zero). Landauer’s Principle asserts that any physical implementation that performs this erasure, us- ing a heat bath at temperature T , has a work cost of at least kT ln(2), where k is the Boltzmann constant. More generally, Landauer noted that all irreversible operations, and not only the erasure of a bit, must cost work due to the transfer of entropy from the information-bearing de- grees of freedom to the environment, which causes the system to dissipate heat. Bennett refined the formula- tion of this principle and showed its relevance in thermo- dynamics (exorcising the Maxwell demon [2, 3]) and in computation [6]. The work cost of thermodynamic processes in the con- text of information theory has been studied for various classical and quantum systems. Szilard [7] originally con- sidered a single-particle gas enclosed in a box with a pis- ton and noted that kT ln(2) work could be reversibly ex- tracted from the gas at the expense of losing the infor- mation about which side of the piston the particle is on. The reverse process corresponds to erasing this informa- tion, bringing the particle on one definite side at kT ln(2) work cost. Landauer [1, 8] studied the example of a par- ticle in a double-V shaped potential, which represents a bit of information, and showed that its erasure costs work. While these results apply to fully unknown bits, the bounds have to be adapted if the system we erase is partially known. In such a case, the average amount of work needed is lower bounded by kT ln(2) H(X), where H(X) is the Shannon entropy of the system X and where the average is taken over many independent repetitions of the erasure process [3, 9, 10]. This result has been derived and extended in several contexts such as using quantum computers performing data compression [11], Hamiltonian models [12] or in a resource theory frame- work [13, 14]. This bound can also be generalized to other processes, for which the average work cost is then given by the amout of entropy the processes transfers into the environment. We refer to Janzing [15, 16] for a proof in a resource theory framework. Generalizations to a single-shot regime, where state- ments are made about individual processes rather than many repetitions of them, have been proposed, for ex- ample in terms of majorization conditions [13], and in terms of entropic quantities which take into account ap- proximate transitions and a probability of failure [1719]. Explicit Hamiltonian models have also been used to study the case of erasure with quantum side informa- tion [5]. It is usually assumed that the system carry- ing the information has a degenerate Hamiltonian. More recently, these thermodynamic considerations have been extended to the case of non-degenerate Hamiltonians [1921], and the majorization condition also adapted to this scenario [17, 19, 21], based on ideas from [2225]. In the present article, we revisit Landauer’s principle in the light of general quantum processes. Our main result is an explicit and rigourous expression for the fundamen- tal minimal work cost of any process E that acts on a system X and brings it from a state σ to a new state ρ. The bound is robust, i.e. it holds even if one tolerates an error probability ε. The work cost W of such a pro- cess is lower bounded by the amount of entropy that has to be dumped into the environment, as measured by the arXiv:1211.1037v1 [quant-ph] 5 Nov 2012

description

Landauer's Principle

Transcript of A Quantitative Landauer's Principle

  • A Quantitative Landauers Principle

    Philippe Faist,1 Frederic Dupuis,1 Jonathan Oppenheim,2 and Renato Renner1

    1Institute for Theoretical Physics, ETH Zurich, 8093 Switzerland2Department for Physics and Astronomy, University College of London, WC1E 6BT London, U.K.

    (Dated: November 7, 2012)

    Landauers Principle states that the work cost of erasure of one bit of information has a funda-mental lower bound of kT ln(2). Here we prove a quantitative Landauers principle for arbitraryprocesses, providing a general lower bound on their work cost. This bound is given by the min-imum amount of (information theoretical) entropy that has to be dumped into the environment,as measured by the conditional max-entropy. The bound is tight up to a logarithmic term in thefailure probability. Our result shows that the minimum amount of work required to carry out agiven process depends on how much correlation we wish to retain between the input and the outputsystems, and that this dependence disappears only if we average the cost over many independentcopies of the input state. Our proof is valid in a general framework that specifies the set of possiblephysical operations compatible with the second law of thermodynamics. We employ the technicaltoolbox of matrix majorization, which we extend and generalize to a new kind of majorization,called lambda-majorization. This allows us to formulate the problem as a semidefinite program andprovide an optimal solution.

    Introduction.Landauers Principle [1, 2], and moregenerally the relation between the second law of ther-modynamics and information theory, has received muchattention in the past decades. Studies have notably fo-cused on fundamental limits on heat generated by com-putation [2], the exorcism of Maxwells demon via infor-mation theory (see eg. [3]), and generalizations to quan-tum settings such as the characterization of entanglementthrough thermodynamical considerations [4], or the de-termination of the work cost of information erasure withthe help of quantum side information [5].

    Landauers Principle can be stated in the followingway. Consider the erasure process of an unknown bit,i.e. the logical operation that resets the bit to a referencestate (e.g. zero). Landauers Principle asserts that anyphysical implementation that performs this erasure, us-ing a heat bath at temperature T , has a work cost of atleast kT ln(2), where k is the Boltzmann constant. Moregenerally, Landauer noted that all irreversible operations,and not only the erasure of a bit, must cost work due tothe transfer of entropy from the information-bearing de-grees of freedom to the environment, which causes thesystem to dissipate heat. Bennett refined the formula-tion of this principle and showed its relevance in thermo-dynamics (exorcising the Maxwell demon [2, 3]) and incomputation [6].

    The work cost of thermodynamic processes in the con-text of information theory has been studied for variousclassical and quantum systems. Szilard [7] originally con-sidered a single-particle gas enclosed in a box with a pis-ton and noted that kT ln(2) work could be reversibly ex-tracted from the gas at the expense of losing the infor-mation about which side of the piston the particle is on.The reverse process corresponds to erasing this informa-tion, bringing the particle on one definite side at kT ln(2)work cost. Landauer [1, 8] studied the example of a par-ticle in a double-V shaped potential, which representsa bit of information, and showed that its erasure costs

    work. While these results apply to fully unknown bits,the bounds have to be adapted if the system we erase ispartially known. In such a case, the average amount ofwork needed is lower bounded by kT ln(2)H(X), whereH(X) is the Shannon entropy of the system X and wherethe average is taken over many independent repetitionsof the erasure process [3, 9, 10]. This result has beenderived and extended in several contexts such as usingquantum computers performing data compression [11],Hamiltonian models [12] or in a resource theory frame-work [13, 14]. This bound can also be generalized to otherprocesses, for which the average work cost is then givenby the amout of entropy the processes transfers into theenvironment. We refer to Janzing [15, 16] for a proof ina resource theory framework.

    Generalizations to a single-shot regime, where state-ments are made about individual processes rather thanmany repetitions of them, have been proposed, for ex-ample in terms of majorization conditions [13], and interms of entropic quantities which take into account ap-proximate transitions and a probability of failure [1719]. Explicit Hamiltonian models have also been usedto study the case of erasure with quantum side informa-tion [5]. It is usually assumed that the system carry-ing the information has a degenerate Hamiltonian. Morerecently, these thermodynamic considerations have beenextended to the case of non-degenerate Hamiltonians [1921], and the majorization condition also adapted to thisscenario [17, 19, 21], based on ideas from [2225].

    In the present article, we revisit Landauers principle inthe light of general quantum processes. Our main resultis an explicit and rigourous expression for the fundamen-tal minimal work cost of any process E that acts on asystem X and brings it from a state to a new state .The bound is robust, i.e. it holds even if one toleratesan error probability . The work cost W of such a pro-cess is lower bounded by the amount of entropy that hasto be dumped into the environment, as measured by the

    arX

    iv:1

    211.

    1037

    v1 [

    quan

    t-ph]

    5 N

    ov 20

    12

  • 2smooth conditional max-entropy [26, 27],

    W > kT ln(2)Hmax (E|X) . (1)Here, the entropy is evaluated for the state which isa purification of the output state obtained by applyingthe process E to a purification of the input state X (seeProposition 3). The entropy measure, Hmax, is part ofthe smooth entropy framework that is widely used insingle-shot quantum information theory [2630]. Its for-mal definition will be given later.

    Our quantitative Landauers Principle is tight up tologarithmic terms in the failure probability of the imple-mentation of the process E . Indeed, we can devise an ex-plicit process carrying out the requested mapping E thatis nearly optimal. This near-optimal process is based onthe scheme proposed by del Rio et al. [5], which erases asystem using available quantum side information.

    Our bound is valid in a general framework that speci-fies the set of physically allowed operations. This frame-work conceptually separates the operations that are in-trinsically thermodynamical (e.g., the erasure of infor-mation) from those that simply correspond to reversibleinformation processing (e.g., unitaries and the additionof ancillas). The former will be those that cost work orthat are capable of extracting work from a system; thelatter are done for free, i.e. at no work cost. We assumethat our systems have a completely degenerate Hamil-tonian. The set of allowed operations is motivated bythe second law of thermodynamics, which forbids cyclicprocesses whose net effect is to extract work.

    For the proofs we use a characterization of our frame-work by a relatively simple and intuitive generalizationof the notion of majorization which is inspired by previ-ous work where the eigenvalues of the input are rescaleduntil the input majorizes the output [17, 19], achievedfor example by appending a work system [21]. We termour generalisation lambda-majorization, and provide amathematical characterization of this notion in terms ofcompletely positive maps that satisfy some normalizationconditions.

    In the asymptotic limit of many identical and identi-cally distributed (i.i.d.) copies of these systems (i.e., theprocess is repeated n independent times, En, onn i.i.d. input states n), we obtain as a corollary of ourmain result a value for the average work cost of erasureper copy,

    W > [H (X) H (X)] kT ln(2) , (2)which is in agreement with the informal formulation ofLandauers principle, that the work cost of any process isdetermined by the decrease of entropy in the information-bearing degrees of freedom (see [16] for a proof in a re-source framework).

    We should point out that the general bound (1) can bearbitrarily larger than the average bound (2). This devi-ation highlights an important feature, namely that corre-lations between the input and the output of the transfor-

    mation play a significant role in the single-shot regime.It is important to not only consider the input and outputstates, but also the whole process, or computation, thatis performed on the actual input. This is natural andgeneralizes the classical case where this consideration isobvious, since a classical computer acts on the actualstate of a register and not on its probability distribution.In the quantum case, we specify the full algorithm (orcomputation) as a completely positive map, which inher-ently tells us which correlations are preserved betweenthe input and output systems. While the transformationof a state into another (e.g. in a resource theoretic ap-proach) is a relevant question, we focus in this paper onthe case where the computation is given, thus fixing allthe correlations that are preserved or destroyed betweenthe input and the output.

    As a simple example, consider X to be a fully mixedqubit, i.e. in the state X =

    1212. Suppose we wish to

    transform this state into another fully mixed qubit again,X =

    1212. There are two obvious processes that achieve

    this goal: we may (a) simply copy the input qubit to theoutput, or (b) throw away the input and prepare a newfully mixed qubit. Both processes (a) and (b) provide therequired output. However, if we had information aboutthe specific state in which the qubit initially was (e.g.suppose we had kept a qubit C that was maximally en-tangled with the input), then in the case of process (a), Cwould remain entangled with the output; however in thecase of (b), C would have lost all correlations with theoutput qubit. In this first example, both processes costno work: (a) is the identity process, and in (b), the workdissipated to erase the qubit is retrieved again when weprepare a new mixed qubit.

    However, the work costs of these processes differ if weconsider less trivial input and output states. Let X bea quantum system composed of n + 1 qubits, in a stateX where the first qubit is randomly zero or one withprobability 1/2, and the n remaining qubits are eitherall zero if the first qubit is zero, or all in a fully mixedstate if the first qubit is one. This state has the distri-bution {1/2, 2(n+1), 2(n+1), . . . 2(n+1)} and is depictedin Figure 1. Suppose that we wish to bring this systeminto the state X = X , i.e. the same state as the inputstate, using either process (a) or (b) again. Process (a)would simply copy the input to its output, and would notcost any work, since it is the identity channel. However,process (b) first has to erase the input state and then pre-pare the output state. If we are lucky, the n qubits are instate |0 . . . 00 . . . 0| (if the first qubit is |0) and we canjust erase the first qubit using kT ln(2) work. However,if we want to erase the system with certainty, we haveto consider the worst case in which we have to erase nfully mixed qubits (which occurs with the non-negligibleprobability 1/2). So the erasure work cost may be as badas (n+1)kT ln(2). In order to prepare this state again asthe output of the process, we may think of tossing a cointo decide in which state |0 . . . 00 . . . 0| or |11|2n12nto prepare X in. If we are lucky, we have to prepare a

  • 3FIG. 1: The probability distribution of a state in which single-shot effects become important, even for large systems. Aregister of n + 1 qubits are in a state such that if the firstqubit is zero (with probability 1/2), then all the rest are zerotoo; if the first qubit is one (with probability 1/2), then allthe rest are in a fully mixed state. The spectrum has a largeeigenvalue (1/2), but also has a large support size (2n + 1); asa consequence, Hmin() 1 and Hmax() n can differ byan arbitrary amount.

    mixed state on n qubits and extract nkT ln(2) work inthe process, but in the worst case, we have to prepare|0 . . . 00 . . . 0| and cant extract more than just kT ln(2)(from the coin toss). Hence, in the worst case, process(b) costs a total of nkT ln(2) work, which can be arbi-trarily larger than the (zero) cost of process (a); in fact,the gap diverges as n.

    This example shows that in the general single-shotregime, the specification of only the input state X andthe output state X does not suffice, and correlations be-tween the input and the output contribute to determinethe minimal work cost of the process (although these cor-relations are not relevant in the asymptotic i.i.d. regime).Our result (1) incorporates this property intrinsically andprovides a bound that is valid for any given process.

    The remainder of this paper is organized as follows.We will first present the mathematical framework usedto model thermodynamic processes. We then introducelambda-majorization, which captures all possible opera-tions in our framework. Lambda-majorization is charac-terized in terms of completely positive maps that satisfysome specific normalization conditions, and we use thischaracterization to derive the main result by formulat-ing the problem as a semidefinite program. The latteris solved by providing optimal primal and dual feasibleplans with the same value, which guarantees optimal-ity of the result. Finally, some special cases are derivedwhich recover some previously known results.

    Framework.Consider a quantum mechanical systemX in an inital state described by the density operator. Our task is to bring the system X to another state, while attempting to maximize some kind of notion ofextracted work in the process. Throughout this pa-per we assume that the Hamiltonians of the systems weconsider are completely degenerate.

    We first postulate two basic operations of thermody-namical nature, involving a heat bath at temperature T :

    the erasure of a single qubit to a pure state at kT ln(2)work cost, and the corresponding reverse process whichextracts kT ln(2) work by transforming a pure state intoa fully mixed state. Here k is the Boltzmann constant.These operations are motivated by the variety of explicitphysical thermodynamical frameworks in which they canbe performed, for example using Szilard boxes [7, 18] orby isothermally manipulating energy levels of Hamilto-nians [5, 12, 20]. Crucially, we assume the second law ofthermodynamics, and require that there exist no opera-tion that would allow us to form a cycle for which thenet effect would be the extraction of work. This justifiesthat no other work extraction procedure can yield morework than kT ln(2) from a pure qubit, or else a cycle withnet work gain could be formed by appending an erasureprocess, itself only costing kT ln(2).

    Apart from this constraint on the set of allowed opera-tions, it is natural to also allow usual quantum informa-tion processing. Since our Hamiltonians are degenerate,we can allow all global unitaries and they cost no work.We do not need to use the fact that these unitaries are im-plementable by a device operating in contact with a heatbath, since expanding the class of allowable operationsactually strengthens the bound we derive. In practice,one has very crude local control over the operations, andthe acting agent does not know which unitary is beingimplemented, however, this is actually not an obstaclefor implementation [11, 14]. In addition to unitaries, wewill allow pure ancillas to be added to the system, whichpermits more general computation. Crucially, ancillaswill have to be restored to their initial pure state, so thatit is not possible to hide a work cost in an ancilla thatwas left mixed.

    The following framework is motivated by the aboveconsiderations. The processes we allow are (finite) com-binations of the following elementary operations:

    (a) Bring n qubits (of the system X or an ancilla A)from any state to a pure state (erasure) at costnkT ln 2 work;

    (b) Bring n qubits (of the system X or an ancilla A)from a pure state to a fully mixed state while ex-tracting nkT ln 2 work;

    (c) Add and remove ancillas in a pure state at no workcost, as long as all the ancillas have been restoredto their initial pure state at the end of the process;

    (d) Perform arbitrary unitaries (over X and any addedancillas) at no work cost.

    Operations (a) and (b) are those of thermodynamicalnature, and may be carried out in a wide range of existingframeworks as mentioned above. One may view theseoperations as defining a quantity which we call work.

    On the other hand, operations (c) and (d) are purelyinformation-theoretical. They allow us to perform anyquantum information processing circuit, since we allowpure ancillas to be added. However, there is the condi-tion that randomness may not be disposed of for free,

  • 4namely that ancillas have to be restored to their initialpure states at the end of the process.

    Lambda-Majorization.We will now provide a simplemathematical characterization of all operations allowedin our framework.

    First, note that the operations (a)(d) allow the useof so-called noisy operations [13], which correspond toadding an ancilla system N in a fully mixed state, per-forming a joint unitary, and removing the ancilla. Specif-ically, a noisy operation is composed in our framework offirst an operation of type (c) (adding a pure ancilla of nqubits), followed by an operation of type (b) (extractingnkT ln 2 work from the ancilla making it fully mixed),then one of type (d) (performing the necessary unitaryto carry out the noisy operation), and finally an opera-tion of type (a) (erasing the ancilla back to its pure stateat a work cost nkT ln 2). The total process has a workbalance of zero. This means that we may thus carry outnoisy operations for free within our framework and usethem as building blocks for more complex processes. Inthe following, we deal implicitly with the ancilla N andit should not be confused with further ancillas that willbe added.

    The following result by Horodecki et al. [13] relatesnoisy operations to the mathematical notion of majoriza-tion [31, 32].

    Noisy Operations and Majorization. The transitionon system X from state to state is possible by noisyoperation if and only if .

    Majorization between two (normalized) states captures the fact that is more mixed than , or thatthe eigenvalues of can be written as a mixture of theeigenvalues of . Formally, majorization can be char-acterized by the existence of a unital, trace-preservingcompletely positive map that brings to [3336]. Achannel E is trace-preserving if E (1) = 1 and unital ifE (1) = 1.Proposition 1. Two positive matrices and satisfy if and only if there exists a trace-preserving, unital,completely positive map E satisfying E () = .

    The notion of majorization is discussed in more detailin Appendix A.

    We will now provide some background insight for themeaning of our new concept of lambda-majorization.The idea is to characterize how well a state majorizesa state . Suppose that we have a system X in state Xand we want to bring it to the state X , where X X .In this case, one can simply carry out a noisy operationas described above. Suppose now that we have an ancillaA that is in a fully mixed state, 1A|A| , and suppose that weare fortunate enough for X 1A|A| X |00|A to alsohold (for some pure state |0A on A). Then by apply-ing a joint noisy operation on both systems, this wouldcorrespond to actually erasing the system A for freeduring the transition . We could then say that

    UNITARY

    FIG. 2: Lambda-Majorization corresponds to absorbing a cer-tain amount of randomness from an ancilla during a unitaryoperation. The system X starts in state , and the ancillaA in a state with 1 fully mixed qubits with the remainingqubits pure. The goal is to devise a global unitary that willbring the system X to the state , while leaving the leastpossible number 2 of fully mixed qubits in A. The difference = 12, is the work extracted by the process; if the valueis negative, it corresponds to a work cost. In the main text,we allow a noisy operation instead of a unitary operation,but one could simply add more mixed qubits to the ancilla oneach side and use those to implement a noisy operation witha unitary.

    the randomness of the ancilla A was transferred intosystem X. We will view this type of transition as workextraction on system X during a transition X X .

    In another situation, it might be that X X . How-ever, in that case, for a large enough ancilla A the ma-jorization X |00|A X 1A|A| will hold. The cor-responding noisy operation then leaves us with a mixedancilla that started off pure; we will view such a transi-tion on system X as costing work.

    Such operations can be performed within our frame-work, using operations (a)(d). In particular, the rela-tion to work is given by elementary erasure and workextraction (operations (a) and (b)) applied to the ancillaA after the transition to restore it to its initial state.

    In general, the ancilla A may start with 1 mixedqubits and end up with 2 mixed qubits after a noisyoperation; we consider in this case to have extracted(1 2) kT ln(2) amount of work. This situation is de-picted in Figure 2. Both considerations above aboutwork cost and work extraction are encompassed, sim-ply because we count the difference in the amount ofrandomness present in the ancilla before and after theprocess. This is the idea behind the concept of lambda-majorization, whose definition we can now state.

    Lambda-Majorization. For two density operators X ,Y on two systems X and Y , we will say that X -

    majorizes Y , denoted by X Y , if there exists a

    (large enough) ancilla system A, as well as 1, 2 > 0with = 1 2, such that

    21121 X 22122 X ,

    where 21121 and 22122 are fully mixed states on1 (respectively 2) qubits of A, and where the remainingqubits of A in each case are pure.

  • 5An expression for by how much a state majorizes an-other was originally introduced in [17] and used in [19], inthe context of work extraction games from Szilard boxes.Their measure, the relative mixedness between and

    , corresponds to the optimal such that .

    Lambda-majorization captures the possible processes

    that are allowed in our framework. Indeed, if ,

    then one has 21121 22122 for some1, 2 with = 1 2. Hence, there exists a noisy op-eration (itself a combination of operations (a)(d) withzero total work cost) that performs the transition from21121 to 22122 . The 1 mixed qubits thatwe have appended to can be created by appending alarge pure ancilla (operation (c)), and using operation(b) to extract 1 kT ln(2) work from 1 qubits, render-ing them fully mixed. At the end of the process, afterthe noisy operation, we need to restore the ancilla in apure state; we thus need to erase (operation (a)) the re-maining 2 qubits, costing 2 kT ln(2) work. The totalextracted work is then (1 2) kT ln(2) = kT ln(2).Conversely, each individual operation (a)(d), individu-ally transforming some state into a state and costing

    work W , implies the lambda-majorization with

    W = kT ln(2). This is clear for operations (c) and(d). For operations (a) and (b), this follows from resultsderived in Appendix A 3.

    The ancilla system above may be viewed as some kindof information battery, as was suggested by Bennett [2]who suggested using a blank memory tape as fuel toextract work. In this case, the ancilla can be used as astorage of purity (or as a storage for mixedness orrandomness which we would like to get rid of), whichis increased or decreased by processes like the ones sug-gested above.

    It turns out that one can characterize lambda-majorization by the existence of a completely positivemap satisfying some special normalization conditions,analogously to Proposition 1.

    Proposition 2. Two normalized density matrices X

    and Y on two systems X and Y satisfy X Y if

    and only if there exists a completely positive map TXYsatisfying Y = TXY (X), such that T XY (1X) 6 1Yand TXY (1X) 6 21Y .

    A channel TXY that satisfies the two last conditionswill be referred to as a lambda-majorization channel.

    Furthermore, although the channel T is not directlya physical channel (it can be, for example, trace-decreasing), it can always be viewed as part of a uni-tal channel E , in the sense that T can be obtained byprojection onto specific subspaces and tracing out theancilla A of the channel E (see Appendix A 2). In turn,unital channels are a (strict [37]) superset of the noisyoperations. Recall that our task is to find a lower boundon the work cost of all possible processes allowed in ourframework, which we will do by optimizing the work costover all processes that perform a given state transition.

    FIG. 3: Our main result gives a fundamental lower bound onthe work cost W of a process transforming a state X (puri-fied by a ficiticious |XR) into a new state XR obtained byapplying a process EXX . The lower bound to the work costis given by the entropy that the process E has to dump intothe environment E (in which XR is purified), as measuredby the Renyi-zero conditional entropy H0 (E|X).

    However, instead of considering only the unital channelsE that are noisy operations, we will relax this last condi-tion and consider all unital channels E , and thus allow theoptimization to range over all T that satisfy the condi-tions of the above proposition. This will make our lowerbound even stronger, by showing that the lower boundstill holds even if we relax somewhat the assumptions inour framework.

    Main Result.We are now ready to derive our mainresult. Consider a system X in the state X . This systemcan always be purified by a reference system, R, in a purejoint state |XR.

    Allowing actions defined by our framework on X, wewill study the transition of this state to a state XR, byapplying a process TXX . The systems are depicted inFigure 3.

    The task we would like to solve is the following. GivenX and a process EXX , and given a purification |XRof X and an output state XR = E (XR), we would liketo find the least amount of work W one has to pay forany process in our framework that implements the actionof E on . As we have seen in the previous section, wecan formulate within our framework all possible processesas lambda-majorizations, so our task is actually to find

    the best such that X X , with the corresponding

    lambda-majorization channel T from Prop. 2 satisfyingT (XR) = XR.

    Our main result gives an upper bound on the optimalamount of work that can be extracted by this transition,or equivalently, a lower bound on the minimum amountof work that will have to be paid in order to perform thetransition. The main result follows directly from follow-ing technical proposition.

    We are given an input state X and a process EXX .Let |XR be a purification of X , and let XR =EXX (XR). Let also XRE be a purification of XRin an environment system E. The Renyi-zero entropy

  • 6H0 (E|X) [26, 38] is defined by

    H0 (E|X) = maxX>0trX=1

    tr [XE X ] , (3)

    where XE is the projector on the support of XE .

    Proposition 3. Then the -majorization X X

    holds, with the channel TXX from Prop 2 satisfyingT (XR) = XR, if and only if 6 H0 (E|X) .Main Result. Any process in our framework acting onsystem X that implements the channel E when given in-put X (or equivalently, that brings the state XR to thestate XR) has to cost at least kT ln(2) H0 (E|X) work.

    In other words, the minimal work cost of a transitionfrom to is given by the amount of (information-theoretic) entropy dumped into the environment, condi-tioned on the output of the computation. This is pre-cisely the quantitative generalization to correlated quan-tum systems of the original Landauers principle [1].

    It is worth noting that instead of specifying the chan-nel E , we may also simply specify the output state XR,which completely determines the process (on the supportof X) since it is the Choi-Jamio lkowski state correspond-ing to E rescaled by X (XR = E (XR)). One can thusunderstand the input to the problem to actually be a bi-partite state XR, such that X is the required output,R is the input that will be fed into the process, and anycorrelations between X and R specify parts of the out-put that we wish be preserved and not be modified, orthermalized, by the process.

    The full proof of Prop. 3 is provided in the appendix.We provide the general idea of the proof in the following.

    Proof Sketch of the Main Result. The main idea of theproof is to write the optimization problem as a semidefi-nite program for the variables = 2, TXX (the Choi-Jamio lkowski representation of TXX), and the dualvariables X , XX and ZXR. Let ()tX denote the partialtranspose operation on X. The optimal extracted work is given by the following semidefinite program:

    Primal

    minimize:

    subject to:

    trX [TXX ] 6 1X : XtrX [TXX ] 6 1X : XX

    trX[TXX

    tXXR

    ]= XR : ZXR

    Dual

    maximize: tr (ZXR XR) trXXsubject to:

    trX 6 1trR

    [tXXR ZXR

    ]6 1X X +XX 1X

    The optimal value = 2H0(E|X) is achieved (see

    Appendix B) by the completely positive map TXX =trE

    [VXXE ()V

    ], where VXXE is the partial isom-

    etry with minimal support relating XR to XER (bothbeing purifications of the same R = R).

    While it is clear from the formulation of our problemthat T is already completely determined on the supportof X (expressed by the condition T (XR) = XR), theoptimization over T is done in order to (at least formally)find the optimal action on the complement of the supportof X . Also, the formulation of a lambda-majorizationproblem as a semidefinite program is a more general tool-box that could be used in the case where the mappingis not completely determined and where arbitrary addi-tional semidefinite conditions can be imposed at will.1

    Allowing a Probability of Error. A smooth versionof the result is straightforward to obtain. In this case, weallow the actual process to not exactly implement E , butonly approximate it well. The best strategy to detect thisfailure is to prepare |XR and send X into the process,and then perform a measurement on XR. To ensure theprobability of error does not exceed , the trace distancebetween the ideal output of the process XR and theactual output XR must not exceed . We can apply ourmain result to the approximate process that brings to, and lower bound the work cost of that process by

    W ( ) > H0 (E|X) kT ln(2)> Hmax (E|X) kT ln(2) , (6)

    where the second inequality is shown in [39] and involvesthe max entropy measure Hmax as defined in [27, 28]. Forany > 0, the smooth max entropy H max is defined as

    H max (E|X) = min

    maxX>0

    tr X=1

    logF 2 (EX ,1E X) , (7)

    where the first optimization ranges over all EX such thatF 2 (, ) > 1 2 and where F (, ) = 1 is thefidelity between the quantum states and [40]. Wewrite Hmax to indicate H

    max with = 0.

    If we optimize (6) over all possible channels T thatoutput such XR, we obtain a bound on the extractable

    1 For example, instead of fixing the process with T (XR) = XR,one may have instead required that T (X) = X for given Xand X , not specifying and optimizing over what happens tocorrelations between the input and the output (or, equivalently,one could optimize over XR with fixed reductions X and R).In that case, the semidefinite program can be used to obtainbounds to the optimal value. This also implies that the relativemixedness introduced in [19] can be formulated as a semidefiniteprogram.

  • 7work with a probability of error ,

    W > minXR

    XRHmax (E|X) kT ln(2)

    > minXRE

    XREHmax (E|X) kT ln(2)

    = H max (E|X) kT ln(2) , (8)where the first optimization ranges over all XR suchthat the trace distance 12XR XR1 6 , and wherethe second optimization ranges over all XRE such thatF 2 (XRE , XRE) > 1 2, with =

    2.

    Tightness of the Bound.The bound given in the mainresult is tight up to error terms of the order of log 1 .Indeed, lets consider the following simple process: oneappends a large enough ancilla AE in a pure state tothe input, so that we have our systems in the stateXRAE = |0AE |XR. Let us consider a purification|XRAE of XR. Since the reduced state on R of boththese states are the same, R = R, there exists a unitaryU acting on X AE such that |XRAE = U |XRAE .So we can apply this unitary onto our input at no workcost, and we are left with |XRAE on our systems. Wethen apply the protocol proposed by del Rio et al. [5] onthe system AE , using the system X as a memory we haveaccess to, in order to erase the ancilla AE back to a purestate. Recall that their process acheives this task with-out modifying the reduced state XR, and at a work costkT ln(2)Hmax (AE |R) +O

    (log 1

    ). It is also straightfor-

    ward to note that their protocol can be carried out withinour framework. Thus, up to error terms of the order ofthe logarithm of the error probability, our bound givenby (8) is tight.

    Special Cases.From our main result we can recoverseveral some special cases of specific interest as corollar-ies.

    Von Neumann Limit. As we have seen in the intro-duction, considerable previous work has focused on thelimit cases where many i.i.d. systems are provided. Insuch a case, the process En is applied on n indepen-dent copies of the input n, and outputs n. Say wetolerate a probability of error . We may simply applyour (smoothed) main result to get an expression for ourbound on the work cost,

    W > H max (En|Xn)n kT ln(2) , (9)however it is known that the smooth entropies convergeto the von Neumann entropy in the i.i.d. limit [41],

    lim0

    limn

    1

    nH max (E

    n|Xn)n = H (E|X) , (10)

    which allows us to simplify the expression to

    H (E|X) = H (EX) H (X) = H (X) H (X) ,where the last equality holds because EX and X have

    the same spectrum being both purifications of the sameR = R. We conclude that in the asymptotic i.i.d. case,the work cost of such a process is simply given by thedifference of entropy between the initial and final state,

    W > [H (initial state)H (final state)] kT ln(2) . (11)We emphasize that in this case the exact process is notrelevant, and only the input and output states matter.If one considers the example given in the introductionwith (a) the identity channel and (b) a replacement map,and apply these processes on n independent copies of thedistribution described in Figure 1, then in this regimeboth processes cost no work.

    Erasure of a Quantum System Using a Quantum Mem-ory. Consider the setting proposed in [5], where a systemS is correlated to a system M in a joint state SM , andwhere our task is to erase S while preserving the reducedstate on M and any possible correlations of M with othersystems. Formally, given a purification SMR of SM , weare looking for a process that will bring this state to thestate SMR = |00|S MR, i.e. we require the pro-cess to preserve MR. In [5] a process is proposed thatperforms this task at work cost

    kT ln(2)Hmax (S|M) +O(log 1

    ),

    where Hmax is the smooth max entropy [2729].

    This is a special case of the general case consideredabove, simply by considering X to be the joint systemof S and the memory M , HX = HS HM . Note thatwe have SMR = |00|S MR, purified by |SMRE =|0S |MRE , where |MRE = USE |SMR andUSE is an isometry from S to E.

    Then the bound on the work cost, tolerating a proba-bility of error of at most , is

    W > Hmax(E|SM) kT ln(2)= Hmax(E|M) kT ln(2)= Hmax(S|M) kT ln(2) , (12)

    where the first equality follows because is pure on Sand the second by reversing the isometry U . We canimmediately conclude that, within our framework, anyprocess that performs this erasure has to cost at leastkT ln(2)Hmax(S|M) work. Thus, the process proposedby del Rio et al. is optimal up to logarithmic factors in theerror probability . Note that if we take the memory Mto be trivial i.e. a pure state, then we are in the standardscenario of Landauer erasure on a single system, and wehave W Hmax(S) which is achievable, recovering theresult of [18].

    State Transformation while Decoupling from the Ref-erence System. Another special case that we can de-rive as a corollary is if we consider the process thaterases its input and prepares the required output inde-pendently. This would occur if we required the outputstate to be completely uncorrelated to the reference sys-

  • 8tem R. Being a replacement map, this process impliesthat XR = X R. In this case, any third party Rthat would have been correlated to the input is now com-pletely uncorrelated to the output.

    Again, we may simply apply our main result with theadditional condition that XR = X R. In this case,the purification of XR, XRE , takes a special form dueto the tensor product structure, with the E system splitinto two ER and EX systems (E = ER EX),

    |XRE = |XEX |RER , (13)where |XEX and |RER are purifications of X andR, respectively.

    The lower bound on the work cost W , given by ourmain result and tolerating a probability of error of atmost , then reads

    W > H max (E|X) = H max (ER)| +H max (EX |X)| ,

    where =

    2 and H max is again the smooth max en-tropy. Now, the spectrum of ER is exactly the same asthe spectrum of R by the Schmidt decomposition of |.This in turn has the same spectrum as X also by theSchmidt decomposition of XR and because R = R.It follows that H max(ER) = H

    max(X). Also, by du-

    ality of smooth min and max entropies [27], we haveH max (EX |X)| = H min (EX) = H min (X), whereH min is the smooth min entropy with purified distancesmoothing as defined in Ref. [28]. In consequence,

    W > H max (X) H min (X) . (14)That is, to transform a state to while maximallydecoupling from the reference system, then one has toerase to a pure state (at cost H max (X)), and thenprepare (extracting work H min (X)).

    Example: Erasing Part of the W State.To illustratesome points mentioned above, consider the W state on asystem S, a memory M and a reference system R givenby

    |W SMR = 13

    [|001+ |010+ |100]SMR . (15)

    The reduced states on SM and M are respectively givenby SM =

    13 |0000| + 23 |++| and M = 23 |00| +

    13 |11|, where |+ = 12 (|01+ |10). By symmetry ofthe W state, the reduced state on any two or one qubit(s)have the same form.

    By actions on S and M , we would like to erase S,leading to the final state on S and M given by SM =|00| M . Let us consider two processes that achievethis goal: the first one will preserve correlations with Rbut will cost work, the second will not cost work but willmodify those correlations.

    We may directly apply the special case above concern-ing the erasure of a system conditioned on a memory:

    the fundamental work cost of such an erasure, if one pre-serves correlations with a reference system R, is givenby H0 (S|M). One may explicitely calculate (see Ap-pendix C) in this case H0 (S|M) = log 23 0.59 and thusthis process must cost at least this amount of work.

    However, one may easily notice that both SM andM have the same spectrum {2/3, 1/3}. This meansthat there exists a unitary U that performs the era-sure simply as |00| M = USMU, and this uni-tary process does not cost any work. However, the cor-relations with R are not preserved. Indeed, the uni-tary sends |00 to |01 and |+ to |00, so one ex-plicitely calculates that the state after the process isgiven by SMR = USMRU

    = 13

    [|011+2|000] =|0 1

    3

    [|11+2|00]. We notice that the reducedstate on M and R is now pure and differs from initialone, given by MR =

    13 |0000|+ 23 |++|.

    Conclusion.The last few years have seen enormoustechnological progress in micro- and nano-fabrication,making it possible to construct engines and thermo-devices on a microscopic scale [4250]. In this regime,standard thermodynamic considerations (devised origi-nally for macroscopic devices such as steam engines) arenot necessarily applicable. At the same time, with theminiaturization of computing circuits, thermodynamicaspects of information processing have become increas-ingly relevant. In fact, the heat dissipated by proces-sors is one of the main barriers limiting their perfor-mance. Along with these developments, researchers havestarted to investigate the laws of thermodynamics froman information-theoretic perspective [5157].

    The present work adds to this line of research, provid-ing a rigorous quantitative relationship between informa-tion theory and thermodynamics. One of our main find-ings is that this relationship is more intricate than whatprevious results (which focused on averaged quantities)may have suggested. In particular, it turns out that thethermodynamic cost of a given information-processingtask not only depends on the input and output state,but also on the correlation between them. While thiscorrelation-dependence disappears in certain asymptoticlimits, it cannot be neglected in general and, in fact, maybecome arbitrarily large.

    Acknowledgements.The authors would like to thankJohan Aberg, Ldia del Rio and Joe Renes for manyenlightening discussions. PhF, FD and RR were sup-ported by the Swiss National Science Foundation (SNSF)through the National Centre of Competence in ResearchQuantum Science and Technology and through grantNo. 200020-135048, and by the European Research Coun-cil through grant No. 258932. FD was also supported bythe SNSF through grants PP00P2-128455 and 20CH21-138799, as well as by the German Science Foundation(grant CH 843/2-1). JO is funded by the Royal Societyof London.

  • 9APPENDIX

    Appendix A: Formal Approach toLambda-Majorization

    1. Preliminaries and Main Definition

    Let HX , HY be two subspaces of a finite-dimensionalHilbert space HZ , and let HA, HB be two subspaces ofa finite-dimensional Hilbert space HC . Let d() denotethe dimensions of the various Hilbert spaces H() andspecifically let d = dZ = dimHZ . Denote by L (H ) theset of linear hermitian operators onH , byP(H ) the setof positive semidefinite operators onH , and by S=(H )those operators in P(H ) that have unit trace. Let alsoi() denote the i-th eigenvalue of (in no particular

    order), and i () denote the i-th eigenvalue of takenin decreasing order.

    Majorization is discussed in detail in Refs. [31, 32, 58].

    Majorization. A matrix P(HZ) is said to ma-jorize P(HZ), denoted by , if for all k,ki=1

    i () >

    ki=1

    i (), and if tr = tr .

    The notion of majorization defines a (partial) orderrelation onP(HZ). When considering the set of densitymatrices S=(HZ), there is a least element: the fullymixed state, 1d1Z .

    Weak Submajorization. A matrix P(HZ) is saidto weakly submajorize P(HZ), denoted by w ,if for all k,

    ki=1

    i () >

    ki=1

    i ().

    Remark that if , S=(HZ), then the concept ofweak submajorization is equivalent to regular majoriza-tion simply because the traces of these matrices are al-ready equal to unity.

    Doubly Stochastic Matrix. A dd matrix S is doublystochastic if S ji > 0,

    i S

    ji = 1 j and

    j S

    ji = 1 i.

    Doubly Substochastic Matrix. A n m matrix Bis doubly substochastic if B ji > 0,

    iB

    ji 6 1 j and

    j Bji 6 1 i.

    The following theorem is due to Hardy, Littlewood andPolya [59].

    Theorem 4 (Hardy, Littlewood, and Polya, 1929). Let, P(HZ). Then if and only if there existsa d d doubly stochastic matrix S ji such that i() =j S

    ji j() .

    A similar theorem is obtained for weak submajoriza-tion and doubly substochastic matrices [31].

    Proposition 5. Let P(HX) and P(HY ).Then w if and only if there exists a dY dX doublysubstochastic matrix B ji such that i() =

    j B

    ji j().

    Majorization defines a partial order on states and hasa smallest element, the fully mixed state. Also, a purestate majorizes any other state.

    Proposition 6. Majorization is preserved by direct sumsand tensor products, i.e. if and , then and . The same holds forweak submajorization.

    A proof for the direct sum of two vectors can be foundin [31, Cor. II.1.4]. We provide here an alternative proofalong with the tensor product case.

    Proof. Let S ji and S ji be doubly stochastic matrices such

    that i() =j S

    ji j() and i(

    ) =j S ji j(

    ).Then SS is also doubly stochastic and satisfies i() =

    j(S S) ji j( ), because the vectors of

    eigenvalues of the direct sum are simply the direct sumsof the individual vector of eigenvalues. This shows that .

    Analogously, S S satisfies ii( ) =i()i(

    ) =jj S

    ji j()S

    ji j(

    ) =jj(S

    S) jj

    ii jj(). SS is doubly stochastic,ii(S

    S) jj

    ii =ii S

    ji S

    ji = 1 and

    jj(S S) jj

    ii =

    jj Sji S

    ji = 1.

    The same proof holds for doubly substochastic matri-ces, so majorization may be replaced by weak subma-jorization in the proposition.

    We are now all set for a formal definition of lambda-majorization.

    Let R and let 1, 2 > 0 such that = 12 and21 , 22 are integers. (The case when 2 is irrational willbe discussed later.) Take HC of size greater than both21 and 22 and let HA and HB be subspaces of HC ofrespective dimensions 21 and 22 .

    Lambda-Majorization. For P(HX) and P(HY ), we say that -majorizes , denoted by

    ,if there exists such 1, 2 such that 2

    11A w

    221B . Here 1A, 1B are the projectors onto therespective subspaces HA and HB embedded in HC , ofrespective dimensions 21 , 22 . Likewise, and areconsidered as living in HZ by padding them with zeroeigenvalues as necessary.

    We have assumed here that 2 is rational. If 2 isirrational, we say that -majorizes if for all rational

    2

    with < , then .

    The following proposition guarantees that the defini-tion above does not depend on the exact values of 1and 2 but only on their difference. This is the same assaying that a fully mixed state cannot act as a catalyst.

    Proposition 7. For any , P (HZ), and for any n,we have w if and only if 1nn w 1nn .

  • 10

    Proof. If w , then the majorization passes over thetensor product, and thus proves the claim. Conversely, if 1nn w 1nn , then in particular, for any k 6 d,

    nki=1

    i (1n

    n ) >nki=1

    i (1n

    n ) . (A1)

    (d is the maximum rank of or .) But in(1n

    n ) =1ni () and thus

    ki=1

    i () >ki=1

    i () .

    The following proposition is a direct consequence of thedefinition of lambda-majorization, and just states thatyou can move around randomness into or out of the an-cillas in the definition of lambda-majorization.

    Proposition 8. For any P(HX), P(HY ), andfor any R, n > 0, we have

    1n1n

    +logn and

    logn 1n1n .

    Similarly to Thm. 4 and to Prop. 5, it is possible tocharacterize lambda-majorization by the existence of amatrix relating the vector of eigenvalues that satisfiessome specific normalization conditions.

    Proposition 9. Let P(HX) and P(HY ).Then

    if and only if there exists a dY dX matrixT ki such that i() =

    k T

    ki k(), satisfying T

    ki > 0,

    i Tki 6 1, and

    k T

    ki 6 2 .

    Proof of Prop. 9. Suppose 211A w 221B with = 12. Then there exists a doubly substochas-tic matrix S akbi such that

    bi(221B

    )=ak

    S akbi ak(211A

    ),

    with S akbi > 0,bi S

    akbi 6 1 and

    ak S

    akbi 6 1. (Indices

    a and b refer to the mixed ancillas of respective sizes 21

    and 22 . Since we are considering weak submajorization,we can safely ignore all zero eigenvalues and consider onlythe subspaces (of different sizes on the left and right handside of the majorization) on which , , 1A and 1B havesupport, as in Prop. 5.)

    Now we have

    i()

    =b

    bi(221B

    )=a b k

    S akbi ak(211A

    )=k

    (ab

    21 S akbi

    )k () ,

    so one can define

    T ki =ab

    21 S akbi ,

    which fulfills i()

    =k T

    ki k

    (). Because S is doubly

    substochastic, and using the fact that indices a (resp. b)range to 21 (22), the matrix T satisfies

    i

    T ki =i a b

    21S akbi =a

    21bi

    S akbi 6 1 ,

    as well ask

    T ki =k a b

    21S akbi =b

    21ak

    S akbi

    6b

    21 = 2 .

    Additionally, T ki > 0 because S akbi > 0.

    Conversely, suppose that a matrix T ki exists, withT ki > 0,

    i T

    ki 6 1,

    k T

    ki 6 2, and i() =

    k Tki k(). Let 1, 2 such that = 1 2 and

    such that 21 , 22 are integers. Then let S akbi = 22T ki

    for all a, b. Then S akbi > 0 and S satisfiesak

    S akbi = 22

    ak

    T ki 6 22(

    a

    1)

    2 = 1 ,

    as well asbi

    S akbi = 22

    bi

    T ki 6 22(

    b

    1)

    = 1 .

    The required weak submajorization for the desiredlambda-majorization is provided by this doubly sub-stochastic matrix,

    bi(221B

    )= 22i

    ()

    = 22k

    T ki k()

    = 22k

    T kia

    ak(211A

    )=ak

    S akbi ak(211A

    ).

  • 11

    2. Formulation of Lambda-Majorization in Termsof Channels

    Majorization can also be characterized in terms of uni-tal, trace-preserving completely positive maps [3336].

    Proposition 10. Two positive semidefinite matrices and satisfy if and only if there exists a trace-preserving, unital, completly positive map E satisfyingE () = .

    Similarly, one can prove an analogous characterizationof weak submajorization. The proof of this propositionwill be given later.

    Proposition 11. Let P(HX) and P(HY ).Then w if and only if there exists a completelypositive map EXY : L (HX) L (HY ) such thatEXY () = , with E satisfying EXY (1X) 6 1Y andEXY (1Y ) 6 1X .

    Lets say that EXY is subunital if EXY (1X) 6 1Y .Then the two conditions on the structure of the channelEXY in the above proposition require the channel to besubunital and trace-nonincreasing.

    A subunital trace-nonincreasing completely positivemap can always be seen as part of a unital, trace-preserving completely positive map on a larger Hilbertspace. This is analogous of the result that doubly sub-stochastic matrices are submatrices of stochastic matri-ces [31].

    Proposition 12. Let EZZ be a unital, trace-preservingcompletely positive map. Let HX and HY be two sub-spaces of HZ and let 1X and 1Y be the projector ontothose spaces, respectively. Then the channel E XY () =1Y E (1X 1X)1Y is subunital and trace-decreasing.

    Conversely, let E XY be any trace-decreasing, subuni-tal completely positive map. Let HZ =HX HY , GY =1Y E XY (1X) > 0, and HX = 1X E (1Y ) > 0.Then the channel defined by

    EZZ ()= 0X E XY (1X ()1X)+ E (1Y ()1Y ) 0Y+(

    0X GY

    )()(

    0X GY

    )+(

    HX 0Y)

    ()(

    HX 0Y)

    is unital and trace-preserving, and E XY () =1Y E (1X ()1X)1Y .

    In order to generalize this concept to our lambda-majorization, lets introduce the concept of an -subunital map. These generalize the notion of subunitalmaps to arbitrary normalizations.

    -subunital Maps. Well call a map TXY -subunitalif it satisfies TXY (1X) 6 1Y .

    Proposition 13 (Composition of -subunital maps).Let HW HZ be another subspace of HZ in addi-tion to HX and HY , and let TXY , T YW be trace-nonincreasing maps. Assume that TXY is -subunitaland that T YW is -subunital. Then their composition[T T ]XW is -subunital.Proof of Prop. 13. The composition of TXY and T YWis trace-nonincreasing,

    T (T (1W )) 6 T (1Y ) 6 1X .Their composition is also -subunital,T YW (TXY (1X)) 6 T YW (1Y ) 6 1W .We will now give proofs for Props. 11 and 12, which

    rely on the following lemma.

    Lemma 14. Let TZZ be a trace-nonincreasing mapthat is 2-subunital. Denote by 1X (resp. 1Y ) the pro-jectors onto the subspaces HX (resp. HY ) of HZ . ThenTXY , defined by TXY () = 1Y TZZ (1X ()1X)1Y ,is also a trace-nonincreasing 2-subunital map.

    Proof of Lemma 14. It suffices to note that the projec-tion map: () 1X ()1X (resp. () 1Y ()1Y ) istrace-nonincreasing and subunital. Then apply Prop. 13twice.

    Proof of Prop. 12. The first part of the proposition fol-lows from the lemma. To prove the converse, let EZZ asin the proposition text, and notice first that the channelis its own adjoint:

    E () = E (1Y ()1Y ) 0Y + 0X E (1X ()1X)+(

    0X GY

    )()(

    0X GY

    )+(

    HX 0Y)

    ()(

    HX 0Y)

    = EXY () . (A2)The map is unital:

    EZZ (1Z) = 0X (1Y GY ) + (1X HX) 0Y+ 0X G+HX 0Y = 1Z ,

    and it is thus trace-preserving because of (A2). The lastcondition, E XY () = 1Y EZZ (1X ()1X)1Y is obvi-ous from the definition of EZZ .Proof of Prop. 11. By the weak submajorization condi-tion, if tr 6= tr, we must have tr < tr. Consider anextension space HY HZ (consider a larger HZ if nec-essary) in which we extend by many small eigenvaluessuch that tr YY = tr, while still having w YY .Now we have a (regular) majorization, YY , andcan apply Prop. 10.

    The obtained map, EZZ , is then unital and trace-preserving. It can be restricted by projecting the input

  • 12

    onto HX and the output onto HY ,

    EXY () = 1Y EZZ(1X ()1X

    )1Y .

    This restricted operator, by the lemma, is a valid trace-nonincreasing subunital map (take = 0).

    Conversely, if EXY is a subunital trace-nonincreasingcompletely positive map with EXY () = , thenone can dilate it with Proposition 12 to a uni-tal, trace-preserving completely positive map EZZsuch that 1Y EZZ ( 0Y )1Y = . Note alsothat the map () 7 1Y ()1Y + 1X ()1X is apinching [31, p. 50, Prob. II.5.5], so we have 0Y EZZ ( 0Y ) 1XEZZ ( 0Y )1X +1Y EZZ ( 0Y )1Y w 1XEZZ ( 0Y )1X = .The last weak submajorization is because some eigen-values were left out.

    In the same way as lambda majorization can be charac-terized with differently normalized doubly substochasticmaps, it can also be characterized in terms of a differentlynormalized subunital channel.

    Proposition 15. Let P(HX), P(HY ) and R. Then if and only if there exists acompletely positive map TXY : L (HX) L (HY )such that TXY () = , that is 2-subunital and trace-nonincreasing.

    Proof of Prop. 15. . Assume first that 211A w 221B , with HA, HB (of respective sizes 21and 22) being subsystems of an ancilla systemHC , with = 1 2.

    By Prop. 11, there exists a subunital trace-nonincreasing completely positive map EAXBY , suchthat

    EAXBY (211A ) = 221B . (A3)

    Now let the map T be defined byTXY () = trB

    [EAXBY (211A ())] . (A4)This map is trace-nonincreasing,

    T XY (1Y ) = 21 trA[EAXBY (1BY )

    ]6 21 trA (1AX) = 1X ,

    and 2-subunital,

    TXY (1X) = 21 trB [E (1AX)] 6 21 trB 1BY= 21Y .

    The map T brings to ,

    TXY (X) = trB[E (211A X)]

    = trB(221B Y

    )= Y ,

    so that T satisfies all the claimed properties.. To prove the converse, assume that a trace-

    nonincreasing, 2-subunital map TXY exists, suchthat TXY () = .

    Choose 1, 2 such that = 1 2 and such that21 , 22 , are integers. (Again, in case 2 is irrational,

    approximate 2 arbitrarily well by rational numbers 2.)

    Choose HC large enough to contain two subspaces HAand HB of respective dimensions 21 and 22 . Let

    EAXBY () = 221B TXY (trA ()) . (A5)This map is trace-nonincreasing,

    E (1BY ) = 221A T (trB 1BY )= 221A T

    (221Y

    )6 1AX ,

    and subunital,

    E (1AX) = 221B T (trA 1AX)= 221B T

    (211X

    )6 1BY ,

    since = 1 2 and T is 2-subunital. Also,

    E (211A X) = 221B T (trA (211A X))= 221B T (X) = 221B Y .

    By Prop. 11, we eventually have

    211A X w 221B Y .

    Remark 16. A trace-nonincreasing, 2-subunital com-pletely positive map TXY can always be written as inEq. (A4) for a sub-unital trace-nonicreasing completelypositive map EAXBY , which itself can always be writ-ten as projections of a unital map ECZCZ (see text ofthe previous proof, and Prop. 12).

    Conversely, for any unital map ECZCZ withE (211 X) = 221 Y , in particular for anynoisy operation in our framework, the map T obtainedby Eq. (A4) is trace-nonincreasing and 2-subunital.

    In particular, for our purposes of optimizing over allpossible processes of our framework with an additionalcondition to the channel carrying out the process (namelyto preserve correlations between our system X and thereference system R), we may impose that condition di-rectly on the channel T to obtain an upper bound on.

    3. Properties for quantum states

    We will consider in this section some useful propertiesof lambda-majorization in the case where we considernormalized states , . Here, weak majorization auto-matically implies (regular) majorization because tr =tr = 1.

  • 13

    In this section, let S=(HX) and S=(HY ).

    Proposition 17 (Lambda-Majorizing a Pure State).

    For any pure state |0 HZ , we have |00| if andonly if rank 6 2 (obviously has to be negative orzero). Equivalently, 1n1n if and only if rank 6 n.

    Proof of Prop. 17. Assume first that |00|. Here

    HY is the one-dimensional space spanned by |0, andtake HX the subspace on which has its support. ByProp. 9 there exists a single-row matrix T ki satisfyingT ki > 0,

    i T

    ki = T

    ki=1 6 1 k,

    k T

    ki 6 2 such

    that 1 = i=1(|00|) =k T

    ki=1k(). We also have

    k() 6= 0 because has nonzero eigenvalues in HX .Then

    k T

    ki=1k() = 1 =

    k k() implies T

    ki=1 =

    1 k. That is, the condition k T ki=1 6 2 forces T ki=1to have at most 2 elements, i.e. the rank of may notexceed 2.

    The converse holds because any state majorizes a uni-form state of the same rank.

    Proposition 18 (Condition on Support Sizes for Lamb-

    da-Majorization). If , then rank 6 2 rank .

    Proof of Prop. 18. Notice that 1rank 1rank , andthus

    1rank 1rank . Then, by Prop. 8 we have

    log rank |00| ;

    it remains to apply Prop. 17.

    Proposition 19 (Being Lambda-Majorized by a PureState). Let the state have maximum eigenvalue

    max(). For any pure state |0, we have |00| if and only if max() 6 2. Equivalently, 1n1n ifand only if max() 6 1n .

    Proof of Prop. 19. Let T ki be as in Prop. 9. Note herek only takes value 1, because we consider HY being theone-dimensional space spanned by |0. Then i() =k T

    ki k(|00|) = T k=1i and thus T ki = i(). Then

    2 >k T

    ki = T

    k=1i = i() for all i. In particular,

    2 > max().Conversely, if max() 6 2, then let T k=1i = i().

    This matrix T satisfies the conditions in Prop. 9 and thus

    |00| .

    4. Optimal Lambda Majorization for NormalizedStates and Relation to Single-Shot Entropy

    Measures

    Define the absorbed randomness (or relative mixed-ness [19]) of a transition from to as the maximalamount of randomness that you can get rid of, or the

    minimal amount of randomness that you have to gener-ate, in a noisy operation process:

    R( ) = sup{ : } . (A6)Recent work has shown that this measure is relevant

    for the amount of extractable work of processes acting onarrays of Szilard boxes [19].

    The absorbed randomness has some tight relationsto single-shot entropy measures, which we present here.These are reformulations of results shown in [17, 18].

    Proposition 20. The absorbed randomness definedabove satisfies the following bounds.

    Hmin()H0() 6 R( ) 6 H0()H0() .

    Proposition 21. If |0 denotes any pure state, then thefollowing relations hold:

    R(|0 ) = Hmin() , (A7)R( |0) = H0() . (A8)

    Similar explicit values can be obtained in the casewhere either the initial state or the target state is mixed.

    Proposition 22. If 1nn denotes the fully mixed state onlog n qubits, then:

    R(1nn ) = Hmin() log n , (A9)R( 1nn ) = log nH0() . (A10)

    Proof of Prop. 20. Lower bound: Let 1 = Hmin() = log max() and 2 = H0() = log rank. By Propo-sition 19, we have 21121 and by Proposition 17, 22122 . The majorization carries over to the ten-sor product, 21121 22122 , and 1 2is a valid maximization candidate for (A6).

    Upper bound: Let = R( ) satisfying .Proposition 18 immediately yields 2 6 rank rank , and

    R( ) = 6 log rank log rank .Recalling the definition of the Renyi-0 entropy H0() =log rank yields the required upper bound.

    Proof of Prop. 21. Equation (A8) follows from thebounds of Proposition 20, which become tight in thisspecial case. Equality (A7) is a direct consequence ofProp. 19.

    Proof of Prop. 22. The bounds of Proposition 20 becometight for (A10). Equality (A9) is again a consequenceof Prop. 19, recalling Prop. 8 which allows us to write

    |00| +logn instead of 1nn .

  • 14

    Appendix B: Derivation of the Main Result:Formulation as Semidefinite Program

    Let HX be a quantum system in the state X . LetHR be an additional quantum system and let |XR bea purification of X .

    Suppose we want to bring the system X into a givenstate XR with a lambda-majorization (here XR is notnecessarily pure; giving the joint state with R allows usto specify which correlations we want to preserve). Thetask is then the following.

    Task. Find the best (maximal) , such that there existsa completely positive, 2-subunital, trace-nonincreasingmap TXX satisfying TXX(XR) = XR.

    In other words, we would like to find the trace non-increasing channel that satisfies TXX(XR) = XR,that has the smallest possible TXX (1X).

    This problem can be formulated as a semidefinite pro-gram in terms of the variables (defined as = 2)and TXX (through its Choi-Jamiolkowski map TXX).(See [60, 61] for a introduction to SDPs in a style similarto what we use here.)

    Primal

    minimize:

    subject to:

    TXX (1X) 6 1X : X (B1a)T XX (1X) 6 1X : XX (B1b)

    TXX(XR) = XR . : ZXR (B1c)

    Dual

    maximize:

    tr (ZXR XR) trXXsubject to:

    trX 6 1 (B2a)trR

    [tXXR ZXR

    ]6 1X X +XX 1X . (B2b)

    Note that since the channel does not touch R, we mustnecessarily have R = R. Let E be an environment thatpurifies the output state as XRE . As two purificationswith the same reduced state on R, the two states XRand XR must be related by an isometry VXXE asXRE = VXXE XR V . We can choose VXXE to bea partial isometry such that V V = XE , the projectoron the support of XE , and V

    V = X , the projectoron the support of X .

    Now, define T by its Stinespring dilationTXX () = trE

    [VXXE () V

    ], (B3)

    and let = T (1X). We will show that this choice ofvariables is feasible and optimal, and will derive a moreexplicit value of .

    Condition (B1a) is satisfied by definition and (B1b)because V is a partial isometry. Also, verifying condi-tion (B1c),

    TXX (XR) = trE[VXX XRV

    ]= trE XRE

    = XR . (B4)

    Now calculate

    = T (1X) = trE V V = trE XE= max

    Xtr[XE X

    ]= 2H0(E|X

    ) . (B5)

    We will now show that this value is optimal by exhibit-ing a solution to the dual problem that achieves the samevalue. Let X = X be the optimal X for the defini-tion of H0(E|X ) as in (B5), let ZXR = 1R X andlet XX = 0. This choice is feasible since condition (B2a)is automatically satisfied and condition (B2b) becomes

    trR[tXXR ZXR

    ]= trR

    [tXXR 1R X

    ]= trR

    [tXX|R X

    ]= tXX X 6 1X X , (B6)

    where X|R is a maximally entangled state on the sup-ports of X and R. Let XRE and VXXE be definedas before. The value achieved by this choice of dual vari-ables is then

    tr [ZXR XR] = tr[1R X XR

    ](B7)

    = tr[1R X VXXE XRV

    ](B8)

    = tr[X VXXE X|RV

    ]= tr

    [XXE

    ]= 2H0(E|X

    ) . (B9)

    From this, we conclude that the optimal for thisproblem is

    opt = H0(E|X ) . (B10)where XRE is a purification of XR.

    We note also that this gives the optimal amount of ex-tracted work. Of course, any 6 opt also is a solution.

    Appendix C: Renyi-zero entropy of the W state

    Let S and M be two qubits in the state SM =13 |0000|SM + 23 |++| (where |+ is the Bell state|+ = 1

    2

    [|01 + |10]). Written out explicitely in thebasis {|0, |1},

    SM =

    1/3

    1/3 1/31/3 1/3

    0

    .

  • 15

    (Empty entries are zero.)The projector on its support is

    SM =

    1 1/2 1/21/2 1/20

    .We would like to compute the quantity

    2H0(S|M) = maxM dens. op.

    tr SMM .

    Let M =(s1 s

    2

    s2 1s1

    ); then

    tr [SM (1S M )] = s1 + 12

    (1 s1) + 12s1

    =1

    2+ s1 . (C1)

    Under the constraint 0 6 s1 6 1, this expression is clearlymaximized when s1 = 1, yielding the value

    H0(S|M) = log 32.

    [1] R. Landauer, IBM Journal of Research and Development5, 183 (1961), ISSN 0018-8646.

    [2] C. H. Bennett, International Journal of TheoreticalPhysics 21, 905 (1982).

    [3] C. H. Bennett, Studies in History and Philosophy of Mod-ern Physics 34, 501 (2003).

    [4] J. Oppenheim, M. Horodecki, P. Horodecki, andR. Horodecki, Physical Review Letters 89, 180402(2002), ISSN 0031-9007, 0112074.

    [5] L. del Rio, J. Aberg, R. Renner, O. Dahlsten, and V. Ve-dral, Nature 474, 61 (2011), ISSN 1476-4687.

    [6] C. H. Bennett, IBM Journal of Research and Develop-ment 17, 525 (1973), ISSN 0018-8646.

    [7] L. Szilard, Zeitschrift fur Physik 53, 840 (1929), ISSN1434-6001.

    [8] R. W. Keyes and R. Landauer, IBM Journal of Researchand Development 14, 152 (1970), ISSN 0018-8646.

    [9] K. Maruyama, F. Nori, and V. Vedral, Reviews of Mod-ern Physics 81, 1 (2009), ISSN 0034-6861.

    [10] B. Piechocinska, Physical Review A 61, 062314 (2000),ISSN 1050-2947.

    [11] L. Schulman and U. Vazirani, in Proceedings of the thirty-first annual ACM symposium on Theory of computing(ACM, 1999), pp. 322329.

    [12] R. Alicki, M. Horodecki, P. Horodecki, and R. Horodecki,Open Systems & Information Dynamics 11, 205 (2004),ISSN 12301612.

    [13] M. Horodecki, P. Horodecki, and J. Oppenheim, PhysicalReview A 67, 10 (2003), ISSN 1050-2947.

    [14] F. G. S. L. Brandao, M. Horodecki, J. Oppenheim,J. M. Renes, and R. W. Spekkens, ArXiv e-prints (2011),1111.3882.

    [15] D. Janzing, P. Wocjan, R. Zeier, R. Geiss, and T. Beth,International Journal of Theoretical Physics 39, 2717(2000).

    [16] D. Janzing, Computer science approach to quantumcontrol (Universitatsverlag Karlsruhe, Karlsruhe, 2006),ISBN 3-86644-083-9.

    [17] D. Egloff, Masters thesis, ETH Zurich (2010).

    [18] O. C. O. Dahlsten, R. Renner, E. Rieper, and V. Vedral,New Journal of Physics 13, 53015 (2011), ISSN 1367-2630.

    [19] D. Egloff, O. C. O. Dahlsten, R. Renner, and V. Vedral,ArXiv e-prints (2012), 1207.0434.

    [20] J. Aberg, ArXiv e-prints (2011), 1110.6121.[21] M. Horodecki and J. Oppenheim, ArXiv e-prints (2011),

    1111.3834.[22] H. Joe, Journal of Mathematical Analysis and Applica-

    tions 148, 287 (1990), ISSN 0022247X.[23] E. Ruch, R. Schranner, and T. H. Seligman, Journal of

    Mathematical Analysis and Applications 76, 222 (1980),ISSN 0022247X.

    [24] E. Ruch, R. Schranner, and T. H. Seligman, The Journalof Chemical Physics 69, 386 (1978), ISSN 00219606.

    [25] C. A. Mead, The Journal of Chemical Physics 66, 459(1977), ISSN 00219606.

    [26] R. Renner, Phd thesis, ETH Zurich (2005), arXiv:quant-ph/0512258.

    [27] M. Tomamichel, R. Colbeck, and R. Renner, IEEE Trans-actions on information theory 56, 4674 (2010).

    [28] M. Tomamichel, Ph.D. thesis, ETH Zurich (2012),arXiv:1203.2142.

    [29] R. Konig, R. Renner, and C. Schaffner, IEEE Transac-tions on Information Theory 55, 4337 (2009), ISSN 0018-9448.

    [30] N. Datta and R. Renner, IEEE Transactions on Informa-tion Theory 55, 2807 (2009), ISSN 0018-9448.

    [31] R. Bhatia, Matrix Analysis, Graduate Texts in Mathe-matics (Springer, 1997).

    [32] R. A. Horn and C. R. Johnson, Matrix Analysis (Cam-bridge University Press, New York, 1985).

    [33] A. Uhlmann, Reports on Mathematical Physics 1, 147(1970), ISSN 00344877.

    [34] A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig, Math.-Naturwiss. 20, 633 (1971).

    [35] A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig, Math.-Naturwiss. 21, 421 (1972).

    [36] A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig, Math.-

  • 16

    Naturwiss. 22, 139 (1973).[37] U. Haagerup and M. Musat, Communications in Mathe-

    matical Physics 303, 555 (2011), ISSN 0010-3616.[38] A. Renyi, in Proceedings of the Fourth Berkeley Sym-

    posium on Mathematical Statistics and Probability, Vol.1: Contributions to the Theory of Statistics (1960), pp.547561.

    [39] M. Tomamichel, C. Schaffner, A. Smith, and R. Ren-ner, IEEE Transactions on Information Theory 57, 5524(2011), ISSN 0018-9448.

    [40] M. A. Nielsen and I. L. Chuang, Quantum Computationand Quantum Information (Cambridge University Press,2000).

    [41] M. Tomamichel, R. Colbeck, and R. Renner, IEEE Trans-actions on information theory 55, 5840 (2009).

    [42] H. E. D. Scovil and E. O. Schulz-DuBois, Phys. Rev.Lett. 2, 262 (1959).

    [43] J. E. Geusic, E. O. Schulz-DuBios, and H. E. D. Scovil,Phys. Rev. 156, 343 (1967).

    [44] R. Alicki, Journal of Statistical Physics 20, 671 (1979),ISSN 0022-4715.

    [45] J. Howard, Nature 389, 561 (1997).[46] E. Geva and R. Kosloff, The Journal of chemical physics

    97, 4398 (1992).[47] P. Hanggi and F. Marchesoni, Rev. Mod. Phys. 81, 387

    (2009).[48] A. E. Allahverdyan and T. M. Nieuwenhuizen, Phys. Rev.

    Lett. 85, 1799 (2000).[49] T. Feldmann and R. Kosloff, Phys. Rev. E 73, 025107

    (2006).[50] J. Baugh, O. Moussa, C. Ryan, A. Nayak, and

    R. Laflamme, Nature 438, 470 (2005).[51] J. Gemmer, M. Michel, M. Michel, and G. Mahler, Quan-

    tum thermodynamics: Emergence of thermodynamic be-havior within composite quantum systems (Springer Ver-lag, 2009).

    [52] S. Popescu, A. Short, and A. Winter, Nature Physics 2,754 (2006).

    [53] N. Linden, S. Popescu, and P. Skrzypczyk, Physical Re-view Letters 105, 130401 (2010).

    [54] N. Linden, S. Popescu, A. Short, and A. Winter, NewJournal of Physics 12, 055021 (2010).

    [55] C. Gogolin, M. Muller, and J. Eisert, Physical ReviewLetters 106, 40401 (2011).

    [56] S. Trotzky, Y. Chen, A. Flesch, I. McCulloch,U. Schollwock, J. Eisert, and I. Bloch, Nature Physics8, 325 (2012).

    [57] A. Hutter and S. Wehner, ArXiv e-prints (2011),1111.3080.

    [58] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities:Theory of Majorization and Its Applications (Springer,2010), ISBN 0387400877.

    [59] G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities,Cambridge Mathematical Library (Cambridge UniversityPress, 1952), ISBN 9780521358804.

    [60] A. Barvinok, A Course in Convexity, vol. 54 of GraduateStudies in Mathematics (American Mathematical Soci-ety, 2002), ISBN 0-8218-2968-8.

    [61] J. Watrous, Theory of Quantum InformationLecturenotes from Fall 2008 (Online Lecture Notes, 2008), www.cs.uwaterloo.ca/~watrous/quant-info/.

    Formal Approach to Lambda-MajorizationPreliminaries and Main DefinitionFormulation of Lambda-Majorization in Terms of ChannelsProperties for quantum statesOptimal Lambda Majorization for Normalized States and Relation to Single-Shot Entropy Measures

    Derivation of the Main Result: Formulation as Semidefinite ProgramRnyi-zero entropy of the W stateReferences