
Consciousness as a State of Matter

Max Tegmark
Dept. of Physics & MIT Kavli Institute, Massachusetts Institute of Technology, Cambridge, MA 02139

(Dated: January 8, 2014)

We examine the hypothesis that consciousness can be understood as a state of matter, “perceptronium”, with distinctive information processing abilities. We explore five basic principles that may distinguish conscious matter from other physical systems such as solids, liquids and gases: the information, integration, independence, dynamics and utility principles. If such principles can identify conscious entities, then they can help solve the quantum factorization problem: why do conscious observers like us perceive the particular Hilbert space factorization corresponding to classical space (rather than Fourier space, say), and more generally, why do we perceive the world around us as a dynamic hierarchy of objects that are strongly integrated and relatively independent? Tensor factorization of matrices is found to play a central role, and our technical results include a theorem about Hamiltonian separability (defined using Hilbert-Schmidt superoperators) being maximized in the energy eigenbasis. Our approach generalizes Giulio Tononi’s integrated information framework for neural-network-based consciousness to arbitrary quantum systems, and we find interesting links to error-correcting codes, condensed matter criticality, and the Quantum Darwinism program, as well as an interesting connection between the emergence of consciousness and the emergence of time.

I. INTRODUCTION

What is the relation between the internal reality of your mind and the external reality described by the equations of physics? The fact that no consensus answer to this question has emerged in the physics community lies at the heart of many of the most hotly debated issues in physics today. For example, how does quantum field theory with weak-field gravity explain the appearance of an approximately classical spacetime where experiments appear to have definite outcomes? Out of all of the possible factorizations of Hilbert space, why is the particular factorization corresponding to classical space so special? Does the quantum wavefunction undergo a non-unitary collapse when an observation is made, or are there Everettian parallel universes? Does the non-observability of spacetime regions beyond horizons imply that they in some sense do not exist independently of the regions that we can observe? If we understood consciousness as a physical phenomenon, we could in principle answer all of these questions by studying the equations of physics: we could identify all conscious entities in any physical system, and calculate what they would perceive. However, this approach is typically not pursued by physicists, with the argument that we do not understand consciousness well enough.

In this paper, I argue that recent progress in neuroscience has fundamentally changed this situation, and that we physicists can no longer blame neuroscientists for our own lack of progress. I have long contended that consciousness is the way information feels when being processed in certain complex ways [1, 2], i.e., that it corresponds to certain complex patterns in spacetime that obey the same laws of physics as other complex systems, with no “secret sauce” required. In the seminal paper “Consciousness as Integrated Information: a Provisional Manifesto” [3], Giulio Tononi made this idea more specific and useful, making a compelling argument that for an information processing system to be conscious, it needs to have two separate traits:

1. Information: It has to have a large repertoire of accessible states, i.e., the ability to store a large amount of information.

2. Integration: This information must be integrated into a unified whole, i.e., it must be impossible to decompose the system into nearly independent parts.

Tononi’s work has generated a flurry of activity in the neuroscience community, spanning the spectrum from theory to experiment (see [4, 5] for recent reviews), making it timely to investigate its implications for physics as well. This is the goal of the present paper, a goal whose pursuit may ultimately provide additional tools for the neuroscience community as well.

A. Consciousness as a state of matter

Generations of physicists and chemists have studied what happens when you group together vast numbers of atoms, finding that their collective behavior depends on the pattern in which they are arranged: the key difference between a solid, a liquid and a gas lies not in the types of atoms, but in their arrangement. In this paper, I conjecture that consciousness can be understood as yet another state of matter. Just as there are many types of liquids, there are many types of consciousness. However, this should not preclude us from identifying, quantifying, modeling and ultimately understanding the characteristic properties that all liquid forms of matter (or all conscious forms of matter) share.

arXiv:1401.1219v1 [quant-ph] 6 Jan 2014

To classify the traditionally studied states of matter, we need to measure only a small number of physical parameters: viscosity, compressibility, electrical conductivity and (optionally) diffusivity. We call a substance a solid if its viscosity is effectively infinite (producing structural stiffness), and call it a fluid otherwise. We call a fluid a liquid if its compressibility and diffusivity are small and otherwise call it either a gas or a plasma, depending on its electrical conductivity.

State of         Many long-lived   Information   Easily      Complex
matter           states?           integrated?   writable?   dynamics?
Gas              N                 N             N           Y
Liquid           N                 N             N           Y
Solid            Y                 N             N           N
Memory           Y                 N             Y           N
Computer         Y                 ?             Y           Y
Consciousness    Y                 Y             Y           Y

TABLE I: Substances that store or process information can be viewed as novel states of matter and investigated with traditional physics tools.

What are the corresponding physical parameters that can help us identify conscious matter, and what are the key physical features that characterize it? If such parameters can be identified, understood and measured, this will help us identify (or at least rule out) consciousness “from the outside”, without access to subjective introspection. This could be important for reaching consensus on many currently controversial topics, ranging from the future of artificial intelligence to determining when an animal, fetus or unresponsive patient can feel pain. It would also be important for fundamental theoretical physics, by allowing us to identify conscious observers in our universe by using the equations of physics and thereby answer thorny observation-related questions such as those mentioned in the introductory paragraph.

B. Memory

As a first warmup step toward consciousness, let us consider a state of matter that we would characterize as memory: what physical features does it have? For a substance to be useful for storing information, it clearly needs to have a large repertoire of possible long-lived states or attractors (see Table I). Physically, this means that its potential energy function has a large number of well-separated minima. The information storage capacity (in bits) is simply the base-2 logarithm of the number of minima. This equals the entropy (in bits) of the degenerate ground state if all minima are equally deep. For example, solids have many long-lived states, whereas liquids and gases do not: if you engrave someone’s name on a gold ring, the information will still be there years later, but if you engrave it in the surface of a pond, it will be lost within a second as the water surface changes its shape. Another desirable trait of a memory substance, distinguishing it from generic solids, is that it is not only easy to read from (as a gold ring), but also easy to write to: altering the state of your hard drive or your synapses requires less energy than engraving gold.

C. Computronium

As a second warmup step, what properties should we ascribe to what Margolus and Toffoli have termed “computronium” [6], the most general substance that can process information as a computer? Rather than just remain immobile as a gold ring, it must exhibit complex dynamics so that its future state depends in some complicated (and hopefully controllable/programmable) way on the present state. Its atom arrangement must be less ordered than a rigid solid where nothing interesting changes, but more ordered than a liquid or gas. At the microscopic level, computronium need not be particularly complicated, because computer scientists have long known that as long as a device can perform certain elementary logic operations, it is universal: it can be programmed to perform the same computation as any other computer with enough time and memory. Computer vendors often parametrize computing power in FLOPS, floating-point operations per second for 64-bit numbers; more generically, we can parametrize computronium capable of universal computation by “FLIPS”: the number of elementary logical operations such as bit flips that it can perform per second. It has been shown by Lloyd [7] that a system with average energy E can perform a maximum of 4E/h elementary logical operations per second, where h is Planck’s constant. The performance of today’s best computers is about 38 orders of magnitude lower than this, because they use huge numbers of particles to store each bit and because most of their energy is tied up in a computationally passive form, as rest mass.
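As a rough illustration of the bound quoted above, the following Python sketch evaluates 4E/h for a system whose energy is entirely rest-mass energy. The 1 kg mass and the function names are assumptions chosen for illustration; they do not come from the text.

```python
# Lloyd's bound: at most 4E/h elementary logical operations per second.
PLANCK_H = 6.62607015e-34        # Planck's constant in J*s (exact SI value)
SPEED_OF_LIGHT = 299_792_458.0   # speed of light in m/s (exact SI value)


def rest_energy_joules(mass_kg: float) -> float:
    """Rest-mass energy E = m c^2."""
    return mass_kg * SPEED_OF_LIGHT ** 2


def max_ops_per_second(energy_joules: float) -> float:
    """Upper bound 4E/h on elementary logical operations per second."""
    return 4.0 * energy_joules / PLANCK_H


# Assumption for illustration: a 1 kg system with all of its energy available.
bound_1kg = max_ops_per_second(rest_energy_joules(1.0))
print(f"{bound_1kg:.2e} operations per second")
```

For the assumed 1 kg system the bound comes out at a few times 10^50 operations per second, which gives a sense of the scale against which the 38-orders-of-magnitude gap is measured.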

D. Perceptronium

What about “perceptronium”, the most general substance that feels subjectively self-aware? If Tononi is right, then it should not merely be able to store and process information like computronium does, but it should also satisfy the principle that its information is integrated, forming a unified and indivisible whole.

Let us also conjecture another principle that conscious systems must satisfy: that of autonomy, i.e., that information can be processed with relative freedom from external influence. Autonomy is thus the combination of two separate properties: dynamics and independence. Here dynamics means time dependence (hence information processing capacity) and independence means that the dynamics is dominated by forces from within rather than outside the system. Just like integration, autonomy is postulated to be a necessary but not sufficient condition for a system to be conscious: for example, clocks and diesel generators tend to exhibit high autonomy, but lack substantial information storage capacity.

Principle                 Definition
Information principle     A conscious system has substantial information storage capacity.
Dynamics principle        A conscious system has substantial information processing capacity.
Independence principle    A conscious system has substantial independence from the rest of the world.
Integration principle     A conscious system cannot consist of nearly independent parts.
Utility principle         A conscious system records mainly information that is useful for it.
Autonomy principle        A conscious system has substantial dynamics and independence.

TABLE II: Conjectured necessary conditions for consciousness that we explore in this paper. The last one simply combines the second and third.

E. Consciousness and the quantum factorization problem

Table II summarizes the candidate principles that we will explore as necessary conditions for consciousness. Our goal in isolating and studying these principles is not merely to strengthen our understanding of consciousness as a physical process, but also to identify simple traits of conscious matter that can help us tackle other open problems in physics. For example, the only property of consciousness that Hugh Everett needed to assume for his work on quantum measurement was that of the information principle: by applying the Schrödinger equation to systems that could record and store information, he inferred that they would perceive subjective randomness in accordance with the Born rule. In this spirit, we might hope that adding further simple requirements such as the integration principle, the independence principle and the dynamics principle might suffice to solve currently open problems related to observation.

In this paper, we will pay particular attention to what I will refer to as the quantum factorization problem: why do conscious observers like us perceive the particular Hilbert space factorization corresponding to classical space (rather than Fourier space, say), and more generally, why do we perceive the world around us as a dynamic hierarchy of objects that are strongly integrated and relatively independent? This fundamental problem has received almost no attention in the literature [9]. We will see that this problem is very closely related to the one Tononi confronted for the brain, merely on a larger scale. Solving it would also help solve the “physics-from-scratch” problem [2]: If the Hamiltonian H and the total density matrix ρ fully specify our physical world, how do we extract 3D space and the rest of our semiclassical world from nothing more than two Hermitean matrices, which come without any a priori physical interpretation or additional structure such as a physical space, quantum observables, quantum field definitions, an “outside” system, etc.? Can some of this information be extracted even from H alone, which is fully specified by nothing more than its eigenvalue spectrum? We will see that a generic Hamiltonian cannot be decomposed using tensor products, which would correspond to a decomposition of the cosmos into non-interacting parts; instead, there is an optimal factorization of our universe into integrated and relatively independent parts. Based on Tononi’s work, we might expect that this factorization, or some generalization thereof, is what conscious observers perceive, because an integrated and relatively autonomous information complex is fundamentally what a conscious observer is!

The rest of this paper is organized as follows. In Section II, we explore the integration principle by quantifying integrated information in physical systems, finding encouraging results for classical systems and interesting challenges introduced by quantum mechanics. In Section III, we explore the independence principle, finding that at least one additional principle is required to account for the observed factorization of our physical world into an object hierarchy in three-dimensional space. In Section IV, we explore the dynamics principle and other possibilities for reconciling quantum-mechanical theory with our observation of a semiclassical world. We discuss our conclusions in Section V, including applications of the utility principle, and cover various mathematical details in the three appendices. Throughout the paper, we mainly consider finite Hilbert spaces that can be viewed as collections of qubits; as explained in Appendix C, this appears to cover standard quantum field theory with its infinite Hilbert space as well.

II. INTEGRATION

A. Our physical world as an object hierarchy

One of the most striking features of our physical world is that we perceive it as an object hierarchy, as illustrated in Figure 1. If you are enjoying a cold drink, you perceive ice cubes in your glass as separate objects because they are both fairly integrated and fairly independent, i.e., their parts are more strongly connected to one another than to the outside. The same can be said about each of their constituents, ranging from water molecules all the way down to electrons and quarks. Let us quantify this by defining the robustness of an object as the ratio of the integration temperature (the energy per part needed to separate them) to the independence temperature (the energy per part needed to separate the parent object in the hierarchy). Figure 1 illustrates that all of the ten types of objects shown have robustness of ten or more. A highly robust object preserves its identity (its integration and independence) over a wide range of temperatures/energies/situations. The more robust an object is, the more useful it is for us humans to perceive it as an object and coin a name for it, as per the above-mentioned utility principle.

[Figure 1: object hierarchy from an ice cube down to quarks. The values shown for each object are listed below; the figure annotates the ice cube's independence temperature as mgh/k_B ~ 3 mK per molecule.]

Object            Robustness   Independence T   Integration T
Ice cube          10^5         3 mK             300 K
Water molecule    40           300 K            1 eV
Oxygen atom       10           1 eV             10 eV
Hydrogen atom     10           1 eV             10 eV
Oxygen nucleus    10^5         10 eV            1 MeV
Proton            200          1 MeV            200 MeV
Neutron           200          1 MeV            200 MeV
Electron          10^22?       10 eV            10^16 GeV?
Down quark        10^17?       200 MeV          10^16 GeV?
Up quark          10^17?       200 MeV          10^16 GeV?

FIG. 1: We perceive the external world as a hierarchy of objects, whose parts are more strongly connected to one another than to the outside. The robustness of an object is defined as the ratio of the integration temperature (the energy per part needed to separate them) to the independence temperature (the energy per part needed to separate the parent object in the hierarchy).
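The robustness ratio defined above can be checked with a short Python sketch that converts the temperatures quoted in Figure 1 to a common unit before dividing. The dictionary layout and function name are assumptions made for this illustration, and the entries marked with question marks in the figure (electron and quarks) are deliberately left out.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin

# Conversion factors from each unit used in Figure 1 to electronvolts.
UNIT_IN_EV = {
    "mK": 1e-3 * K_B_EV_PER_K,
    "K": K_B_EV_PER_K,
    "eV": 1.0,
    "MeV": 1e6,
}


def robustness(integration, independence):
    """Ratio of integration to independence temperature.

    Each argument is a (value, unit) pair, e.g. (300, "K").
    """
    t_int = integration[0] * UNIT_IN_EV[integration[1]]
    t_ind = independence[0] * UNIT_IN_EV[independence[1]]
    return t_int / t_ind


# (integration temperature, independence temperature) as quoted in Figure 1.
FIGURE_1 = {
    "ice cube": ((300, "K"), (3, "mK")),
    "water molecule": ((1, "eV"), (300, "K")),
    "oxygen atom": ((10, "eV"), (1, "eV")),
    "oxygen nucleus": ((1, "MeV"), (10, "eV")),
    "proton": ((200, "MeV"), (1, "MeV")),
}

for name, (t_int, t_ind) in FIGURE_1.items():
    print(f"{name}: robustness ~ {robustness(t_int, t_ind):.3g}")
```

The only entry that needs a unit conversion between kelvin and electronvolts is the water molecule, for which the ratio comes out just under 40, in line with the rounded value in the figure.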

Returning to the “physics-from-scratch” problem, how can we identify this object hierarchy if all we have to start with are two Hermitean matrices, the density matrix ρ encoding the state of our world and the Hamiltonian H determining its time-evolution? Imagine that we know only these mathematical objects ρ and H and have no information whatsoever about how to interpret the various degrees of freedom or anything else about them. A good beginning is to study integration. Consider, for example, ρ and H for a single deuterium atom, whose Hamiltonian is (ignoring spin interactions for simplicity)

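$$\begin{aligned}
H(\mathbf{r}_p, \mathbf{p}_p, \mathbf{r}_n, \mathbf{p}_n, \mathbf{r}_e, \mathbf{p}_e) &= \\
&= H_1(\mathbf{r}_p, \mathbf{p}_p, \mathbf{r}_n, \mathbf{p}_n) + H_2(\mathbf{p}_e) + H_3(\mathbf{r}_p, \mathbf{p}_p, \mathbf{r}_n, \mathbf{p}_n, \mathbf{r}_e, \mathbf{p}_e),
\end{aligned} \tag{1}$$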

where r and p are position and momentum vectors, and the subscripts p, n and e refer to the proton, the neutron and the electron. On the second line, we have decomposed H into three terms: the internal energy of the proton-neutron nucleus, the internal (kinetic) energy of the electron, and the electromagnetic electron-nucleus interaction. This interaction is tiny, on average involving much less energy than those within the nucleus:

$$\frac{\mathrm{tr}\, H_3 \rho}{\mathrm{tr}\, H_1 \rho} \sim 10^{-5}, \tag{2}$$

which we recognize as the inverse robustness for a typical nucleus in Figure 1. We can therefore fruitfully approximate the nucleus and the electron as separate objects that are almost independent, interacting only weakly with one another. The key point here is that we could have performed this object-finding exercise of dividing the variables into two groups to find the greatest independence (analogous to what Tononi calls “the cruelest cut”) based on the functional form of H alone, without even having heard of electrons or nuclei, thereby identifying their degrees of freedom through a purely mathematical exercise.

B. Integration and mutual information

If the interaction energy $H_3$ were so small that we could neglect it altogether, then H would be decomposable into two parts $H_1$ and $H_2$, each one acting on only one of the two sub-systems (in our case the nucleus and the electron). This means that any thermal state would be factorizable:

$$\rho \propto e^{-H/kT} = e^{-H_1/kT} e^{-H_2/kT} = \rho_1 \rho_2, \tag{3}$$

so the total state ρ can be factored into a product of the subsystem states $\rho_1$ and $\rho_2$. In this case, the mutual information

$$I \equiv S(\rho_1) + S(\rho_2) - S(\rho) \tag{4}$$

vanishes, where

$$S(\rho) \equiv -\mathrm{tr}\, \rho \log_2 \rho \tag{5}$$

is the von Neumann entropy (in bits), which reduces to the Shannon entropy when ρ is diagonal. Even for non-thermal states, the time-evolution operator U becomes separable:

$$U \equiv e^{iHt/\hbar} = e^{iH_1 t/\hbar} e^{iH_2 t/\hbar} = U_1 U_2, \tag{6}$$

which (as we will discuss in detail in Section III) implies that the mutual information stays constant over time and no information is ever exchanged between the objects. In summary, if a Hamiltonian can be decomposed without an interaction term (with $H_3 = 0$), then it describes two perfectly independent systems.
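The factorization argument can be checked numerically on a toy system. The sketch below, written with NumPy, builds a two-qubit Hamiltonian with and without an interaction term and evaluates the mutual information of the corresponding thermal state. The specific Pauli-matrix Hamiltonians, the coupling strength 0.5 and the choice kT = 1 are assumptions made for illustration only.

```python
import numpy as np

SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
IDENTITY = np.eye(2)


def thermal_state(hamiltonian, kT=1.0):
    """Density matrix proportional to exp(-H/kT), normalized to unit trace."""
    energies, vectors = np.linalg.eigh(hamiltonian)
    weights = np.exp(-(energies - energies.min()) / kT)
    weights /= weights.sum()
    return (vectors * weights) @ vectors.conj().T


def entropy_bits(rho):
    """Entropy S = -tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))


def mutual_information(rho, dim1=2, dim2=2):
    """I = S(rho_1) + S(rho_2) - S(rho) for a bipartite density matrix."""
    r = rho.reshape(dim1, dim2, dim1, dim2)
    rho1 = np.trace(r, axis1=1, axis2=3)  # trace out the second subsystem
    rho2 = np.trace(r, axis1=0, axis2=2)  # trace out the first subsystem
    return entropy_bits(rho1) + entropy_bits(rho2) - entropy_bits(rho)


# Toy Hamiltonians (assumed for illustration): H1 acts on the first qubit,
# H2 on the second, and H3 couples the two.
H1 = np.kron(SIGMA_Z, IDENTITY)
H2 = np.kron(IDENTITY, SIGMA_X)
H3 = 0.5 * np.kron(SIGMA_Z, SIGMA_Z)

i_separable = mutual_information(thermal_state(H1 + H2))
i_coupled = mutual_information(thermal_state(H1 + H2 + H3))
print(f"H3 = 0:  I = {i_separable:.6f} bits")
print(f"H3 != 0: I = {i_coupled:.6f} bits")
```

Without the coupling term the mutual information vanishes to numerical precision, while switching the coupling on gives a strictly positive value.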

Let us now consider the opposite case, when a system cannot be decomposed into independent parts. Let us define the integrated information Φ as the mutual information I for the “cruelest cut” (the cut minimizing I) in some class of cuts that subdivide the system into two (we will discuss many different classes of cuts below). Although our Φ-definition is slightly different from Tononi’s [3] (see footnote 1), it is similar in spirit, and we are reusing his Φ-symbol for its elegant symbolism (unifying the shapes of I for information and O for integration).

C. Maximizing integration

We just saw that if two systems are dynamically independent ($H_3 = 0$), then Φ = 0 at all times, both for thermal states and for states that were independent (Φ = 0) at some point in time. Let us now consider the opposite extreme. How large can the integrated information Φ get? As a warmup example, let us consider the familiar 2D Ising model in Figure 2, where n = 2500 magnetic dipoles (or spins) that can point up or down are placed on a square lattice, and H is such that they prefer aligning with their nearest neighbors. When T → ∞, $\rho \propto e^{-H/kT}$ approaches the identity matrix, so all $2^n$ states are equally likely, all n bits are statistically independent, and Φ = 0. When T → 0, all states freeze out except the two degenerate ground states (all spin up or all spin down), so all spins are perfectly correlated and Φ = 1 bit. For intermediate temperatures, long-range correlations are seen to exist such that typical states have contiguous spin-up or spin-down patches. On average, we get about one bit of mutual information for each such patch crossing our cut (since a spin on one side “knows” about a spin on the other side), so for bipartitions that cut the system into two equally large halves, the mutual information will be proportional to the length of the cutting curve. The “cruelest cut” is therefore a vertical or horizontal straight line of length $n^{1/2}$, giving $\Phi \sim n^{1/2}$ at the temperature where typical patches are only a few pixels wide. By the same argument, the cut through a 3D Ising system is a plane containing $n^{2/3}$ spins, giving a maximum integration $\Phi \sim n^{2/3}$, and we get Φ ∼ 1 bit for a 1D Ising system.
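The two temperature limits can be verified by brute force on a lattice small enough to enumerate. The sketch below uses a 3 × 3 lattice with open boundaries, coupling J = 1, units where k = 1, and a minimization over all bipartitions; each of these choices is an assumption made to keep the example small, not the setup used for the 50 × 50 simulations.

```python
import itertools

import numpy as np


def ising_probabilities(rows, cols, kT, J=1.0):
    """Exact Boltzmann distribution of a small 2D Ising lattice.

    Returns an array of shape (2,)*n whose axis i indexes spin i.
    """
    n = rows * cols
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    grid = states.reshape(-1, rows, cols)
    horizontal = (grid[:, :, :-1] * grid[:, :, 1:]).sum(axis=(1, 2))
    vertical = (grid[:, :-1, :] * grid[:, 1:, :]).sum(axis=(1, 2))
    energy = -J * (horizontal + vertical)
    weights = np.exp(-(energy - energy.min()) / kT)
    return (weights / weights.sum()).reshape((2,) * n)


def entropy_bits(p):
    """Shannon entropy in bits of a probability array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def phi(p):
    """Mutual information minimized over all bipartitions of the spins."""
    n = p.ndim
    s_total = entropy_bits(p.ravel())
    best = np.inf
    for k in range(1, n // 2 + 1):
        for part_a in itertools.combinations(range(n), k):
            part_b = tuple(i for i in range(n) if i not in part_a)
            s_a = entropy_bits(p.sum(axis=part_b).ravel())  # marginal of A
            s_b = entropy_bits(p.sum(axis=part_a).ravel())  # marginal of B
            best = min(best, s_a + s_b - s_total)
    return best


phi_hot = phi(ising_probabilities(3, 3, kT=100.0))
phi_cold = phi(ising_probabilities(3, 3, kT=0.1))
print(f"high temperature: Phi = {phi_hot:.4f} bits")
print(f"low temperature:  Phi = {phi_cold:.4f} bits")
```

At high temperature the result is close to zero and at low temperature it is close to one bit, matching the two limits described above. A lattice this small is not expected to reproduce the scaling with system size at intermediate temperatures.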

Since it is the spatial correlations that provide the integration, it is interesting to speculate about whether the conscious subsystem of our brain is a system near its critical temperature, close to a phase transition. Indeed, Damasio has argued that to be in homeostasis, a number of physical parameters of our brain need to be kept within a narrow range of values [10]. This is precisely what is required of any condensed matter system to be near-critical, exhibiting correlations that are long-range (providing integration) but not so strong that the whole system becomes correlated like in the right panel or in a brain experiencing an epileptic seizure.

1 Tononi’s definition of Φ [3] applies only for classical systems, whereas we wish to study the quantum case as well. Our Φ is measured in bits and can grow with system size like an extrinsic variable, whereas his is an intrinsic variable representing a sort of average integration per bit.

[Figure 2: simulated spin configurations in a row of panels, with correlation increasing from left to right. Panel labels: “Random” (too little correlation), “Optimum”, and “Uniform” (too much correlation).]

FIG. 2: The panels show simulations of the 2D Ising model on a 50 × 50 lattice, with the temperature progressively decreasing from left to right. The integrated information Φ drops to zero bits at T → ∞ (leftmost panel) and to one bit as T → 0 (rightmost panel), taking a maximum at an intermediate temperature near the phase transition temperature.

D. Integration, coding theory and error correction

[Figure 3: plot of integrated information (vertical axis, 1 to 4) against the number of bits cut off (horizontal axis, 2 to 8), with curves for the Hamming (8,4)-code (16 8-bit strings), 16 random 8-bit strings, and 128 8-bit strings with checksum bit.]

FIG. 3: For various 8-bit systems, the integrated information is plotted as a function of the number of bits cut off into a sub-system with the “cruelest cut”. The Hamming (8,4)-code is seen to give classically optimal integration except for a bipartition into 4 + 4 bits: an arbitrary subset containing no more than three bits is completely determined by the remaining bits. The code consisting of the half of all 8-bit strings whose bit sum is even (i.e., each of the 128 7-bit strings followed by a parity checksum bit) has Hamming distance d = 2 and gives Φ = 1 however many bits are cut off. A random set of 16 8-bit strings is seen to outperform the Hamming (8,4)-code for 4 + 4 bipartitions, but not when fewer bits are cut off.

Even when we tuned the temperature to the most favorable value in our 2D Ising model example, the integrated information never exceeded $\Phi \sim n^{1/2}$ bits, which is merely a fraction $n^{-1/2}$ of the n bits of information that n spins can potentially store. So can we do better? Fortunately, a closely related question has been carefully studied in the branch of mathematics known as coding theory, with the aim of optimizing error correcting codes. Consider, for example, the following set of m = 16 bit strings, each written as a column vector of length n = 8:

$$M = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1
\end{pmatrix}$$

This is known as the Hamming(8,4)-code, and has Hamming distance d = 4, which means that at least 4 bit flips are required to change one string into another [11]. It is easy to see that for a code with Hamming distance d, any (d − 1) bits can always be reconstructed from the others: You can always reconstruct b bits as long as erasing them does not make two bit strings identical, which would cause ambiguity about which bit string is the correct one. This implies that reconstruction works when the Hamming distance d > b.
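The distance and erasure properties just described can be checked directly on the 16 columns of M. The following Python sketch does so; the variable and function names are chosen for this illustration.

```python
from itertools import combinations

# Rows of the matrix M; each of the 16 columns is one 8-bit codeword.
ROWS = [
    "0000111100001111",
    "0000000011111111",
    "0110100101101001",
    "0101010110101010",
    "0101101001011010",
    "0011001111001100",
    "0011110000111100",
    "0110011010011001",
]
CODEWORDS = ["".join(column) for column in zip(*ROWS)]


def hamming_distance(a, b):
    """Number of positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(words):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(words, 2))


def erasure_recoverable(words, erased):
    """True if all codewords remain distinct after erasing the given positions."""
    kept = [i for i in range(len(words[0])) if i not in erased]
    remaining = {tuple(word[i] for i in kept) for word in words}
    return len(remaining) == len(words)


print("minimum distance:", min_distance(CODEWORDS))
print("any 3 erased bits recoverable:",
      all(erasure_recoverable(CODEWORDS, e) for e in combinations(range(8), 3)))
print("any 4 erased bits recoverable:",
      all(erasure_recoverable(CODEWORDS, e) for e in combinations(range(8), 4)))
```

With d = 4, every choice of three erased bits leaves the codewords distinguishable, whereas some choices of four erased bits do not, which is the d > b condition stated above.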

To translate such codes of m bit strings of length n into physical systems, we simply create a state space with n bits (interpretable as n spins or other two-state systems) and construct a Hamiltonian which has an m-fold degenerate ground state, with one minimum corresponding to each of the m bit strings in the code. In the low-temperature limit, all bit strings will receive the same probability weight 1/m, giving an entropy $S = \log_2 m$. The corresponding integrated information Φ of the ground state is plotted in Figure 3 for a few examples, as a function of cut size k (the number of bits assigned to the first subsystem). To calculate Φ for a cut size k in practice, we simply minimize the mutual information I over all $\binom{n}{k}$ ways of partitioning the n bits into k and (n − k) bits.

We see that, as advertised, the Hamming(8,4)-code gives Φ = 3 when 3 bits are cut off. However, it gives only Φ = 2 for symmetric (4 + 4) bipartitions; the Φ-value for bipartitions is not simply related to the Hamming distance, and is not a quantity that most popular bit string codes are optimized for. Indeed, Figure 3 shows that for bipartitions, it underperforms a code consisting of 16 random unique bit strings of the same length. A rich and diverse set of codes has been published in the literature, and the state of the art in terms of maximal Hamming distance for a given n is continually updated [12]. Although codes with arbitrarily large Hamming distance d exist, there is (just as for our Hamming(8,4)-example above) no guarantee that Φ will be as large as d − 1 when the smaller of the two subsystems contains d or more bits. Moreover, although Reed-Solomon codes are sometimes billed as classically optimal erasure codes (maximizing d for a given n), their fundamental units are generally not bits but groups of bits (generally numbers modulo some prime number), and the optimality is violated if we make cuts that do not respect the boundaries of these bit groups.

[Figure 4: plot of integrated information (vertical axis, 2 to 8) against the number of bits cut off (horizontal axis, 5 to 15), with curves for 4 4-bit words, 8 6-bit words, 16 8-bit words, 32 10-bit words, 64 12-bit words, 128 random 14-bit words and 256 random 16-bit words.]

FIG. 4: Same as for previous figure, but for random codes with progressively longer bit strings, consisting of a random subset containing $\sqrt{2^n}$ of the $2^n$ possible bit strings. For better legibility, the vertical axis has been re-centered for the shorter codes.
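The minimization over all ways of splitting off k bits can be sketched in a few lines of Python. The example below treats a code as a uniform distribution over its codewords (the low-temperature limit) and evaluates Φ for the Hamming(8,4)-code and for the even-parity code of Figure 3; the function names are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations, product
from math import log2

# Rows of the matrix M; each of the 16 columns is one 8-bit codeword.
ROWS = [
    "0000111100001111",
    "0000000011111111",
    "0110100101101001",
    "0101010110101010",
    "0101101001011010",
    "0011001111001100",
    "0011110000111100",
    "0110011010011001",
]
HAMMING_8_4 = [tuple(int(bit) for bit in column) for column in zip(*ROWS)]

# The 128 8-bit strings whose bit sum is even (7 free bits plus a parity bit).
EVEN_PARITY = [w for w in product((0, 1), repeat=8) if sum(w) % 2 == 0]


def entropy_bits(words, positions):
    """Entropy of the chosen bit positions when all codewords are equally likely."""
    positions = list(positions)
    counts = Counter(tuple(word[i] for i in positions) for word in words)
    total = len(words)
    return -sum(c / total * log2(c / total) for c in counts.values())


def integrated_information(words, k):
    """Mutual information minimized over all cuts into k and n - k bits."""
    n = len(words[0])
    s_total = entropy_bits(words, range(n))
    values = []
    for part_a in combinations(range(n), k):
        part_b = [i for i in range(n) if i not in part_a]
        values.append(entropy_bits(words, part_a)
                      + entropy_bits(words, part_b) - s_total)
    return min(values)


for k in range(1, 5):
    print(k,
          integrated_information(HAMMING_8_4, k),
          integrated_information(EVEN_PARITY, k))
```

For cut sizes k = 1, 2, 3, 4 the Hamming(8,4)-code gives Φ = 1, 2, 3, 2 bits, reproducing the values Φ = 3 and Φ = 2 quoted above, while the even-parity code gives Φ = 1 for every cut size.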

Although further research on codes maximizing Φ would be of interest, it is worth noting that simple random codes appear to give Φ-values within a couple of bits of the theoretical maximum in the limit of large n, as illustrated in Figure 4. When cutting off k out of n bits, the mutual information in classical physics clearly cannot exceed the number of bits in either subsystem, i.e., k and n − k, so the Φ-curve for a code must lie within the shaded triangle in the figure. (The quantum-mechanical case is more complicated, and we will see in the next section that it in a sense integrates both better and worse.) The codes for which the integrated information is plotted simply consist of a random subset containing $2^{n/2}$ of the $2^n$ possible bit strings, so roughly speaking, half the bits encode fresh information and the other half provide the redundancy giving near-perfect integration.

[Figure 5: plot of integrated information in bits (vertical axis, 1 to 7) against the 2-logarithm of the number of patterns used (horizontal axis, 2 to 14).]

FIG. 5: The integrated information is shown for random codes using progressively larger random subsets of the $2^{14}$ possible strings of 14 bits. The optimal choice is seen to be using about $2^7$ bit strings, i.e., using about half the bits to encode information and the other half to integrate it.

Just as we saw for the Ising model example, these random codes show a tradeoff between entropy and redundancy, as illustrated in Figure 5. When there are n bits, how many of the $2^n$ possible bit strings should we use to maximize the integrated information Φ? If we use m of them, we clearly have $\Phi \le \log_2 m$, since in classical physics, Φ cannot exceed the entropy of the system (the mutual information is $I = S_1 + S_2 - S$, where $S_1 \le S$ and $S_2 \le S$, so $I \le S$). Using very few bit strings is therefore a bad idea. On the other hand, if we use all $2^n$ of them, we lose all redundancy, the bits become independent, and Φ = 0, so being greedy and using too many bit strings in an attempt to store more information is also a bad idea. Figure 5 shows that the optimal tradeoff is to use $\sqrt{2^n}$ of the codewords, i.e., to use half the bits to encode information and the other half to integrate it. Taken together, the last two figures therefore suggest that n physical bits can be used to provide about n/2 bits of integrated information in the large-n limit.
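The tradeoff can be explored with a small numerical experiment. The Python sketch below draws random codes of increasing size and evaluates Φ for the cruelest cut into equal halves. It uses n = 8 bits rather than the 14 bits of Figure 5 to keep the run time short, restricts attention to equal-halves cuts, and fixes a random seed; all three are assumptions made for this illustration, so the numbers it prints are not those of the figure.

```python
import random
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy_bits(words, positions):
    """Entropy of the chosen bit positions when all codewords are equally likely."""
    positions = list(positions)
    counts = Counter(tuple(word[i] for i in positions) for word in words)
    total = len(words)
    return -sum(c / total * log2(c / total) for c in counts.values())


def phi_equal_halves(words):
    """Mutual information minimized over all cuts into two equal halves."""
    n = len(words[0])
    s_total = entropy_bits(words, range(n))
    best = float("inf")
    for part_a in combinations(range(n), n // 2):
        part_b = [i for i in range(n) if i not in part_a]
        mutual = (entropy_bits(words, part_a)
                  + entropy_bits(words, part_b) - s_total)
        best = min(best, mutual)
    return best


def random_code(n, m, seed=0):
    """m distinct n-bit strings drawn at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(list(product((0, 1), repeat=n)), m)


n = 8
for log2_m in range(0, n + 1, 2):
    code = random_code(n, 2 ** log2_m)
    print(f"log2(m) = {log2_m}: Phi = {phi_equal_halves(code):.3f} bits")
```

The two endpoints behave as argued above: a single codeword and the full set of $2^n$ strings both give Φ = 0, and every intermediate code is bounded by $\log_2 m$.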

E. Integration in physical systems

Let us explore the consequences of these results for physical systems described by a Hamiltonian H and a state ρ. As emphasized by Hopfield [13], any physical system with multiple attractors can be viewed as an information storage device, since its state permanently encodes information about which attractor it belongs to. Figure 6 shows two examples of H interpretable as potential energy functions for a single particle in two dimensions. They can both be used as information storage devices, by placing the particle in a potential well and keeping the system cool enough that the particle stays in the same well indefinitely. The egg crate potential $V(x, y) = \sin^2(\pi x)\sin^2(\pi y)$ (top) has 256 minima and hence a ground state entropy (information storage capacity) S = 8 bits, whereas the lower potential has only 16 minima and S = 4 bits.

[Figure 6: two panels showing potential energy landscapes, the egg-crate potential (top) and the Hamming(8,4) potential (bottom).]

FIG. 6: A particle in the egg-crate potential energy landscape (top panel) stably encodes 8 bits of information that are completely independent of one another and therefore not integrated. In contrast, a particle in a Hamming(8,4) potential (bottom panel) encodes only 4 bits of information, but with excellent integration. Qualitatively, a hard drive is more like the top panel, while a neural network is more like the bottom panel.

The basins of attraction in the top panel are seen to be the squares shown in the bottom panel. If we write the x- and y-coordinates as binary numbers with b bits each, then the first 4 bits of x and y encode which square (x, y) is in. The information in the remaining bits encodes the location within this square; these bits are not useful for information storage because they can vary over time, as the particle oscillates around a minimum. If the system is actively cooled, these oscillations are gradually damped out and the particle settles toward the attractor solution at the minimum, at the center of its basin. This example illustrates that cooling is a physical example of error correction: if thermal noise adds small perturbations to the particle position, altering the least significant bits, then cooling will remove these perturbations and push the particle back towards the minimum it came from. As long as cooling keeps the perturbations small enough that the particle never rolls out of its basin of attraction, all the 8 bits of information encoding its basin number are perfectly preserved. Instead of interpreting our n = 8 data bits as positions in two dimensions, we can interpret them as positions in n dimensions, where each possible state corresponds to a corner of the n-dimensional hypercube. This captures the essence of many computer memory devices, where each bit is stored in a system with two degenerate minima; the least significant and redundant bits that can be error-corrected via cooling now get equally distributed among all the dimensions.

How integrated is the information S? For the top panel of Figure 6, not at all: H can be factored as a tensor product of 8 two-state systems, so Φ = 0, just as for typical computer memory. In other words, if the particle is in a particular egg crate basin, knowing any one of the bits specifying the basin position tells us nothing about the other bits. The potential in the lower panel, on the other hand, gives good integration. This potential retains only 16 of the 256 minima, corresponding to the 16 bit strings of the Hamming(8,4)-code, which as we saw gives Φ = 3 for any 3 bits cut off and Φ = 2 bits for symmetric bipartitions. Since the Hamming distance d = 4 for this code, at least 4 bits must be flipped to reach another minimum, which among other things implies that no two basins can share a row or column.
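The numbers quoted for the Hamming(8,4) code can be checked by brute force. The sketch below is only an illustration: the generator matrix `GENERATOR` is one standard choice of extended Hamming code (an assumption, since the text does not specify which one is plotted), all 16 codewords are taken to be equally likely, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import log2

# Assumed generator matrix: systematic Hamming(7,4) rows plus an overall parity bit.
GENERATOR = [
    (1, 0, 0, 0, 1, 1, 0, 1),
    (0, 1, 0, 0, 1, 0, 1, 1),
    (0, 0, 1, 0, 0, 1, 1, 1),
    (0, 0, 0, 1, 1, 1, 1, 0),
]

def codewords():
    """All 16 codewords: every GF(2) combination of the generator rows."""
    words = []
    for coeffs in product((0, 1), repeat=4):
        word = tuple(
            sum(c * row[i] for c, row in zip(coeffs, GENERATOR)) % 2
            for i in range(8)
        )
        words.append(word)
    return words

def entropy(words, positions):
    """Entropy in bits of the chosen bit positions, codewords equally likely."""
    counts = Counter(tuple(w[i] for i in positions) for w in words)
    total = len(words)
    return sum(-(c / total) * log2(c / total) for c in counts.values())

def mutual_information(words, part):
    """I = S1 + S2 - S for the cut separating `part` from the remaining bits."""
    rest = [i for i in range(8) if i not in part]
    return (entropy(words, part) + entropy(words, rest)
            - entropy(words, range(8)))

WORDS = codewords()
min_distance = min(sum(w) for w in WORDS if any(w))
cut3 = [mutual_information(WORDS, p) for p in combinations(range(8), 3)]
cut4 = [mutual_information(WORDS, p) for p in combinations(range(8), 4)]
print(min_distance, min(cut3), max(cut3), min(cut4))
```

For this choice of code the script reports a minimum distance of 4, a mutual information of 3 bits for every cut that removes 3 bits, and a minimum of 2 bits over the symmetric 4+4 cuts.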

F. The pros and cons of integration

Natural selection suggests that self-reproducing information-processing systems will evolve integration if it is useful to them, regardless of whether they are conscious or not. Error correction can obviously be useful, both to correct errors caused by thermal noise and to provide redundancy that improves robustness toward failure of individual physical components such as neurons. Indeed, such utility explains the preponderance of error correction built into human-developed devices, from RAID-storage to bar codes to forward error correction in telecommunications. If Tononi is correct and consciousness requires integration, then this raises an interesting possibility: our human consciousness may have evolved as an accidental by-product of error correction. There is also empirical evidence that integration is useful for problem-solving: artificial life simulations of vehicles that have to traverse mazes and whose brains evolve by natural selection show that the more adapted they are to their environment, the higher the integrated information of the main complex in their brain [14].

However, integration comes at a cost, and as we will now see, near maximal integration appears to be prohibitively expensive. Let us distinguish between the maximum amount of information that can be stored in a state defined by ρ and the maximum amount of information that can be stored in a physical system defined by H. The former is simply S(ρ) for the perfectly mixed (T = ∞) state, i.e., $\log_2$ of the number of possible states (the number of bits characterizing the system). The latter can be much larger, corresponding to $\log_2$ of the number of Hamiltonians that you could distinguish between given your time and energy available for experimentation. Let us consider potential energy functions whose k different minima can be encoded as bit strings (as in Figure 6), and let us limit our experimentation to finding all the minima. Then H encodes not a single string of n bits, but a subset consisting of k out of all $2^n$ such strings, one for each minimum. There are $\binom{2^n}{k}$ such subsets, so the information contained in H is

$$S(H) = \log_2 \binom{2^n}{k} = \log_2 \frac{(2^n)!}{k!\,(2^n - k)!} \approx \log_2 \frac{(2^n)^k}{k^k} = k\,(n - \log_2 k) \tag{7}$$

for $k \ll 2^n$, where we used Stirling's approximation $k! \approx k^k$. So crudely speaking, H encodes not n bits but kn bits. For the near-maximal integration given by the random codes from the previous section, we had $k = 2^{n/2}$, which gives $S(H) \sim 2^{n/2}\,\frac{n}{2}$ bits. For example, if the $n \sim 10^{11}$ neurons in your brain were maximally integrated in this way, then your neural network would require a dizzying $10^{10000000000}$ bits to describe, vastly more information than can be encoded by all the $10^{89}$ particles in our universe combined.

The neuronal mechanisms of human memory are still unclear despite intensive experimental and theoretical explorations [15], but there is significant evidence that the brain uses attractor dynamics in its integration and memory functions, where discrete attractors may be used to represent discrete items [16]. The classic implementation of such dynamics as a simple symmetric and asynchronous Hopfield neural network [13] can be conveniently interpreted in terms of potential energy functions: the equations of the continuous Hopfield network are identical to a set of mean-field equations that minimize a potential energy function, so this network always converges to a basin of attraction [17]. Such a Hopfield network gives a dramatically lower information content S(H) of only about 0.25 bits per synapse [17], and we have only about $10^{14}$ synapses, suggesting that our brains can store only on the order of a few terabytes of information.
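The estimates above are easy to reproduce. The following sketch compares the exact value of $\log_2\binom{2^n}{k}$ with the crude approximation of equation (7) for small illustrative values (n = 20 and k = 2^10 are arbitrary choices, not taken from the text), and then redoes the two brain-scale estimates. All function and variable names are hypothetical.

```python
from math import comb, log2, log10

def exact_bits(n, k):
    """Exact S(H) = log2 C(2^n, k): which k of the 2^n bit strings are minima."""
    return log2(comb(2 ** n, k))

def approx_bits(n, k):
    """Equation (7): S(H) ~ k (n - log2 k), intended for k << 2^n."""
    return k * (n - log2(k))

# Illustrative small case with k = 2^(n/2)
n, k = 20, 2 ** 10
exact = exact_bits(n, k)
approx = approx_bits(n, k)
ratio = approx / exact
print(exact, approx, ratio)

# Random coding at brain scale: S(H) ~ 2^(n/2) * n/2 with n ~ 10^11.
# The value overflows a float, so we compute log10 of the number of bits.
n_neurons = 1e11
log10_bits = (n_neurons / 2) * log10(2) + log10(n_neurons / 2)
print(log10_bits)

# Hopfield estimate: about 0.25 bits per synapse, about 10^14 synapses.
hopfield_bits = 0.25 * 1e14
hopfield_terabytes = hopfield_bits / 8 / 1e12
print(hopfield_terabytes)
```

Because the approximation $k! \approx k^k$ drops a factor of roughly $e^{-k}$, equation (7) underestimates the exact value somewhat in the small case, which is consistent with it being described as crude. The brain-scale exponent comes out at roughly $1.5 \times 10^{10}$, the same order of magnitude as the exponent quoted above, and the Hopfield figure is about 3 terabytes.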

The integrated information of a Hopfield network is even lower. For a Hopfield network of n neurons, the total number of attractors is bounded by 0.14n [17], so the maximum information capacity is merely $S \approx \log_2 0.14n \approx \log_2 n \approx 37$ bits for $n = 10^{11}$ neurons. Even in the most favorable case where these bits are maximally integrated, our $10^{11}$ neurons thus provide a measly Φ ≈ 37 bits of integrated information, as opposed to about $\Phi \approx 5 \times 10^{10}$ bits for a random coding.
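As a quick arithmetic check of these figures, the snippet below evaluates both logarithms and the random-coding value n/2. The variable names are hypothetical.

```python
from math import log2

n = 1e11                               # number of neurons assumed in the text
attractor_bits = log2(0.14 * n)        # log2 of the 0.14 n attractor bound
loose_bits = log2(n)                   # the looser value that rounds to 37
random_coding_bits = n / 2             # about n/2 bits for random coding

print(attractor_bits, loose_bits, random_coding_bits)
```

The two logarithms differ by only about 3 bits, so the conclusion does not depend on which one is used.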

G. The integration paradox

This leaves us with an integration paradox: why does the information content of our conscious experience appear to be vastly larger than 37 bits? If Tononi's information and integration principles from Section I are correct, the integration paradox forces us to draw at least one of the following three conclusions:

1. Our brains use some more clever scheme for encoding our conscious bits of information, which allows dramatically larger Φ than Hopfield networks.

2. These conscious bits are much fewer than we might naively have thought from introspection, implying that we are only able to pay attention to a very modest amount of information at any instant.

3. To be relevant for consciousness, the definition of integrated information that we have used must be modified or supplemented by at least one additional principle.

We will see that the quantum results in the next section bolster the case for conclusion 3.

The fundamental reason why a Hopfield network is specified by much less information than a near-maximally integrated network is that it involves only pairwise couplings between neurons, thus requiring only $\sim n^2$ coupling parameters to be specified, as opposed to $2^n$ parameters giving the energy for each of the $2^n$ possible states. It is striking how H is similarly simple for the standard model of particle physics, with the energy involving only sums of pairwise interactions between particles supplemented with occasional 3-way and 4-way couplings. H for the brain and H for fundamental physics thus both appear to belong to an extremely simple subclass of all Hamiltonians, one that requires an unusually small amount of information to describe. Just as a system implementing near-maximal integration via random coding is too complicated to fit inside the brain, it is also too complicated to work in fundamental physics: since the information storage capacity S of a physical system is approximately bounded by its number of particles [7] or by its area in Planck units by the Holographic principle [8], it cannot be integrated by physical dynamics that itself requires storage of the exponentially larger information quantity $S(H) \sim 2^{S/2}\,\frac{S}{2}$ unless the Standard Model Hamiltonian is replaced by something dramatically more complicated.

An interesting theoretical direction for further research (pursuing resolution 1 to the integration paradox) is therefore to investigate what maximum amount of integrated information Φ can be feasibly stored in a physical system using codes that are algorithmic (such as RS-codes) rather than random. An interesting experimental direction would be to search for concrete implementations of error-correction algorithms in the brain.

In summary, we have explored the integration principle by quantifying integrated information in physical systems. We have found that although excellent integration is possible in principle, it is more difficult in practice. In theory, random codes provide nearly maximal integration, with about half of all n bits coding for data and the other half providing $\Phi \approx n/2$ bits of integration, but in practice, the dynamics required for implementing them is too complex for our brain or our universe. Most of our exploration has focused on classical physics, where cuts into subsystems have corresponded to partitions of classical bits. As we will see in the next section, finding systems encoding large amounts of integrated information is even more challenging when we turn to the quantum-mechanical case.

[Figure 7 plot: mutual information I (vertical axis, 0 to 2) versus entropy S (horizontal axis, 0 to 2). Labeled regions: "Possible classically", "Possible only quantum-mechanically (entanglement)" and "Quantum integrated". Labeled features: "Bell pair" and arrows marked "Unitary transformation". Plotted probability vectors: (.4, .4, .2, 0), (.91, .03, .03, .03), (1/3, 1/3, 1/3, 0), (1, 0, 0, 0), (.7, .1, .1, .1), (1/2, 0, 0, 1/2), (1/2, 1/2, 0, 0), (1/4, 1/4, 1/4, 1/4) and (.3, .3, .3, .1).]

FIG. 7: Mutual information versus entropy for various 2-bit systems. The different dots, squares and stars correspond to different states, which in the classical cases are defined by the probabilities for the four basis states 00, 01, 10 and 11. Classical states can lie only in the pyramid below the upper black star with (S, I) = (1, 1), whereas entanglement allows quantum states to extend all the way up to the upper black square at (0, 2). However, the integrated information Φ for a quantum state cannot lie above the shaded green/grey region, into which any other quantum state can be brought by a unitary transformation. Along the upper boundary of this region, either three of the four probabilities are equal, or two of them are equal while one vanishes.

III. INDEPENDENCE

A. Classical versus quantum independence

How cruel is what Tononi calls "the cruelest cut", dividing a system into two parts that are maximally independent? The situation is quite different in classical physics and quantum physics, as Figure 7 illustrates for a simple 2-bit system. In classical physics, the state is specified by a 2×2 matrix giving the probabilities for the four states 00, 01, 10 and 11, which define an entropy S and mutual information I. Since there is only one possible cut, the integrated information Φ = I. The point defined by the pair (S, Φ) can lie anywhere in the "pyramid" in the figure, whose top at (S, Φ) = (1, 1) (black star) gives maximum integration, and corresponds to perfect correlation between the two bits: 50% probability for 00 and 11. Perfect anti-correlation gives the same point. The other two vertices of the classically allowed region are seen to be (S, Φ) = (0, 0) (100% probability for a single outcome) and (S, Φ) = (2, 0) (equal probability for all four outcomes).
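The three vertices of the classical "pyramid" follow directly from the definitions of S and I. A minimal sketch, with hypothetical function names, that evaluates (S, I) for a classical 2-bit distribution:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return sum(-p * log2(p) for p in probs if p > 0)

def entropy_and_mutual_information(p00, p01, p10, p11):
    """Return (S, I) for a classical 2-bit state, with I = S1 + S2 - S."""
    s = entropy([p00, p01, p10, p11])
    s1 = entropy([p00 + p01, p10 + p11])   # marginal of the first bit
    s2 = entropy([p00 + p10, p01 + p11])   # marginal of the second bit
    return s, s1 + s2 - s

correlated = entropy_and_mutual_information(0.5, 0.0, 0.0, 0.5)
anticorrelated = entropy_and_mutual_information(0.0, 0.5, 0.5, 0.0)
single = entropy_and_mutual_information(1.0, 0.0, 0.0, 0.0)
uniform = entropy_and_mutual_information(0.25, 0.25, 0.25, 0.25)
print(correlated, anticorrelated, single, uniform)
```

Perfect correlation and perfect anti-correlation both land on (1, 1), a single certain outcome on (0, 0), and the uniform distribution on (2, 0).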

In quantum mechanics, where the 2-qubit state is defined by a 4×4 density matrix, the available area in the (S, I)-plane doubles to include the entire shaded triangle, with the classically unattainable region opened up because of entanglement. The extreme case is a Bell pair state such as

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|{\uparrow}\rangle|{\uparrow}\rangle + |{\downarrow}\rangle|{\downarrow}\rangle\right), \tag{8}$$

which gives (S, I) = (0, 2). However, whereas there was only one possible cut for 2 classical bits, there are now infinitely many possible cuts because in quantum mechanics, all Hilbert space bases are equally valid, and we can choose to perform the factorization in any of them. Since Φ is defined as I after the cruelest cut, it is the I-value minimized over all possible factorizations. For simplicity, we use the notation where ⊗ denotes factorization in the coordinate basis, so the integrated information is

$$\Phi = \min_{U} I(U\rho U^\dagger), \tag{9}$$

i.e., the mutual information minimized over all possible unitary transformations U. Since the Bell pair of equation (8) is a pure state ρ = |ψ〉〈ψ|, we can unitarily transform it into a basis where the first basis vector is |ψ〉, making it factorizable:

$$U \begin{pmatrix} \frac12 & 0 & 0 & \frac12 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \frac12 & 0 & 0 & \frac12 \end{pmatrix} U^\dagger = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \tag{10}$$

This means that Φ = 0, so in quantum mechanics, the cruelest cut can be very cruel indeed: the most entangled states possible in quantum mechanics have no integrated information at all!

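The same cruel fate awaits the most integrated 2-bit state from classical physics: the perfectly correlated mixed state $\rho = \frac12 |{\uparrow\uparrow}\rangle\langle{\uparrow\uparrow}| + \frac12 |{\downarrow\downarrow}\rangle\langle{\downarrow\downarrow}|$. It gave Φ = 1 bit classically above (upper black star in the figure), but a unitary transformation permuting its diagonal elements makes it factorable:

$$U \begin{pmatrix} \frac12 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac12 \end{pmatrix} U^\dagger = \begin{pmatrix} \frac12 & 0 & 0 & 0 \\ 0 & \frac12 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} \frac12 & 0 \\ 0 & \frac12 \end{pmatrix}, \tag{11}$$

so Φ = 0 quantum-mechanically (lower black star in the figure).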
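Equations (10) and (11) can be verified with explicit unitaries. The sketch below uses plain nested lists rather than a linear algebra library; the particular matrices `u_bell` and `u_perm` are assumptions chosen for illustration (any unitary whose first row is 〈ψ| works for the Bell pair, and any permutation moving the two nonzero eigenvalues into the same block works for the mixed state), and the helper names are hypothetical.

```python
from math import sqrt

def matmul(a, b):
    """Matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

h = 1 / sqrt(2)
# Bell pair of equation (8) as a density matrix (basis order: uu, ud, du, dd)
rho_bell = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 0, 0], [0.5, 0, 0, 0.5]]
u_bell = [[h, 0, 0, h], [0, 1, 0, 0], [0, 0, 1, 0], [h, 0, 0, -h]]
rotated_bell = matmul(matmul(u_bell, rho_bell), dagger(u_bell))

# Perfectly correlated classical mixture, as in equation (11)
rho_mixed = [[0.5, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0.5]]
u_perm = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # swaps ud and dd
rotated_mixed = matmul(matmul(u_perm, rho_mixed), dagger(u_perm))

up = [[1, 0], [0, 0]]
mixed = [[0.5, 0], [0, 0.5]]
print(close(rotated_bell, kron(up, up)), close(rotated_mixed, kron(up, mixed)))
```

Both comparisons succeed, confirming that each rotated state is a tensor product and therefore has zero mutual information across the cut.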

B. Canonical transformations, independence and relativity

The fundamental reason that these states are more separable quantum-mechanically is clearly that more cuts are available, making the cruelest one crueler. Interestingly, the same thing can happen also in classical physics. Consider, for example, the deuterium atom from equation (1). When we restricted our cuts to simply separating different degrees of freedom, we found that the group $(\mathbf{r}_p, \mathbf{p}_p, \mathbf{r}_n, \mathbf{p}_n)$ was quite (but not completely) independent of the group $(\mathbf{r}_e, \mathbf{p}_e)$, and that there was no cut splitting things into perfectly independent pieces. In other words, the nucleus was fairly independent of the electron, but none of the three particles was completely independent of the other two. However, if we allow our degrees of freedom to be transformed before the cut, then things can be split into two perfectly independent parts! The classical equivalent of a unitary transformation is of course a canonical transformation (one that preserves phase-space volume). If we perform the canonical transformation where the new coordinates are the center-of-mass position $\mathbf{r}_M$ and the relative displacements $\mathbf{r}'_p \equiv \mathbf{r}_p - \mathbf{r}_M$ and $\mathbf{r}'_e \equiv \mathbf{r}_e - \mathbf{r}_M$, and correspondingly define $\mathbf{p}_M$ as the total momentum of the whole system, etc., then we find that $(\mathbf{r}_M, \mathbf{p}_M)$ is completely independent of the rest. In other words, the average motion of the entire deuterium atom is completely decoupled from the internal motions around its center-of-mass.

Thanks to relativity theory, this well-known decomposition into average and relative motions is of course possible for any isolated system. If two systems are completely independent, then they can gain no knowledge of each other, so a conscious observer in one will be unaware of the other. Conversely, we can view relativity as a special case of this idea: an observer in an isolated system has no way of knowing whether she is at rest or in uniform motion, because these are simply two different allowed states for the center-of-mass system, which is completely independent from the internal-motions system of which her consciousness is a part.

C. How integrated can quantum states be?

We saw in Figure 7 that some seemingly integrated states, such as a Bell pair or a pair of classically perfectly correlated bits, are in fact not integrated at all. But the figure also shows that some states are truly integrated even quantum-mechanically, with I > 0 even for the cruelest cut. How integrated can a quantum state be? I have a conjecture which, if true, enables the answer to be straightforwardly calculated.

ρ-Diagonality Conjecture (ρDC):
The mutual information always takes its minimum in a basis where ρ is diagonal.

Although I do not have a proof of this conjecture, a rigorous proof of a closely related conjecture will be provided below for the special case of n = 2.² Moreover, there is substantial numerical support for the conjecture. For example, Figure 8 plots the mutual information I for 15,000 random unitary transformations of a single random 2×2 matrix, as a function of the off-diagonality, defined as the sum of the square modulus of the off-diagonal components. It is seen that the lower envelope forms a monotonically increasing curve taking a minimum when the matrix is diagonal, i.e., rotated into its eigenbasis. Similar numerical experiments with a variety of n×n density matrices up to n = 25 showed the same qualitative behavior. In the rest of this section, we will assume that the ρDC is in fact true and explore the consequences; if it is false, then Φ(ρ) will generally be even smaller than for the diagonal cases we explore.

FIG. 8: When performing random unitary transformations of a density matrix ρ, the mutual information appears to always be minimized when rotating into its eigenbasis, so that ρ becomes diagonal.

Assuming that the ρDC is true, the first step in computing the integrated information Φ(ρ) is thus to diagonalize the n×n density matrix ρ. If all n eigenvalues are different, then there are n! possible ways of doing this, corresponding to the n! ways of permuting the eigenvalues, so the ρDC simplifies the continuous minimization problem of equation (9) to a discrete minimization problem over these n! permutations. Suppose that n = l×m, and that we wish to factor the n-dimensional Hilbert space into factor spaces of dimensionality l and m, so that Φ = 0. It is easy to see that this is possible if the n eigenvalues of ρ can be arranged into an l×m matrix that is multiplicatively separable (rank 1), i.e., the product of a column vector and a row vector. Extracting the eigenvalues for our example from equation (11), where l = m = 2 and n = 4, we see that $\begin{pmatrix} \frac12 & \frac12 \\ 0 & 0 \end{pmatrix}$ is separable, but $\begin{pmatrix} \frac12 & 0 \\ 0 & \frac12 \end{pmatrix}$ is not, and the only difference is that the order of the four numbers has been permuted. More generally, we see that to find the "cruelest cut" that defines the integrated information Φ, we want to find the permutation that makes the matrix of eigenvalues as separable as possible. It is easy to see that when seeking the permutation giving maximum separability, we can without loss of generality place the largest eigenvalue first (in the upper left corner) and the smallest one last (in the lower right corner). If there are only 4 eigenvalues (as in the above example), the ordering of the remaining two has no effect on I.

² The converse of the ρDC is straightforward to prove: if Φ = 0 (which is equivalent to the state being factorizable; $\rho = \rho_1 \otimes \rho_2$), then it is factorizable also in its eigenbasis where both $\rho_1$ and $\rho_2$ are diagonal.
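The discrete minimization described above can be sketched as a brute-force search over orderings of the eigenvalues. This is only an illustration that takes the ρDC for granted (so that only diagonal arrangements need to be examined); the function names are hypothetical, and the search is practical only for small n.

```python
from itertools import permutations
from math import log, log2

def entropy(probs):
    """Shannon entropy in bits, skipping zero entries."""
    return sum(-p * log2(p) for p in probs if p > 0)

def grid_mutual_information(values, rows, cols):
    """I = S1 + S2 - S when `values` (row-major) fill a rows x cols grid."""
    grid = [values[r * cols:(r + 1) * cols] for r in range(rows)]
    row_sums = [sum(row) for row in grid]
    col_sums = [sum(row[c] for row in grid) for c in range(cols)]
    return entropy(row_sums) + entropy(col_sums) - entropy(values)

def phi_diagonal(eigenvalues, rows, cols):
    """Minimum of I over all orderings of the eigenvalues (assumes the rho-DC)."""
    return min(grid_mutual_information(list(p), rows, cols)
               for p in set(permutations(eigenvalues)))

phi_correlated = phi_diagonal([0.5, 0.5, 0.0, 0.0], 2, 2)
phi_thirds = phi_diagonal([1 / 3, 1 / 3, 1 / 3, 0.0], 2, 2)
print(phi_correlated, phi_thirds, log(27 / 16) / log(8))
```

The eigenvalues (1/2, 1/2, 0, 0) admit a rank-1 arrangement and give zero, whereas for (1/3, 1/3, 1/3, 0) every ordering leaves the same nonzero mutual information, about 0.25 bits.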

D. The quantum integration paradox

We now have the tools in hand to answer the key question from the last section: which state ρ maximizes the integrated information Φ? Numerical search suggests that the most integrated state is a rescaled projection matrix satisfying $\rho^2 \propto \rho$. This means that some number k of the n eigenvalues equal 1/k and the remaining ones vanish.³ For the n = 4 example from Figure 7, k = 3 is seen to give the best integration, with eigenvalues (probabilities) 1/3, 1/3, 1/3 and 0, giving Φ = log(27/16)/log(8) ≈ 0.2516.

For classical physics, we saw that the maximal attainable Φ grows roughly linearly with n. Quantum-mechanically, however, it decreases as n increases!⁴

In summary, no matter how large a quantum system we create, its state can never contain more than about a quarter of a bit of integrated information! This exacerbates the integration paradox from Section II G, eliminating both of the first two resolutions: you are clearly aware of more than 0.25 bits of information right now, and this quarter-bit maximum applies not merely to states of Hopfield networks, but to any quantum states of any system. Let us therefore begin exploring the third resolution: that our definition of integrated information must be modified or supplemented by at least one additional principle.

³ A heuristic way of understanding why having many equal eigenvalues is advantageous is that it helps eliminate the effect of the eigenvalue permutations that we are minimizing over. If the optimal state has two distinct eigenvalues, then if swapping them changes I, it must by definition increase I by some finite amount. This suggests that we can increase the integration Φ by bringing the eigenvalues infinitesimally closer or further apart, and repeating this procedure lets us further increase Φ until all eigenvalues are either zero or equal to some positive value.

⁴ One finds that Φ is maximized when the k nonzero eigenvalues are arranged in a Young Tableau, which corresponds to a partition of k as a sum of positive integers $k_1 + k_2 + \ldots$, giving $\Phi = S(\mathbf{p}) + S(\mathbf{p}^*) - \log_2 k$, where the probability vectors $\mathbf{p}$ and $\mathbf{p}^*$ are defined by $p_i = k_i/k$ and $p^*_i = k^*_i/k$. Here $k^*_i$ denotes the conjugate partition. For example, if we cut an even number of qubits into two parts with n/2 qubits each, then n = 2, 4, 6, ..., 20 gives Φ ≈ 0.252, 0.171, 0.128, 0.085, 0.085, 0.073, 0.056, 0.056, 0.051 and 0.042 bits, respectively. This is if the diagonality conjecture is true; if it is not, then Φ is even smaller.

E. How integrated is the Hamiltonian?

An obvious way to begin this exploration is to consider the state ρ not merely at a single fixed time t, but as a function of time. After all, it is widely assumed that consciousness is related to information processing, not mere information storage. Indeed, Tononi's original Φ-definition [3] (which applies to classical neural networks rather than general quantum systems) involves time, depending on the extent to which current events affect future ones.

Because the time-evolution of the state ρ is determined by the Hamiltonian H via the Schrödinger equation

$$\dot\rho = i[H, \rho], \tag{12}$$

whose solution is

$$\rho(t) = e^{iHt}\rho\,e^{-iHt}, \tag{13}$$

we need to investigate the extent to which the cruelest cut can decompose not merely ρ but the pair (ρ, H) into independent parts. (Here and throughout, we often use units where ħ = 1 for simplicity.)
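That equation (13) solves equation (12) can be confirmed by differentiating directly. In the short check below, ρ on the right-hand side of (13) is read as the initial state ρ(0), and the sign convention is the one used in equation (12):

```latex
\begin{aligned}
\dot\rho(t) &= \frac{d}{dt}\left(e^{iHt}\,\rho\,e^{-iHt}\right) \\
            &= iH\,e^{iHt}\rho\,e^{-iHt} - e^{iHt}\rho\,e^{-iHt}\,iH \\
            &= i\left(H\rho(t) - \rho(t)H\right) = i[H,\rho(t)].
\end{aligned}
```

The middle step uses only the fact that H commutes with $e^{\pm iHt}$.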

F. Evolution with separable Hamiltonian

As we saw above, the key question for ρ is whether it is factorizable (expressible as a product $\rho = \rho_1 \otimes \rho_2$ of matrices acting on the two subsystems), whereas the key question for H is whether it is what we will call additively separable, being a sum of matrices acting on the two subsystems, i.e., expressible in the form

$$H = H_1 \otimes I + I \otimes H_2 \tag{14}$$

for some matrices $H_1$ and $H_2$. For brevity, we will often write simply separable instead of additively separable. As mentioned in Section II B, a separable Hamiltonian H implies that both the thermal state $\rho \propto e^{-H/kT}$ and the time-evolution operator $U \equiv e^{iHt/\hbar}$ are factorizable. An important property of density matrices which was pointed out already by von Neumann when he invented them [18] is that if H is separable, then

$$\dot\rho_1 = i[H_1, \rho_1], \tag{15}$$

i.e., the time-evolution of the state of the first subsystem, $\rho_1 \equiv \mathrm{tr}_2\,\rho$, is independent of the other subsystem and of any entanglement with it that may exist. This is easy to prove: Using the identities (A16) and (A18) shows that

$$\begin{aligned}
\mathrm{tr}_2\,[H_1 \otimes I, \rho] &= \mathrm{tr}_2\,[(H_1 \otimes I)\rho] - \mathrm{tr}_2\,[\rho(H_1 \otimes I)] \\
&= H_1\,\mathrm{tr}_2\,[(I \otimes I)\rho] - \mathrm{tr}_2\,[\rho(I \otimes I)]\,H_1 \\
&= H_1\rho_1 - \rho_1 H_1 = [H_1, \rho_1].
\end{aligned} \tag{16}$$

Using the identity (A10) shows that

$$\mathrm{tr}_2\,[I \otimes H_2, \rho] = 0. \tag{17}$$

Summing equations (16) and (17) completes the proof.
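The two partial-trace identities used in this proof can also be checked numerically for a pair of qubits. The sketch below works with nested lists; the matrices `H1`, `H2` and the entangled mixed state `rho` are arbitrary illustrative choices, and all helper names are hypothetical.

```python
def matmul(a, b):
    """Matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def ptrace2(m, n1, n2):
    """Partial trace over the second factor; returns an n1 x n1 matrix."""
    return [[sum(m[i * n2 + k][j * n2 + k] for k in range(n2))
             for j in range(n1)] for i in range(n1)]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
H1 = [[1.0, 0.5 - 0.2j], [0.5 + 0.2j, -1.0]]   # arbitrary Hermitian example
H2 = [[0.3, 1j], [-1j, 2.0]]                   # arbitrary Hermitian example
# 0.7 * Bell pair + 0.3 * |ud><ud|: an entangled mixed state for illustration
rho = [[0.35, 0, 0, 0.35], [0, 0.3, 0, 0], [0, 0, 0, 0], [0.35, 0, 0, 0.35]]

rho1 = ptrace2(rho, 2, 2)
lhs_16 = ptrace2(commutator(kron(H1, I2), rho), 2, 2)   # tr2 [H1 x I, rho]
rhs_16 = commutator(H1, rho1)                            # [H1, rho1]
lhs_17 = ptrace2(commutator(kron(I2, H2), rho), 2, 2)   # tr2 [I x H2, rho]
zero = [[0, 0], [0, 0]]
print(max_abs_diff(lhs_16, rhs_16), max_abs_diff(lhs_17, zero))
```

Both differences vanish to rounding error even though the state is entangled, which is the content of equations (16) and (17).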

G. The cruelest cut as the maximization of separability

Since a general Hamiltonian H cannot be written in the separable form of equation (14), it will also include a third term $H_3$ that is non-separable. The independence principle from Section I therefore suggests an interesting mathematical approach to the physics-from-scratch problem of analyzing the total Hamiltonian H for our physical world:

1. Find the Hilbert space factorization giving the "cruelest cut", decomposing H into parts with the smallest interaction Hamiltonian $H_3$ possible.

2. Keep repeating this subdivision procedure for each part until only relatively integrated parts remain that cannot be further decomposed with a small interaction Hamiltonian.

The hope would be that applying this procedure to the Hamiltonian of our standard model would reproduce the full observed object hierarchy from Figure 1, with the factorization corresponding to the objects, and the various non-separable terms $H_3$ describing the interactions between these objects. Any decomposition with $H_3 = 0$ would correspond to two parallel universes unable to communicate with one another.

We will now formulate this as a rigorous mathematics problem, solve it, and derive the observational consequences. We will find that this approach fails catastrophically when confronted with observation, giving interesting hints regarding further physical principles needed for understanding why we perceive our world as an object hierarchy.

H. The Hilbert-Schmidt vector space

To enable a rigorous formulation of our problem, let us first briefly review the Hilbert-Schmidt vector space, a convenient inner-product space where the vectors are not wave functions |ψ〉 but matrices such as H and ρ. For any two matrices A and B, the Hilbert-Schmidt inner product is defined by

$$(A, B) \equiv \mathrm{tr}\,A^\dagger B. \tag{18}$$

For example, the trace operator can be written as an inner product with the identity matrix:

$$\mathrm{tr}\,A = (I, A). \tag{19}$$

This inner product defines the Hilbert-Schmidt norm (also known as the Frobenius norm)

$$\|A\| \equiv (A, A)^{1/2} = (\mathrm{tr}\,A^\dagger A)^{1/2} = \Big(\sum_{ij} |A_{ij}|^2\Big)^{1/2}. \tag{20}$$

If A is Hermitian ($A^\dagger = A$), then $\|A\|^2$ is simply the sum of the squares of its eigenvalues.

Real symmetric and antisymmetric matrices form orthogonal subspaces under the Hilbert-Schmidt inner product, since (S, A) = 0 for any symmetric matrix S (satisfying $S^t = S$) and any antisymmetric matrix A (satisfying $A^t = -A$). Because a Hermitian matrix (satisfying $H^\dagger = H$) can be written in terms of real symmetric and antisymmetric matrices as H = S + iA, we have

$$(H_1, H_2) = (S_1, S_2) + (A_1, A_2),$$

which means that the inner product of two Hermitian matrices is purely real.
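The properties just listed are simple to confirm for a concrete pair of matrices. In the sketch below the two 2×2 Hermitian matrices are arbitrary examples, the eigenvalues are obtained from the closed-form 2×2 expression, and the function names are hypothetical.

```python
from math import sqrt

def hs_inner(a, b):
    """Hilbert-Schmidt inner product (A, B) = tr(A^dagger B) = sum conj(A_ij) B_ij."""
    return sum(complex(x).conjugate() * y
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def hs_norm(a):
    """Hilbert-Schmidt (Frobenius) norm of equation (20)."""
    return sqrt(hs_inner(a, a).real)

def eigenvalues_2x2(h):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    half_trace = (h[0][0] + h[1][1]).real / 2
    det = (h[0][0] * h[1][1] - h[0][1] * h[1][0]).real
    root = sqrt(half_trace ** 2 - det)
    return half_trace - root, half_trace + root

identity = [[1, 0], [0, 1]]
Ha = [[1 + 0j, 2 - 1j], [2 + 1j, -3 + 0j]]      # arbitrary Hermitian example
Hb = [[0.5 + 0j, 1j], [-1j, 2 + 0j]]            # arbitrary Hermitian example

inner = hs_inner(Ha, Hb)                        # purely real for Hermitian inputs
trace_via_inner = hs_inner(identity, Ha)        # equation (19)
norm_sq = hs_norm(Ha) ** 2
eig_sq = sum(lam ** 2 for lam in eigenvalues_2x2(Ha))
print(inner, trace_via_inner, norm_sq, eig_sq)
```

For these examples the inner product has no imaginary part, the inner product with the identity reproduces the trace, and the squared norm equals the sum of the squared eigenvalues.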

I. Separating H with orthogonal projectors

By viewing H as a vector in the Hilbert-Schmidt vector space, we can rigorously define a decomposition of it into orthogonal components, two of which are the separable terms from equation (14). Given a factorization of the Hilbert space where the matrix H operates, we define four linear superoperators⁵ $\Pi_i$ as follows:

$$\Pi_0 H \equiv \frac{1}{n}(\mathrm{tr}\,H)\,I \tag{21}$$

$$\Pi_1 H \equiv \left(\frac{1}{n_2}\,\mathrm{tr}_2\,H\right) \otimes I_2 - \Pi_0 H \tag{22}$$

$$\Pi_2 H \equiv I_1 \otimes \left(\frac{1}{n_1}\,\mathrm{tr}_1\,H\right) - \Pi_0 H \tag{23}$$

$$\Pi_3 H \equiv (I - \Pi_0 - \Pi_1 - \Pi_2)\,H \tag{24}$$

It is straightforward to show that these four linear operators $\Pi_i$ form a complete set of orthogonal projectors, i.e., that

$$\sum_{i=0}^{3} \Pi_i = I, \tag{25}$$

$$\Pi_i \Pi_j = \Pi_i\,\delta_{ij}, \tag{26}$$

$$(\Pi_i H, \Pi_j H) = \|\Pi_i H\|^2\,\delta_{ij}. \tag{27}$$

⁵ Operators on the Hilbert-Schmidt space are usually called superoperators in the literature, to avoid confusions with operators on the underlying Hilbert space, which are mere vectors in the Hilbert-Schmidt space.

This means that any Hermitian matrix H can be decomposed as a sum of four orthogonal components $H_i \equiv \Pi_i H$, so that its squared Hilbert-Schmidt norm can be decomposed as a sum of contributions from the four components:

$$H = H_0 + H_1 + H_2 + H_3, \tag{28}$$

$$H_i \equiv \Pi_i H, \tag{29}$$

$$(H_i, H_j) = \|H_i\|^2\,\delta_{ij}, \tag{30}$$

$$\|H\|^2 = \|H_0\|^2 + \|H_1\|^2 + \|H_2\|^2 + \|H_3\|^2. \tag{31}$$

We see that $H_0 \propto I$ picks out the trace of H, whereas the other three matrices are trace-free. This trace term is of course physically uninteresting, since it can be eliminated by simply adding an unobservable constant zero-point energy to the Hamiltonian. $H_1$ and $H_2$ correspond to the two separable terms in equation (14) (without the trace term, which could have been arbitrarily assigned to either), and $H_3$ corresponds to the non-separable residual. A Hermitian matrix H is therefore separable if and only if $\Pi_3 H = 0$. Just as it is customary to write the norm of a vector $\mathbf{r}$ by $r \equiv |\mathbf{r}|$ (without bold face), we will denote the Hilbert-Schmidt norm of a matrix $\mathbf{H}$ by $H \equiv \|\mathbf{H}\|$. For example, with this notation we can rewrite equation (31) as simply $H^2 = H_0^2 + H_1^2 + H_2^2 + H_3^2$.

Geometrically, we can think of n×n Hermitian matrices H as points in the N-dimensional vector space $\mathbb{R}^N$, where N = n×n (Hermitian matrices have n real numbers on the diagonal and n(n − 1)/2 complex numbers off the diagonal, constituting a total of $n + 2 \times n(n-1)/2 = n^2$ real parameters). Diagonal matrices form a hyperplane of dimension n in this space. The projection operators $\Pi_0$, $\Pi_1$, $\Pi_2$ and $\Pi_3$ project onto hyperplanes of dimension 1, (n − 1), (n − 1) and $(n-1)^2$, respectively, so separable matrices form a hyperplane in this space of dimension 2n − 1. For example, a general 4×4 Hermitian matrix can be parametrized by 10 numbers (4 real for the diagonal part and 6 complex for the off-diagonal part), and its decomposition from equation (28) can be written as follows:

$$\begin{aligned}
H &= \begin{pmatrix}
t+a+b+v & d+w & c+x & y \\
d^*+w^* & t+a-b-v & z & c-x \\
c^*+x^* & z^* & t-a+b-v & d-w \\
y^* & c^*-x^* & d^*-w^* & t-a-b+v
\end{pmatrix} \\
&= \begin{pmatrix} t & 0 & 0 & 0 \\ 0 & t & 0 & 0 \\ 0 & 0 & t & 0 \\ 0 & 0 & 0 & t \end{pmatrix}
+ \begin{pmatrix} a & 0 & c & 0 \\ 0 & a & 0 & c \\ c^* & 0 & -a & 0 \\ 0 & c^* & 0 & -a \end{pmatrix}
+ \begin{pmatrix} b & d & 0 & 0 \\ d^* & -b & 0 & 0 \\ 0 & 0 & b & d \\ 0 & 0 & d^* & -b \end{pmatrix} \\
&\quad + \begin{pmatrix} v & w & x & y \\ w^* & -v & z & -x \\ x^* & z^* & -v & -w \\ y^* & -x^* & -w^* & v \end{pmatrix}
\end{aligned} \tag{32}$$

We see that t contributes to the trace (and $H_0$) while the other three components $H_i$ are traceless. We also see that $\mathrm{tr}_1 H_2 = \mathrm{tr}_2 H_1 = 0$, and that both partial traces vanish for $H_3$.
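The projectors of equations (21) to (24) and the parametrization of equation (32) can be cross-checked against each other for two qubits. The sketch below builds H from arbitrary illustrative parameter values, applies the projectors (with the non-separable part taken as the remainder after subtracting the other three components), and tests orthogonality and the norm decomposition of equation (31). All helper names are hypothetical.

```python
def kron(a, b):
    """Tensor (Kronecker) product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def combine(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, m):
    return [[c * x for x in row] for row in m]

def ptrace1(m, n1, n2):
    """Partial trace over the first factor; returns an n2 x n2 matrix."""
    return [[sum(m[k * n2 + i][k * n2 + j] for k in range(n1))
             for j in range(n2)] for i in range(n2)]

def ptrace2(m, n1, n2):
    """Partial trace over the second factor; returns an n1 x n1 matrix."""
    return [[sum(m[i * n2 + k][j * n2 + k] for k in range(n2))
             for j in range(n1)] for i in range(n1)]

def hs_inner(a, b):
    return sum(complex(x).conjugate() * y
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def decompose(h, n1, n2):
    """Return [H0, H1, H2, H3] following equations (21)-(24)."""
    n = n1 * n2
    trace = sum(h[i][i] for i in range(n))
    h0 = scale(trace / n, identity(n))
    h1 = combine(kron(scale(1 / n2, ptrace2(h, n1, n2)), identity(n2)), h0, -1)
    h2 = combine(kron(identity(n1), scale(1 / n1, ptrace1(h, n1, n2))), h0, -1)
    h3 = combine(combine(combine(h, h0, -1), h1, -1), h2, -1)
    return [h0, h1, h2, h3]

# Arbitrary illustrative parameter values for equation (32)
t, a, b, v = 1.0, 2.0, 3.0, -1.0
c, d, w, x, y, z = 1 + 1j, 0.5 - 2j, 2j, 1 + 0j, 1 - 1j, 3 + 0.5j
cj = lambda q: q.conjugate()
H = [
    [t + a + b + v, d + w, c + x, y],
    [cj(d) + cj(w), t + a - b - v, z, c - x],
    [cj(c) + cj(x), cj(z), t - a + b - v, d - w],
    [cj(y), cj(c) - cj(x), cj(d) - cj(w), t - a - b + v],
]
parts = decompose(H, 2, 2)
norms_sq = [hs_inner(p, p).real for p in parts]
total_sq = hs_inner(H, H).real
cross = max(abs(hs_inner(parts[i], parts[j]))
            for i in range(4) for j in range(4) if i != j)
expected_h1 = kron([[a, c], [cj(c), -a]], identity(2))
h1_error = max(abs(p - q) for rp, rq in zip(parts[1], expected_h1)
               for p, q in zip(rp, rq))
print(norms_sq, total_sq, cross, h1_error)
```

The component returned for $H_1$ coincides with the second matrix on the right-hand side of equation (32), the cross inner products vanish, and the four squared norms add up to the squared norm of H.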

J. Maximizing separability

We now have all the tools we need to rigorously maximize separability and test the physics-from-scratch approach described in Section III G. Given a Hamiltonian H, we simply wish to minimize the norm of its non-separable component $H_3$ over all possible Hilbert space factorizations, i.e., over all possible unitary transformations. In other words, we wish to compute

$$E \equiv \min_{U} \|\Pi_3 H\|, \tag{33}$$

where we have defined the integration energy E by analogy with the integrated information Φ. If E = 0, then there is a basis where our system separates into two parallel universes, otherwise E quantifies the coupling between the two parts of the system under the cruelest cut.

[Figure 9 diagram labels: "Tangent vector δH = [A, H]", "Integration energy E = ||Π3H||", "Non-separable component Π3H", "Separable hyperplane: Π3H = 0", "Subsphere H = UH*U†".]

FIG. 9: Geometrically, we can view the integration energy as the shortest distance (in Hilbert-Schmidt norm) between the hyperplane of separable Hamiltonians and a subsphere of Hamiltonians that can be unitarily transformed into one another. The most separable Hamiltonian H on the subsphere is such that its non-separable component $\Pi_3 H$ is orthogonal to all subsphere tangent vectors [A, H] generated by anti-Hermitian matrices A.

The Hilbert-Schmidt space allows us to interpret the minimization problem of equation (33) geometrically, as illustrated in Figure 9. Let $H^*$ denote the Hamiltonian in some given basis, and consider its orbit $H = U H^* U^\dagger$ under all unitary transformations U. This is a curved hypersurface whose dimensionality is generically n(n − 1), i.e., n lower than that of the full space of Hermitian matrices, since unitary transformations leave all n eigenvalues invariant.⁶ We will refer to this curved hypersurface as a subsphere, because it is a subset of the full $n^2$-dimensional sphere: the radius H (the Hilbert-Schmidt norm ||H||) is invariant under unitary transformations, but the subsphere may have a more complicated topology than a hypersphere; for example, the 3-sphere is known to topologically be the double cover of SO(3), the matrix group of 3×3 orthonormal transformations.

We are interested in finding the most separable point H on this subsphere, i.e., the point on the subsphere that is closest to the (2n−1)-dimensional separable hyperplane. In our notation, this means that we want to find the point H on the subsphere that minimizes ||Π3H||, the Hilbert-Schmidt norm of the non-separable component. If we perform infinitesimal displacements along the subsphere at this point, ||Π3H|| remains constant to first order (the gradient vanishes at the minimum), so all tangent vectors of the subsphere are orthogonal to Π3H, the vector from the separable hyperplane to the subsphere.

Unitary transformations are generated by anti-Hermitean matrices, so the most general tangent vector δH is of the form

δH = [A,H] ≡ AH − HA (34)

for some anti-Hermitean n×n matrix A (any matrix satisfying A† = −A). We thus obtain the following simple condition for maximal separability:

(Π3H, [A,H]) = 0 (35)

for any anti-Hermitean matrix A. Because the most general anti-Hermitean matrix can be written as A = iB for a Hermitean matrix B, equation (35) is equivalent to the condition (Π3H, [B,H]) = 0 for all Hermitean matrices B. Since there are n² linearly independent anti-Hermitean matrices, equation (35) is a system of n² coupled quadratic equations that the components of H must obey.
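The optimality condition of equation (35) can be probed numerically. The sketch below is illustrative only: it assumes that Π3 removes the identity component and the two traceless single-factor components obtained from partial traces (the projectors themselves are defined in equations (21)-(24), outside this passage), and the helper names `partial_traces`, `nonseparable_part` and `hs_inner` are hypothetical. It checks that a diagonal Hamiltonian satisfies (Π3H, [A,H]) = 0 for randomly drawn anti-Hermitean A.

```python
import numpy as np

def partial_traces(H, l, m):
    """Return (tr_2 H, tr_1 H) for H acting on an (l*m)-dimensional product space."""
    T = H.reshape(l, m, l, m)
    return np.einsum('ajbj->ab', T), np.einsum('iaib->ab', T)

def nonseparable_part(H, l, m):
    """Assumed form of Pi_3 H: subtract identity, first-factor and second-factor parts."""
    n = l * m
    t2, t1 = partial_traces(H, l, m)
    h0 = np.trace(H) / n
    H1 = t2 / m - h0 * np.eye(l)   # traceless l x l component
    H2 = t1 / l - h0 * np.eye(m)   # traceless m x m component
    return H - h0 * np.eye(n) - np.kron(H1, np.eye(m)) - np.kron(np.eye(l), H2)

def hs_inner(A, B):
    """Hilbert-Schmidt inner product (A, B) = tr(A^dagger B)."""
    return np.trace(A.conj().T @ B)

rng = np.random.default_rng(0)
l, m = 2, 3
n = l * m
H = np.diag(rng.normal(size=n)).astype(complex)   # Hamiltonian in its eigenbasis
H3 = nonseparable_part(H, l, m)

residuals = []
for _ in range(20):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    A = M - M.conj().T                            # random anti-Hermitean generator
    residuals.append(abs(hs_inner(H3, A @ H - H @ A)))
max_residual = max(residuals)                     # vanishes up to rounding error
```

The residual vanishes because Π3H is itself diagonal when H is, so it commutes with H; this shows stationarity only, not that the stationary point is a minimum.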

K. The Hamiltonian diagonality conjecture

By analogy with our ρ-diagonality conjecture above, I once again conjecture that maximal separability is attained in the eigenbasis.

H-Diagonality Conjecture (HDC): The Hamiltonian is always maximally separable (minimizing ||H3||) in the energy eigenbasis where it is diagonal.

6 n×n-dimensional unitary matrices U are known to form an n×n-dimensional manifold: they can always be written as U = e^{iH} for some Hermitean matrix H, so they are parametrized by the same number of real parameters (n×n) as Hermitean matrices.

Numerical tests of this conjecture produce encouraging support that is visually quite analogous to Figure 8, but with ||H3|| rather than I on the vertical axis. For the HDC, however, the mathematical formalism above also permits rigorous analytic tests. Let us consider the simplest case, when n = 4 so that H can be parametrized as in equation (32). Since ||H3|| is invariant under unitary transformations acting only on either of the two factor spaces, we can without loss of generality select a basis where H1 and H2 are diagonal, so that the parameters c = d = 0. The optimality condition of equation (35) gives a separate constraint equation for each of n² linearly independent anti-Hermitean matrices A. If we choose these matrices to be of the simplest possible form, each with all except one or two elements vanishing, we find that the resulting n² real-valued equations produce merely 8 linearly independent ones, which can be combined into the following four complex equations:

bw = 0, (36)

ax = 0, (37)

(a+ b)y = 0, (38)

(a− b)z = 0. (39)

The HDC states that H3 ≡ ||H3|| takes its minimum value when H is diagonal, i.e., when w = x = y = z = 0. To complete the proof of the HDC for n = 4, we thus need to find all solutions to these equations and verify that for each of them, H3 is greater than or equal to its value in the energy eigenbasis. There are only 6 cases to consider for a and b:

1. a = b = 0, solving the equations and giving H1 = H2 = 0. This clearly maximizes rather than minimizes H3 (since H3² = H² − H0² − H1² − H2², and H0 is unitarily invariant), so H3 cannot be smaller than in the eigenbasis.

2. a = 0, b ≠ 0, so w = y = z = 0, giving H3 = 32^{1/2}(|v|² + |x|²)^{1/2}. This is not the H3-minimum, because H is separable (H3 = 0) in its eigenbasis.

3. b = 0, a ≠ 0, so x = y = z = 0, analogously giving H3 = 32^{1/2}(|v|² + |w|²)^{1/2} ≥ 0, the value in the eigenbasis.

4. a = b ≠ 0, so w = x = y = 0, giving H3 = (32|v|² + 16|z|²)^{1/2} ≥ 32^{1/2}|v|, the value in the eigenbasis.

5. a = −b ≠ 0, so w = x = z = 0, giving H3 = (32|v|² + 16|y|²)^{1/2} ≥ 32^{1/2}|v|, the value in the eigenbasis.

6. a ≠ 0, b ≠ 0, |a| ≠ |b|, so w = x = y = z = 0, giving a diagonal H.

In summary, when applying all unitary transformations, the non-separable part H3 takes its minimum when H is diagonal, which completes the proof. It also takes its maximum when H1 = H2 = 0, and has saddle points when H1 or H2 vanish or when H1 = H2. Any extremum where H1 and H2 are generic (non-zero and non-equal) has H diagonal.
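The n = 4 statement can also be explored by brute force. The following sketch is not the numerical test used for the figures; it assumes the same partial-trace form of Π3 as a stand-in for equations (21)-(24), and `h3_norm` and `random_unitary` are hypothetical helper names. It compares ||H3|| over the 24 orderings of the eigenvalues on the diagonal with ||H3|| for randomly rotated versions of the same two-qubit Hamiltonian.

```python
import itertools
import numpy as np

def h3_norm(H, l=2, m=2):
    """Hilbert-Schmidt norm of the (assumed) non-separable component Pi_3 H."""
    n = l * m
    T = H.reshape(l, m, l, m)
    h0 = np.trace(H) / n
    H1 = np.einsum('ajbj->ab', T) / m - h0 * np.eye(l)
    H2 = np.einsum('iaib->ab', T) / l - h0 * np.eye(m)
    H3 = H - h0 * np.eye(n) - np.kron(H1, np.eye(m)) - np.kron(np.eye(l), H2)
    return np.linalg.norm(H3)

def random_unitary(n, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

rng = np.random.default_rng(1)
E = rng.normal(size=4)                      # eigenvalues of a two-qubit Hamiltonian

# Best value among the eigenbases, i.e. the orderings of E along the diagonal.
best_diagonal = min(h3_norm(np.diag(np.array(p)).astype(complex))
                    for p in itertools.permutations(E))

# Smallest value found among randomly rotated copies of the same Hamiltonian.
D = np.diag(E).astype(complex)
sampled = [h3_norm(U @ D @ U.conj().T)
           for U in (random_unitary(4, rng) for _ in range(2000))]
best_sampled = min(sampled)
```

Note that the comparison has to be made against the best ordering of the eigenvalues, since different eigenbases of H generally give different values of ||H3||.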


Although it appears likely that the HDC can be analogously proven for any given n by using equation (35) and examining all solutions to the resulting system of coupled quadratic equations, a general proof for arbitrary n would obviously be more interesting. Any diagonal n×n Hamiltonian H is a solution to equation (35), so what remains to be established is that this is the global minimum.⁷

If the HDC is correct, then separability is always maximized in the energy eigenbasis, where the n×n matrix H is diagonal and the projection operators Πi defined by equations (21)-(24) greatly simplify. If we arrange the n = lm diagonal elements of H into an l×m matrix H, then the action of the linear operators Πi is given by simple matrix operations:

H0 ≡ QlHQm, (40)

H1 ≡ PlHQm, (41)

H2 ≡ QlHPm, (42)

H3 ≡ PlHPm, (43)

where

$$P_m \equiv I - Q_m, \tag{44}$$

$$(Q_m)_{ij} \equiv \frac{1}{m} \tag{45}$$

are m × m projection matrices satisfying Pm² = Pm, Qm² = Qm, PmQm = QmPm = 0, Pm + Qm = I. (To avoid confusion, we are using boldface for n × n matrices and plain font for smaller matrices involving only the eigenvalues.)
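As a small illustration of equations (40)-(45), the sketch below builds Pm and Qm, checks the stated projector identities, and shows that H3 = Pl H Pm is the eigenvalue matrix with its row means and column means removed. The sizes l = 3, m = 4 and the random eigenvalues are arbitrary choices, and `projectors` is a hypothetical helper name.

```python
import numpy as np

def projectors(m):
    """Return (P_m, Q_m) with (Q_m)_ij = 1/m and P_m = I - Q_m."""
    Q = np.full((m, m), 1.0 / m)
    return np.eye(m) - Q, Q

l, m = 3, 4
Pl, Ql = projectors(l)
Pm, Qm = projectors(m)

rng = np.random.default_rng(2)
Hmat = rng.normal(size=(l, m))   # the n = l*m eigenvalues arranged as an l x m matrix

H0 = Ql @ Hmat @ Qm              # overall mean in every entry
H1 = Pl @ Hmat @ Qm              # row means minus the overall mean
H2 = Ql @ Hmat @ Pm              # column means minus the overall mean
H3 = Pl @ Hmat @ Pm              # doubly centered remainder

# Direct double centering of the eigenvalue matrix, for comparison with H3.
centered = (Hmat - Hmat.mean(axis=1, keepdims=True)
                 - Hmat.mean(axis=0, keepdims=True) + Hmat.mean())
```

In this representation the non-separable part is whatever remains after subtracting an additive "row plus column" model from the eigenvalue table, which is the same operation as double centering in statistics.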

7 In the large-n limit, it is rather intuitive that the HDC should be at least approximately true, because almost all the n(n−1) ≈ n² off-diagonal degrees of freedom for H belong to H3 which is being minimized. Since we can restrict ourselves to bases where H1 and H2 are diagonal as above, there are only of order n off-diagonal degrees of freedom not contributing to H3 (contributing instead to H1 and H2). If we begin in a basis where H is diagonal and apply a unitary transformation adding off-diagonal elements, H3 thus generically increases.

L. Ultimate independence and the Quantum Zeno paradox

In Section III G, we began exploring the idea that if we divide the world into maximally independent parts (with minimal interaction Hamiltonians), then the observed object hierarchy from Figure 1 would emerge. Assuming that the HDC is correct, we have now found that this decomposition (factorization) into maximally independent parts can be performed in the energy eigenbasis of the total Hamiltonian. This means that all subsystem Hamiltonians and all interaction Hamiltonians commute with one another, corresponding to an essentially classical world where none of the quantum effects associated with non-commutativity manifest themselves! In contrast, many systems that we customarily refer to as objects in our classical world do not commute with their interaction Hamiltonians: for example, the Hamiltonian governing the dynamics of a baseball involves its momentum, which does not commute with the position-dependent potential energy due to external forces.

FIG. 10: If the Hamiltonian of a system commutes with the interaction Hamiltonian ([H1,H3] = 0), then decoherence drives the system toward a time-independent state ρ where nothing ever changes. The figure illustrates this for the Bloch Sphere of a single qubit starting in a pure state and ending up in a fully mixed state ρ = I/2. More general initial states end up somewhere along the z-axis. Here H1 ∝ σz, generating a simple precession around the z-axis.

As emphasized by Zurek [19], states commuting with the interaction Hamiltonian form a “pointer basis” of classically observable states, playing an important role in understanding the emergence of a classical world. The fact that the independence principle automatically leads to commutativity with interaction Hamiltonians might therefore be taken as an encouraging indication that we are on the right track. However, whereas the pointer states in Zurek’s examples evolve over time due to the system’s own Hamiltonian H1, those in our independence-maximizing decomposition do not, because they commute also with H1. Indeed, the situation is even worse, as illustrated in Figure 10: any time-dependent system will evolve into a time-independent one, as environment-induced decoherence [20–23, 25, 27] drives it towards an eigenstate of the interaction Hamiltonian, i.e., an energy eigenstate.⁸

8 For a system with a finite environment, the entropy will eventually decrease again, causing the resumption of time-dependence, but this Poincaré recurrence time grows exponentially with environment size and is normally large enough that decoherence can be approximated as permanent.

The famous Quantum Zeno effect, whereby a system can cease to evolve in the limit where it is arbitrarily strongly coupled to its environment [28], thus has a stronger and more pernicious cousin, which we will term the Quantum Zeno Paradox or the Independence Paradox.

Quantum Zeno Paradox: If we decompose our universe into maximally independent objects, then all change grinds to a halt.

In summary, we have tried to understand the emergence of our observed semiclassical world, with its hierarchy of moving objects, by decomposing the world into maximally independent parts, but our attempts have failed dismally, producing merely a timeless world reminiscent of heat death. In Section II G, we saw that using the integration principle alone led to a similarly embarrassing failure, with no more than a quarter of a bit of integrated information possible. At least one more principle is therefore needed.

IV. DYNAMICS AND AUTONOMY

Let us now explore the implications of the dynamics principle from Table II, according to which a conscious system has the capacity to not only store information, but also to process it. As we just saw above, there is an interesting tension between this principle and the independence principle, whose Quantum Zeno Paradox gives the exact opposite: no dynamics and no information processing at all.

We will term the synthesis of these two competing principles the autonomy principle: a conscious system has substantial dynamics and independence. When exploring autonomous systems below, we can no longer study the state ρ and the Hamiltonian H separately, since their interplay is crucial. Indeed, we will see that there are interesting classes of states ρ that provide substantial dynamics and near-perfect independence even when the interaction Hamiltonian H3 is not small. In other words, for certain preferred classes of states, the independence principle no longer pushes us to simply minimize H3 and face the Quantum Zeno Paradox.

A. Probability velocity and energy coherence

To obtain a quantitative measure of dynamics, let us first define the probability velocity v ≡ ṗ, where the probability vector p is given by pi ≡ ρii. In other words,

$$v_k = \dot\rho_{kk} = i[H,\rho]_{kk}. \tag{46}$$

Since v is basis-dependent, we are interested in finding the basis where

$$v^2 \equiv \sum_k v_k^2 = \sum_k (\dot\rho_{kk})^2 \tag{47}$$

is maximized, i.e., the basis where the sum of squares of the diagonal elements of ρ̇ is maximal. It is easy to see that this basis is the eigenbasis of ρ̇:

$$v^2 = \sum_k (\dot\rho_{kk})^2 = \sum_{jk} |\dot\rho_{jk}|^2 - \sum_{j\neq k} |\dot\rho_{jk}|^2 = \|\dot\rho\|^2 - \sum_{j\neq k} |\dot\rho_{jk}|^2 \tag{48}$$

is clearly maximized in the eigenbasis where all off-diagonal elements in the last term vanish, since the Hilbert-Schmidt norm ||ρ̇|| is the same in every basis; ||ρ̇||² = tr ρ̇², which is simply the sum of the squares of the eigenvalues of ρ̇.

Let us define the energy coherence

$$\delta H \equiv \frac{1}{\sqrt{2}}\|\dot\rho\| = \frac{1}{\sqrt{2}}\|[H,\rho]\| = \sqrt{\frac{-\mathrm{tr}\,\{[H,\rho]^2\}}{2}} = \sqrt{\mathrm{tr}\,[H^2\rho^2 - H\rho H\rho]}. \tag{49}$$

For a pure state ρ = |ψ⟩⟨ψ|, this definition implies that δH = ΔH, where ΔH is the energy uncertainty

$$\Delta H = \left[\langle\psi|H^2|\psi\rangle - \langle\psi|H|\psi\rangle^2\right]^{1/2}, \tag{50}$$

so we can think of δH as the coherent part of the energy uncertainty, i.e., as the part that is due to quantum rather than classical uncertainty.

Since ||ρ̇|| = ||[H, ρ]|| = √2 δH, we see that the maximum possible probability velocity v is simply

$$v_{\max} = \sqrt{2}\,\delta H, \tag{51}$$

so we can equivalently use either of v or δH as convenient measures of quantum dynamics.⁹ Whimsically speaking, the dynamics principle thus implies that energy eigenstates are as unconscious as things come, and that if you know your own energy exactly, you’re dead.

9 The fidelity between the state ψ(t) and the initial state ψ0 is defined as

$$F(t) \equiv \langle\psi_0|\psi(t)\rangle, \tag{52}$$

and it is easy to show that Ḟ(0) = 0 and F̈(0) = −(ΔH)², so the energy uncertainty is a good measure of dynamics in that it also determines the fidelity evolution to lowest order, for pure states. For a detailed review of related measures of dynamics/information processing capacity, see [7].
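The relations in equations (49)-(51) are easy to check numerically. The sketch below uses an arbitrary random Hamiltonian and pure state (both are illustrative assumptions, as are the variable names) and confirms that δH equals the energy uncertainty ΔH for a pure state and that the probability velocity reaches √2 δH in the eigenbasis of ρ̇.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2                         # random Hermitean Hamiltonian
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())                  # pure state |psi><psi|

rho_dot = 1j * (H @ rho - rho @ H)               # rho_dot = i[H, rho]
delta_H = np.linalg.norm(rho_dot) / np.sqrt(2)   # energy coherence, eq. (49)

# Energy uncertainty of the pure state, eq. (50).
mean_E = np.vdot(psi, H @ psi).real
mean_E2 = np.vdot(psi, H @ H @ psi).real
Delta_H = np.sqrt(mean_E2 - mean_E**2)

# Probability velocity in the eigenbasis of rho_dot, where it is maximal.
_, V = np.linalg.eigh(rho_dot)
v_eigenbasis = np.linalg.norm(np.diag(V.conj().T @ rho_dot @ V).real)

# The same quantity evaluated in the original, generic basis.
v_generic = np.linalg.norm(np.diag(rho_dot).real)
```

Here `np.linalg.norm` applied to a matrix returns the Frobenius norm, which coincides with the Hilbert-Schmidt norm used in the text.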


FIG. 11: Time-evolution of Bloch vector tr σρ1 for a single qubit subsystem. We saw how minimizing H3 leads to a static state with no dynamics, such as the left example. Maximizing δH, on the other hand, produces extremely simple dynamics such as the right example. Reducing δH by a modest factor of order unity can allow complex and chaotic dynamics (center); shown here is a 2-qubit system where the second qubit is traced out.

Although it is not obvious from their definitions, these quantities vmax and δH are independent of time (even though ρ generally evolves). This is easily seen in the energy eigenbasis, where

$$-i\dot\rho_{mn} = [H,\rho]_{mn} = \rho_{mn}(E_m - E_n), \tag{53}$$

where the energies En are the eigenvalues of H. In this basis, ρ(t) = e^{iHt}ρ(0)e^{−iHt} simplifies to

$$\rho(t)_{mn} = \rho(0)_{mn}\,e^{i(E_m - E_n)t}. \tag{54}$$

This means that in the energy eigenbasis, the probabilities pn ≡ ρnn are invariant over time. These quantities constitute the energy spectral density for the state:

$$p_n = \langle E_n|\rho|E_n\rangle. \tag{55}$$

In the energy eigenbasis, equation (50) reduces to

$$\delta H^2 = \Delta H^2 = \sum_n p_n E_n^2 - \Big(\sum_n p_n E_n\Big)^2, \tag{56}$$

which is time-invariant because the spectral density pn is. For general states, equation (49) simplifies to

$$\delta H^2 = \sum_{mn} |\rho_{mn}|^2 E_n (E_n - E_m). \tag{57}$$

This is time-independent because equation (54) shows that ρmn changes merely by a phase factor, leaving |ρmn| invariant. In other words, when a quantum state evolves unitarily in the Hilbert-Schmidt vector space, both the position vector ρ and the velocity vector ρ̇ retain their lengths: both ||ρ|| and ||ρ̇|| remain invariant over time.
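A quick numerical check of this time-invariance, written as a sketch with an arbitrary random spectrum and mixed state (the function names are hypothetical): it evolves ρ with the phase factors of equation (54), recomputes δH at several times, and compares the commutator definition with the sum in equation (57).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
E = rng.normal(size=n)                           # energy eigenvalues
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho0 = M @ M.conj().T
rho0 /= np.trace(rho0).real                      # random mixed state in the energy eigenbasis

def evolve(rho, t):
    """rho(t)_mn = rho(0)_mn * exp(i (E_m - E_n) t), as in eq. (54)."""
    return rho * np.exp(1j * (E[:, None] - E[None, :]) * t)

def energy_coherence(rho):
    """delta_H = ||[H, rho]|| / sqrt(2) with H = diag(E)."""
    H = np.diag(E)
    return np.linalg.norm(H @ rho - rho @ H) / np.sqrt(2)

def energy_coherence_sq_eq57(rho):
    """delta_H^2 = sum_mn |rho_mn|^2 E_n (E_n - E_m), as in eq. (57)."""
    return np.sum(np.abs(rho)**2 * E[None, :] * (E[None, :] - E[:, None]))

times = [0.0, 0.7, 3.1, 12.5]
coherences = [energy_coherence(evolve(rho0, t)) for t in times]
populations = [np.diag(evolve(rho0, t)).real for t in times]
```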

B. Dynamics versus complexity

Our results above show that if all we are interested in is maximizing the maximal probability velocity, then we should find the two most widely separated eigenvalues of H, Emin and Emax, and choose a pure state that involves a coherent superposition of the two:

$$|\psi\rangle = c_1|E_{\min}\rangle + c_2|E_{\max}\rangle, \tag{58}$$

where |c1| = |c2| = 1/√2. This gives δH = (Emax − Emin)/2, the largest possible value, but produces an extremely simple and boring solution ρ(t). Since the spectral density pn = 0 except for these two energies, the dynamics is effectively that of a 2-state system (a single qubit) no matter how large the dimensionality of H is, corresponding to a simple periodic solution with frequency ω = Emax − Emin (a circular trajectory in the Bloch sphere as in the right panel of Figure 11). This violates the dynamics principle as defined in Table II, since no substantial information processing capacity exists: the system is simply performing the trivial computation that flips a single bit repeatedly.

[Figure 12 diagram labels: matrix indices i and j; “Dynamics from H1 slides along diagonal”; “ρij ≠ 0”; “Low-decoherence subspace”; “High-decoherence subspace”; “Dynamics from H3 suppresses off-diagonal elements ρij”.]

FIG. 12: Schematic representation of the time-evolution of the density matrix ρij for a highly autonomous subsystem. ρij ≈ 0 except for a single region around the diagonal (red/grey dot), and this region slides along the diagonal under the influence of the subsystem Hamiltonian H1. Any ρij-elements far from the diagonal rapidly approach zero because of environment-decoherence caused by the interaction Hamiltonian H3.

To perform interesting computations, the system clearly needs to exploit a significant part of its energy spectrum. As can be seen from equation (54), if the eigenvalue differences are irrational multiples of one another, then the time evolution will never repeat, and ρ will eventually evolve through all parts of Hilbert space allowed by the invariants |⟨Em|ρ|En⟩|. The reduction of δH required to transition from simple periodic motion to such complex aperiodic motion is quite modest. For example, if the eigenvalues are roughly equispaced, then changing the spectral density pn from having all weight at the two endpoints to having approximately equal weight for all eigenvalues will only reduce the energy coherence δH by about a factor √3, since the standard deviation of a uniform distribution is √3 times smaller than its half-width.
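The factor of √3 can be reproduced with a few lines of arithmetic. The sketch below assumes an exactly equispaced spectrum with unit level spacing and uses the pure-state relation of equation (56); the spectrum size n = 1000 and the function name are illustrative choices.

```python
import numpy as np

def coherence_from_spectral_density(E, p):
    """Standard deviation of the energy under spectral density p, as in eq. (56)."""
    mean = np.sum(p * E)
    return np.sqrt(np.sum(p * E**2) - mean**2)

n = 1000
E = np.arange(n) - (n - 1) / 2        # equispaced spectrum, level spacing 1

p_two = np.zeros(n)
p_two[[0, -1]] = 0.5                  # all weight on E_min and E_max
p_flat = np.full(n, 1.0 / n)          # equal weight on every eigenvalue

dH_two = coherence_from_spectral_density(E, p_two)
dH_flat = coherence_from_spectral_density(E, p_flat)
ratio = dH_two / dH_flat              # tends to sqrt(3) as n grows
```

For finite n the ratio is √(3(n − 1)/(n + 1)), so it approaches √3 from below.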

C. Highly autonomous systems: sliding along the diagonal

What combinations of H, ρ and factorization produce highly autonomous systems? A broad and interesting class corresponds to macroscopic objects around us that move classically to an excellent approximation.

The states that are most robust toward environment-induced decoherence are those that approximately commute with the interaction Hamiltonian [22]. As a simple but important example, let us consider an interaction Hamiltonian of the factorizable form

H3 = A ⊗ B, (59)

and work in a basis where the interaction term A is diagonal. If ρ1 is approximately diagonal in this basis, then H3 has little effect on the dynamics, which becomes dominated by the internal subsystem Hamiltonian H1. The Quantum Zeno Paradox we encountered in Section III L involved a situation where H1 was also diagonal in this same basis, so that we ended up with no dynamics. As we will illustrate with examples below, classically moving objects in a sense constitute the opposite limit: the commutator ρ̇1 = i[H1, ρ1] is essentially as large as possible instead of as small as possible, continually evading decoherence by concentrating ρ around a single point that continually slides along the diagonal, as illustrated in Figure 12. Decoherence rapidly suppresses off-diagonal elements far from this diagonal, but leaves the diagonal elements completely unaffected, so there exists a low-decoherence band around the diagonal. Suppose, for instance, that our subsystem is the center-of-mass position x of a macroscopic object experiencing a position-dependent potential V(x) caused by coupling to the environment, so that Figure 12 represents the density matrix ρ1(x, x′) in the position basis. If the potential V(x) has a flat (V′ = 0) bottom of width L, then ρ1(x, x′) will be completely unaffected by decoherence for the band |x′ − x| < L. For a generic smooth potential V, the decoherence suppression of off-diagonal elements grows only quadratically with the distance |x′ − x| from the diagonal [21, 26], again making decoherence much slower than the internal dynamics in a narrow diagonal band.

As a specific example of this highly autonomous type, let us consider a subsystem with a uniformly spaced energy spectrum. Specifically, consider an n-dimensional Hilbert space and a Hamiltonian with spectrum

$$E_k = \left[k - \frac{n-1}{2}\right]\hbar\omega = k\hbar\omega + E_0, \tag{60}$$

k = 0, 1, ..., n − 1. We will often set ħω = 1 for simplicity. For example, n = 2 gives the spectrum {−1/2, 1/2} like the Pauli matrices divided by two, n = 5 gives {−2, −1, 0, 1, 2} and n → ∞ gives the simple Harmonic oscillator (since the zero-point energy is physically irrelevant, we have chosen it so that tr H = Σ Ek = 0, whereas the customary choice for the harmonic oscillator is such that the ground state energy is E0 = ħω/2).

If we want to, we can define the familiar position and momentum operators x and p, and interpret this system as a Harmonic oscillator. However, the probability velocity v is not maximized in either the position or the momentum basis, except twice per oscillation: when the oscillator has only kinetic energy, v is maximized in the x-basis, and when it has only potential energy, v is maximized in the p-basis. If we consider the Wigner function W(x, p), which simply rotates uniformly with frequency ω, it becomes clear that the observable which is always changing with the maximal probability velocity is instead the phase, the Fourier-dual of the energy. Let us therefore define the phase operator

Φ ≡ FHF†, (61)

where F is the unitary Fourier matrix.

Please remember that none of the systems H that we consider have any a priori physical interpretation; rather, the ultimate goal of the physics-from-scratch program is to derive any interpretation from the mathematics alone. Generally, any thus emergent interpretation of a subsystem will depend on its interactions with other systems. Since we have not yet introduced any interactions for our subsystem, we are free to interpret it in whichever way is convenient. In this spirit, an equivalent and sometimes more convenient way to interpret our Hamiltonian from equation (60) is as a massless one-dimensional scalar particle, for which the momentum equals the energy, so the momentum operator is p = H. If we interpret the particle as existing in a discrete space with n points and a toroidal topology (which we can think of as n equispaced points on a ring), then the position operator is related to the momentum operator by a discrete Fourier transform:

$$x = FpF^\dagger, \qquad F_{jk} \equiv \frac{1}{\sqrt{n}}\,e^{i\,jk\,2\pi/n}. \tag{62}$$

Comparing equations (61) and (62), we see that x = Φ. Since F is unitary, the operators H, p, x and Φ all have the same spectrum: the evenly spaced grid of equation (60).

As illustrated in Figure 13, the time-evolution generated by H has a very simple geometric interpretation in the space spanned by the position eigenstates |xk⟩, k = 1, ..., n: the space is simply rotating with frequency ω around a vector that is the sum of all the position eigenvectors, so after a time t = 2π/(nω), a state |ψ(0)⟩ = |xk⟩ has been rotated such that it equals the next eigenvector: |ψ(t)⟩ = |xk+1⟩, where the addition is modulo n. This means that the system has period T ≡ 2π/ω, and that |ψ⟩ rotates through each of the n basis vectors during each period.
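The cyclic hopping between position eigenstates can be verified directly. In the sketch below (ħ = 1, n = 8 and zero-based indices are arbitrary choices), the position eigenstates are taken to be the columns of the Fourier matrix with entries e^{2πi jk/n}/√n, expressed in the energy basis; the check is made on absolute values of overlaps, so the agreement holds up to a global phase.

```python
import numpy as np

n = 8
omega = 1.0
k = np.arange(n)
E = (k - (n - 1) / 2) * omega                              # spectrum of eq. (60), hbar = 1
F = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)   # Fourier matrix of eq. (62)

# In the energy (= momentum) basis, column j of F is the position eigenstate |x_j>.
t = 2 * np.pi / (n * omega)
U = np.diag(np.exp(1j * E * t))                            # time evolution exp(iHt)

# overlaps[i, j] = |<x_i| U |x_j>|
overlaps = np.abs(F.conj().T @ U @ F)
shift = np.roll(np.eye(n), 1, axis=0)                      # permutation j -> j + 1 (mod n)
```

After n such steps the state has returned to where it started, consistent with the period T = 2π/ω.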

Let us now quantify the autonomy of this system, starting with the dynamics. Since a position eigenstate is a Dirac delta function in position space, it is a plane wave in momentum space, and hence also in energy space, since H = p. This means that the spectral density is pn = 1/n for a position eigenstate. Substituting equation (60) into equation (56) gives an energy coherence

$$\delta H = \hbar\omega\sqrt{\frac{n^2-1}{12}}. \tag{63}$$

For comparison,

$$\|H\| = \left(\sum_{k=0}^{n-1} E_k^2\right)^{1/2} = \hbar\omega\sqrt{\frac{n(n^2-1)}{12}} = \sqrt{n}\,\delta H. \tag{64}$$

Let us now turn to quantifying independence and decoherence. The inner product between the unit vector |ψ(0)⟩ and the vector |ψ(t)⟩ ≡ e^{iHt}|ψ(0)⟩ into which it evolves after a time t is

$$f_n(\phi) \equiv \langle\psi|e^{iH\phi/\omega}|\psi\rangle = \frac{1}{n}\sum_{k=0}^{n-1} e^{iE_k\phi} = \frac{1}{n}\,e^{-i\frac{n-1}{2}\phi}\sum_{k=0}^{n-1} e^{ik\phi} = \frac{1}{n}\,e^{-i\frac{n-1}{2}\phi}\,\frac{1-e^{in\phi}}{1-e^{i\phi}} = \frac{\sin(n\phi/2)}{n\sin(\phi/2)}, \tag{65}$$

where φ ≡ ωt. This inner product fn is plotted in Figure 14, and is seen to be a sharply peaked even function satisfying fn(0) = 1, fn(2πk/n) = 0 for k = 1, ..., n − 1 and exhibiting one small oscillation between each of these zeros. The angle θ ≡ cos⁻¹ fn(φ) between an initial vector |ψ⟩ and its time evolution thus grows rapidly from 0° to 90°, then oscillates close to 90° until returning to 0° after a full period T. An initial state |ψ(0)⟩ = |xk⟩ therefore evolves as

ψj(t) = fn(ωt − 2π[j − k]/n)

in the position basis, i.e., a wavefunction ψj sharply peaked for j ∼ k + nωt/2π (mod n). Since the density matrix evolves as ρij(t) = ψi(t)ψj(t)∗, it will therefore be small except for i ∼ j ∼ k + nωt/2π (mod n), corresponding to the round dot on the diagonal in Figure 12. In particular, the decoherence-sensitive elements ρjk will be small far from the diagonal, corresponding to the small values that fn takes far from zero. How small will the decoherence be? Let us now develop the tools needed to quantify this.
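The overlap function can be checked against its defining sum. This sketch takes the closed form to be sin(nφ/2)/(n sin(φ/2)), which is what the geometric series yields and what reproduces the zeros at φ = 2πk/n; it also confirms the energy coherence of a position eigenstate from equation (63) and the norm relation of equation (64). Units with ħω = 1 and the value n = 20 (as in Figure 14) are assumed, and the function names are hypothetical.

```python
import numpy as np

def f_direct(n, phi):
    """(1/n) * sum_k exp(i E_k phi) with E_k = k - (n-1)/2 (hbar*omega = 1)."""
    E = np.arange(n) - (n - 1) / 2
    return np.mean(np.exp(1j * np.outer(phi, E)), axis=1)

def f_closed(n, phi):
    """Closed form sin(n phi / 2) / (n sin(phi / 2))."""
    return np.sin(n * phi / 2) / (n * np.sin(phi / 2))

n = 20
phi = np.linspace(0.01, 2 * np.pi - 0.01, 500)   # stay away from the removable 0/0 points
direct = f_direct(n, phi)
closed = f_closed(n, phi)

zeros = 2 * np.pi * np.arange(1, n) / n          # phi = 2 pi k / n for k = 1, ..., n-1
values_at_zeros = f_direct(n, zeros)

# Energy coherence of a position eigenstate (p_k = 1/n), to compare with eq. (63).
E = np.arange(n) - (n - 1) / 2
delta_H = np.sqrt(np.mean(E**2) - np.mean(E)**2)
```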

D. The exponential growth of autonomy with system size

Let us return to the most general Hamiltonian H and study how an initially separable state ρ = ρ1 ⊗ ρ2 evolves over time. Using the orthogonal projectors of Section III I, we can decompose H as

H = H1 ⊗ I + I ⊗ H2 + H3, (66)

where tr1 H3 = tr2 H3 = 0. By substituting equation (66) into the evolution equation ρ̇1 = tr2 ρ̇ = i tr2[H, ρ] and using various partial trace identities from Section A to simplify the resulting three terms, we obtain

$$\dot\rho_1 = i\,\mathrm{tr}_2[H, \rho_1\otimes\rho_2] = i\,[H_1 + H_*, \rho_1], \tag{67}$$

where what we will term the effective interaction Hamiltonian

$$H_* \equiv \mathrm{tr}_2\{(I\otimes\rho_2)H_3\} \tag{68}$$

can be interpreted as an average of the interaction Hamiltonian H3, weighted by the environment state ρ2. Equation (67) implies that the evolution of ρ1 remains unitary to first order in time, the only effect of the interaction H3 being to replace H1 from equation (15) by an effective Hamiltonian H1 + H∗.
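Equation (67) can be tested on random inputs. The sketch below is an illustration under stated assumptions: the subsystem dimensions, the random matrices and the helper names (`random_hermitean`, `random_state`, `trace_out_second`) are all made up for the example, and the interaction term is a generic sum of product operators rather than one produced by the projectors of Section III I.

```python
import numpy as np

def random_hermitean(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def random_state(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = M @ M.conj().T
    return rho / np.trace(rho).real

def trace_out_second(X, l, m):
    """Partial trace tr_2 over the m-dimensional second factor."""
    return np.einsum('ajbj->ab', X.reshape(l, m, l, m))

rng = np.random.default_rng(5)
l, m = 3, 2
H1, H2 = random_hermitean(l, rng), random_hermitean(m, rng)
# Interaction built as a sum of product terms; the identity checked here does not
# rely on the trace conditions stated below eq. (66).
H3 = sum(np.kron(random_hermitean(l, rng), random_hermitean(m, rng)) for _ in range(3))
H = np.kron(H1, np.eye(m)) + np.kron(np.eye(l), H2) + H3

rho1, rho2 = random_state(l, rng), random_state(m, rng)
rho = np.kron(rho1, rho2)

lhs = 1j * trace_out_second(H @ rho - rho @ H, l, m)            # i tr_2 [H, rho1 x rho2]
H_star = trace_out_second(np.kron(np.eye(l), rho2) @ H3, l, m)  # eq. (68)
H_eff = H1 + H_star
rhs = 1j * (H_eff @ rho1 - rho1 @ H_eff)                        # i [H1 + H*, rho1]
```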


FIG. 13: For a system with an equispaced energy spectrum (such as a truncated harmonic oscillator or a massless particle in a discrete 1-dimensional periodic space), the time-evolution has a simple geometric interpretation in the space spanned by the eigenvectors xk of the phase operator FHF†, the Fourier dual of the Hamiltonian, simply corresponding to rotating the entire space with frequency ω around the vector sum of these basis vectors. Here ħω is the energy level spacing. By plotting things in a plane perpendicular to this basis vector sum, we can easily visualize also situations with n > 3 dimensions: the basis vectors form a regular n-sided polygon, and time-evolution simply rotates this plane around the polygon center so that each basis vector gets mapped into the subsequent one after a time 2π/(nω). The black star denotes the α = 1 apodized state described in the text, which is more robust toward decoherence.

[Figure 14 plot: curves labeled “Non-apodized”, “Apodized” and “Penalty function”; horizontal axis from −150° to 150°, vertical axis from −0.2 to 1.0.]

FIG. 14: The wiggliest (heavy black) curve shows the inner product of a position eigenstate with what it evolves into at a time t = φ/ω later due to our n = 20-dimensional Hamiltonian with energy spacings ħω. When optimizing to minimize the square of this curve using the 1 − cos φ penalty function shown, corresponding to apodization in the Fourier domain, we instead obtain the green/light grey curve, resulting in much less decoherence.

The second time derivative is given by ρ̈1 = tr2 ρ̈ = −tr2[H, [H, ρ]], and by analogously substituting equation (66) and using partial trace identities from Section A to simplify the resulting nine terms, we obtain

$$-\ddot\rho_1 = \mathrm{tr}_2[H,[H,\rho_1\otimes\rho_2]] = [H_1,[H_1,\rho_1]] - i\,[K,\rho_1] + [H_1,[H_*,\rho_1]] + [H_*,[H_1,\rho_1]] + \mathrm{tr}_2[H_3,[H_3,\rho_1\otimes\rho_2]], \tag{69}$$

where we have defined the Hermitean matrix

K ≡ i tr 2{(I⊗ [H2, ρ2])H3}. (70)

To quantify independence and autonomy, we are interested in the extent to which H3 causes entanglement and makes the time-evolution of ρ1 non-unitary. When thinking of ρ as a vector in the Hilbert-Schmidt vector space that we reviewed in Section III H, unitary evolution preserves its length ||ρ||. To provide geometric intuition for this, let us define dot and cross product notation analogous to vector calculus. First note that

(A†, [A,B]) = tr AAB − tr ABA = 0, (71)

since a trace of a product is invariant under cyclic permutations of the factors. This shows that a commutator [A,B] is orthogonal to both A† and B† under the Hilbert-Schmidt inner product, and a Hermitean matrix H is orthogonal to its commutator with any matrix.

This means that if we restrict ourselves to the Hilbert-Schmidt vector space of Hermitean matrices, we obtain an interesting generalization of the standard dot and cross products for 3D vectors. Defining

A · B ≡ (A,B), (72)

A × B ≡ i[A,B], (73)

we see that these operations satisfy all the same properties as their familiar 3D analogs: the scalar (dot) product is symmetric (B · A = tr B†A = tr AB† = A · B), while the vector (cross) product is antisymmetric (A × B = −B × A), orthogonal to both factors ([A × B] · A = [A × B] · B = 0), and produces a result of the same type as the two factors (a Hermitean matrix).
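These properties of the dot and cross products of equations (72) and (73) can be confirmed on random Hermitean matrices. The sketch below uses 4×4 matrices as an arbitrary example, and the function names `dot` and `cross` are hypothetical.

```python
import numpy as np

def dot(A, B):
    """A . B = (A, B) = tr(A^dagger B), eq. (72)."""
    return np.trace(A.conj().T @ B)

def cross(A, B):
    """A x B = i[A, B], eq. (73)."""
    return 1j * (A @ B - B @ A)

def random_hermitean(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

rng = np.random.default_rng(6)
A, B = random_hermitean(4, rng), random_hermitean(4, rng)
C = cross(A, B)   # Hermitean, antisymmetric in (A, B), orthogonal to A and to B
```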

In this notation, the products of an arbitrary Hermitean matrix A with the identity matrix I are

I · A = tr A, (74)

I × A = 0, (75)

and the Schrödinger equation ρ̇ = i[H, ρ] becomes simply

$$\dot\rho = H\times\rho. \tag{76}$$

Just as in the 3D vector analogy, we can think of this as generating rotation of the vector ρ that preserves its length:

$$\frac{d}{dt}\|\rho\|^2 = \frac{d}{dt}\,\rho\cdot\rho = 2\dot\rho\cdot\rho = 2(H\times\rho)\cdot\rho = 0. \tag{77}$$

A simple and popular way of quantifying whether evolution is non-unitary is to compute the linear entropy

$$S^{\rm lin} \equiv 1 - \mathrm{tr}\,\rho^2 = 1 - \|\rho\|^2, \tag{78}$$

and repeatedly differentiating equation (78) tells us that

$$\dot S^{\rm lin} = -2\rho\cdot\dot\rho, \tag{79}$$

$$\ddot S^{\rm lin} = -2(\|\dot\rho\|^2 + \rho\cdot\ddot\rho), \tag{80}$$

$$\dddot S^{\rm lin} = -6\dot\rho\cdot\ddot\rho - 2\rho\cdot\dddot\rho. \tag{81}$$

Substituting equations (67) and (69) into equations (79) and (80) for ρ1, we find that almost all terms cancel, leaving us with the simple result

$$\dot S_1^{\rm lin} = 0, \tag{82}$$

$$\ddot S_1^{\rm lin} = 2\,\mathrm{tr}\,\{\rho_1\,\mathrm{tr}_2[H_3,[H_3,\rho]]\} - 2\|[H_*,\rho_1]\|^2. \tag{83}$$

This means that, to second order in time, the entropy production is completely independent of H1 and H2, depending only on quadratic combinations of H3, weighted by quadratic combinations of ρ. We find analogous results for the Shannon entropy S: If the density matrix is initially separable, then Ṡ1 = 0 and S̈1 depends not on the full Hamiltonian H, but only on its non-separable component H3, quadratically.
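The second-order result of equation (83) can be compared with a direct simulation. The sketch below is a rough check under assumptions of my own choosing: a random two-qubit Hamiltonian, a pure product initial state, and a short evolution time t = 10⁻⁴, at which S1lin(t) ≈ S̈1lin(0) t²/2 should hold up to higher-order corrections. All helper names are hypothetical.

```python
import numpy as np

def random_hermitean(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def random_pure_state(n, rng):
    psi = rng.normal(size=n) + 1j * rng.normal(size=n)
    psi /= np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

def tr2(X, l, m):
    """Partial trace over the second (m-dimensional) factor."""
    return np.einsum('ajbj->ab', X.reshape(l, m, l, m))

def comm(A, B):
    return A @ B - B @ A

rng = np.random.default_rng(7)
l, m = 2, 2
H1, H2 = random_hermitean(l, rng), random_hermitean(m, rng)
H3 = sum(np.kron(random_hermitean(l, rng), random_hermitean(m, rng)) for _ in range(2))
H = np.kron(H1, np.eye(m)) + np.kron(np.eye(l), H2) + H3

rho1, rho2 = random_pure_state(l, rng), random_pure_state(m, rng)
rho = np.kron(rho1, rho2)

# Second derivative of the linear entropy predicted by eq. (83).
H_star = tr2(np.kron(np.eye(l), rho2) @ H3, l, m)
S_ddot = (2 * np.trace(rho1 @ tr2(comm(H3, comm(H3, rho)), l, m))
          - 2 * np.linalg.norm(comm(H_star, rho1))**2).real

def linear_entropy_1(t):
    """Linear entropy of subsystem 1 after rho(t) = exp(iHt) rho exp(-iHt)."""
    w, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(1j * w * t)) @ V.conj().T
    r1 = tr2(U @ rho @ U.conj().T, l, m)
    return 1 - np.trace(r1 @ r1).real

t = 1e-4
S_numeric = 2 * linear_entropy_1(t) / t**2   # finite-time estimate of the second derivative
```

Changing H1 or H2 while keeping H3 and the initial state fixed leaves `S_ddot` unchanged, which is the independence statement made in the text.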

We now have the tools we need to compute the autonomy of our “diagonal-sliding” system from the previous subsection. As a simple example, let us take H1 to be our Hamiltonian from equation (60) with its equispaced energy spectrum, with n = 2^b, so that we can view the Hilbert space as that of b coupled qubits. Equation (63) then gives an energy coherence

$$\delta H \approx \frac{\hbar\omega}{\sqrt{12}}\,2^b, \tag{84}$$

so the probability velocity grows exponentially with the system size b.

We augment this Hilbert space with one additional “environment” qubit that begins in the state |↑⟩, with internal dynamics given by H2 = ħω₂σx, and couple it to our subsystem with an interaction

H3 = V(x) ⊗ σx (85)

for some potential V; x is the position operator from equation (62). As a first example, we use the sinusoidal potential V(x) = sin(2πx/n), start the first subsystem in the position eigenstate |x1⟩ and compute the linear entropy S1lin(t) numerically.

As expected from our qualitative arguments of the previous section, S1lin(t) grows only very slowly, and we find that it can be accurately approximated by its Taylor expansion around t = 0 for many orbital periods T ≡ 2π/ω: S1lin(t) ≈ S̈1lin(0) t²/2, where S̈1lin(0) is given by equation (83). Figure 15 shows the linear entropy after one orbit, S1lin(T), as a function of the number of qubits b in our subsystem (top curve in top panel). Whereas equation (84) showed that the dynamics increases exponentially with system size (as 2^b), the figure shows that S1lin(T) decreases exponentially with system size, asymptotically falling as 2^{−4b} as b → ∞.

Let us define the dynamical timescale τdyn and the independence timescale τind as

$$\tau_{\rm dyn} = \frac{\hbar}{\delta H}, \tag{86}$$

$$\tau_{\rm ind} = [\ddot S_1^{\rm lin}(0)]^{-1/2}. \tag{87}$$

Loosely speaking, we can think of τdyn as the time our system requires to perform an elementary information processing operation such as a bit flip [7], and τind as the time it takes for the linear entropy to change by of order unity, i.e., for significant information exchange with the environment to occur. If we define the autonomy A as the ratio

$$A \equiv \frac{\tau_{\rm ind}}{\tau_{\rm dyn}}, \tag{88}$$

the autonomy of our subsystem thus grows exponentially with system size, asymptotically increasing as A ∝ 2^{2b}/2^{−b} = 2^{3b} as b → ∞.

As illustrated by Figure 12, we expect this exponen-tial scaling to be quite generic, independent of interactiondetails: the origin of the exponential is simply that thesize of the round dot in the figure is of order 2b timessmaller than the size of the square representing the fulldensity matrix. The independence timescale τind is ex-ponentially large because the dot, with its non-negligibleelements ρij , is exponentially close to the diagonal. Thedynamics timescale τdyn is exponentially small because itis roughly the time it takes the dot to traverse its own di-ameter as it moves around at some b-independent speedin the figure.

This exponential increase of autonomy with system size makes it very easy to have highly autonomous systems even if the magnitude $H_3$ of the interaction Hamiltonian is quite large. Although the environment continually “measures” the position of the subsystem through the strong coupling $H_3$, this measurement does not decohere the subsystem because it is (to an exponentially good approximation) a non-demolition measurement, with the subsystem effectively in a position eigenstate. This phenomenon is intimately linked to the quantum Darwinism paradigm developed by Zurek and collaborators [27], where the environment mediates the emergence of a classical world by acting as a witness, storing large numbers of redundant copies of information about the system state in the basis that it measures. We thus see that systems that have high autonomy via the “diagonal-sliding” mechanism are precisely objects that dominate quantum Darwinism’s “survival of the fittest” by proliferating imprints of their states in the environment.

[Figure 15: two logarithmic panels, "Sinusoidal potential" and "Gaussian potential". Horizontal axis: system qubits (2 to 10). Vertical axis: entropy increase during first orbit. Curves are labeled by α = 0, 1, 2, 3, 4.]

FIG. 15: The linear entropy increase during the first orbit, $S_1^{\rm lin}(2\pi/\omega)$, is plotted as a function of the subsystem size (number of qubits b). The interaction potential V(x) is sinusoidal (top) and Gaussian (bottom), and the different apodization schemes used to select the initial state are labeled by their corresponding α-value, where α = 0 corresponds to no apodization (the initial state being a position eigenstate). Some lines have been terminated in the bottom panel due to insufficient numerical precision.

E. Boosting autonomy with optimized wavepackets

In our worked example above, we started our subsystem in a position eigenstate $|x_1\rangle$, which cyclically evolved through all other position eigenstates. The slight decoherence that did occur thus originated during the times when the state was between eigenstates, in a coherent superposition of multiple eigenstates quantified by the most wiggly curve in Figure 14. Not surprisingly, these wiggles (and hence the decoherence) can be reduced by a better choice of initial state $|\psi\rangle = \sum_k \psi_k |x_k\rangle = \sum_k \hat\psi_k |E_k\rangle$ for our subsystem, where $\psi_k$ and $\hat\psi_k$ are the wavefunction amplitudes in the position and energy bases, respectively. Equation (65) then gets generalized to

$$g_n(\phi) \equiv \langle x_1| e^{iH\phi/\omega} |\psi\rangle = e^{-i\frac{n-1}{2}\phi} \sum_{k=0}^{n-1} \hat\psi_k e^{ik\phi}. \tag{89}$$

Let us choose the initial state $|\psi\rangle$ that minimizes the quantity

$$\int_{-\pi}^{\pi} |g_n(\theta)|^2\, w(\theta)\, d\theta \tag{90}$$

for some penalty function w(θ) that punishes states giving large unwanted $|g_n(\theta)|$ far from θ = 0. This gives a simple quadratic minimization problem for the vector of coefficients $\hat\psi_k$, whose solution turns out to be the last (with smallest eigenvalue) eigenvector of the Toeplitz matrix whose first row is the Fourier series of w(θ). A convenient choice of penalty function is 1 − cos φ (see Figure 14), which respects the periodicity of the problem and grows quadratically around its φ = 0 minimum. In the n → ∞ limit, the Toeplitz eigenvalue problem simplifies to Laplace’s equation with a $\hat\psi(\phi) = \cos\frac{\phi}{2}$ winning eigenvector, giving

$$\psi_k \equiv \int_{-\pi}^{\pi} \cos(k\phi)\,\hat\psi(\phi)\, d\phi \propto \frac{\cos(\pi k)}{1 - 4k^2}. \tag{91}$$
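The quadratic minimization described above can be sketched numerically. The snippet below is an illustration under stated assumptions, not the procedure used for the figures: it takes the penalty function 1 − cos θ, builds the resulting tridiagonal Toeplitz matrix (2π on the diagonal, −π next to it), and checks that the eigenvector with the smallest eigenvalue is a half-wave profile across the index range, which is the finite-n counterpart of cos(φ/2). The helper names `penalty_matrix` and `optimal_amplitudes` and the size n = 64 are hypothetical choices.

```python
import numpy as np

def penalty_matrix(n):
    """Toeplitz matrix W[k, l] = integral over [-pi, pi] of exp(i(l-k)t) (1 - cos t) dt."""
    # Fourier series of 1 - cos t: 2*pi on the diagonal, -pi on the first off-diagonals.
    return np.pi * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

def optimal_amplitudes(n):
    """Unit-norm energy-basis amplitudes minimizing the quadratic form psi^dagger W psi."""
    eigenvalues, eigenvectors = np.linalg.eigh(penalty_matrix(n))  # ascending order
    psi_hat = eigenvectors[:, 0]                                   # smallest eigenvalue
    return psi_hat * np.sign(psi_hat.sum())                        # fix the overall sign

n = 64
psi_hat = optimal_amplitudes(n)

# Half-wave profile that vanishes just outside both ends of the index range.
k = np.arange(n)
half_wave = np.sin(np.pi * (k + 1) / (n + 1))
half_wave /= np.linalg.norm(half_wave)

overlap = float(psi_hat @ half_wave)
print(f"overlap with half-wave profile: {overlap:.12f}")
```

Mapping the index k to φ = −π + 2π(k + 1)/(n + 1) turns the half-wave sin(π(k + 1)/(n + 1)) into cos(φ/2), which connects this discrete solution to the continuum eigenvector quoted in the text.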

The corresponding curve $g_n(\phi)$ is plotted in Figure 14, and is seen to have significantly smaller wiggles away from the origin at the cost of a very slight widening of the central peak. Figure 15 (top panel, lower curve) shows that this choice significantly reduces decoherence.

What we have effectively done is employ the standard signal processing technique known as apodization. Aside from the irrelevant phase factor, equation (89) is simply the Fourier transform of $\hat\psi$, which can be made narrower by making $\hat\psi$ smoothly approach zero at the two endpoints. In the n → ∞ limit, our original choice corresponded to $\hat\psi = 1$ for −π ≤ φ ≤ π, which is discontinuous, whereas our replacement function $\hat\psi = \cos\frac{\phi}{2}$ vanishes at the endpoints and is continuous. This reduces the wiggling because the Riemann-Lebesgue lemma implies that the Fourier transform of a function whose first d derivatives are continuous falls off faster than $k^{-d}$. By instead using $\hat\psi^{(\alpha)}(\phi) = (\cos\frac{\phi}{2})^{\alpha}$ for some integer α ≥ 0, we get α continuous derivatives, so the larger we choose α, the smaller the decoherence-inducing wiggles, at the cost of widening the central peak. The first five cases give

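$$\psi^{(0)}_k = \delta_{0k}, \tag{92}$$

$$\psi^{(1)}_k = \frac{\cos(\pi k)}{1 - 4k^2}, \tag{93}$$

$$\psi^{(2)}_k = \delta_{0k} + \frac{1}{2}\,\delta_{1,|k|}, \tag{94}$$

$$\psi^{(3)}_k = \frac{\cos(\pi k)}{(1 - 4k^2)(1 - \frac{4}{9}k^2)}, \tag{95}$$

$$\psi^{(4)}_k = \delta_{0k} + \frac{2}{3}\,\delta_{1,|k|} + \frac{1}{6}\,\delta_{2,|k|}, \tag{96}$$

and it is easy to show that the α → ∞ limit corresponds to a Gaussian shape.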
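The closed forms (92)-(96) can be checked against the cosine transform in equation (91). The sketch below assumes the normalization in which the k = 0 amplitude equals one, which is what the listed formulas imply, and uses a simple midpoint rule; the function names and the sample count are illustrative choices.

```python
import numpy as np

def numeric_amplitudes(alpha, ks, samples=200_000):
    """Midpoint-rule estimate of the integral of cos(k phi) cos(phi/2)**alpha on [-pi, pi],
    rescaled so that the k = 0 amplitude equals 1."""
    step = 2 * np.pi / samples
    phi = -np.pi + (np.arange(samples) + 0.5) * step
    window = np.cos(phi / 2) ** alpha
    raw = np.array([np.sum(np.cos(k * phi) * window) * step for k in ks])
    norm = np.sum(window) * step
    return raw / norm

def closed_form(alpha, k):
    """Right-hand sides of equations (92)-(96)."""
    delta = lambda a, b: 1.0 if a == b else 0.0
    if alpha == 0:
        return delta(0, k)
    if alpha == 1:
        return np.cos(np.pi * k) / (1 - 4 * k**2)
    if alpha == 2:
        return delta(0, k) + delta(1, abs(k)) / 2
    if alpha == 3:
        return np.cos(np.pi * k) / ((1 - 4 * k**2) * (1 - 4 * k**2 / 9))
    if alpha == 4:
        return delta(0, k) + 2 * delta(1, abs(k)) / 3 + delta(2, abs(k)) / 6
    raise ValueError("closed forms are only listed for alpha = 0..4")

ks = list(range(-4, 5))
max_errors = []
for alpha in range(5):
    numeric = numeric_amplitudes(alpha, ks)
    exact = np.array([closed_form(alpha, k) for k in ks])
    max_errors.append(float(np.max(np.abs(numeric - exact))))
    print(alpha, max_errors[-1])
```

Without the rescaling, the raw integral for α = 1 is four times the expression in (93), so the formulas are best read as fixed up to an overall constant that drops out once the state is normalized.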

Which apodization is best? This depends on the interaction $H_3$. For our sinusoidal interaction potential (Figure 15, top), the best results are for α = 1, when the penalty function has a quadratic minimum. When switching to the roughly Gaussian interaction potential $V(x) \propto e^{4\cos(2\pi x/n)}$ (Figure 15, bottom), the results are instead seen to keep improving as we increase α, producing dramatically less decoherence than for the sinusoidal potential, and suggesting that the optimal choice is the α → ∞ state: a Gaussian wave packet. Gaussian wave packets have long garnered interest as models of approximately classical states. They correspond to generalized coherent states, which have been shown to be maximally robust toward decoherence in important situations involving harmonic oscillator interactions [29]. They have also been shown to emerge dynamically in harmonic oscillator environments, from the accumulation of many independent interactions, in much the same way as the central limit theorem gives a Gaussian probability distribution to sums of many independent contributions [30]. Our results suggest that Gaussian wave packets may also emerge as the most robust states towards decoherence from short-range interactions with exponential fall-off.

F. Optimizing autonomy when we can choose the state: factorizable effective theories

Above we explored specific examples of highly autonomous systems, motivated by approximately classical systems that we find around us in nature. We found that there are combinations of ρ, H and Hilbert space factorization that provide excellent autonomy even when the interaction $H_3$ is not small. We will now see that, more generally, given any H and factorization, there are states ρ that provide perfect factorization and infinite autonomy. The basic idea is that for states such that some of the spectral density invariants $p_k$ vanish, it makes no difference if we replace the corresponding unused eigenvalues of H by others to make the Hamiltonian separable.

Consider a subspace of the full Hilbert space defined by a projection operator Π. A projection operator satisfies $\Pi^2 = \Pi = \Pi^\dagger$, so its eigenvalues are all zero or one, and the latter correspond to our subspace of interest. Let us define the symbol ≃ to denote that operator equality holds in this subspace. For example,

$$A - B \simeq 0 \tag{97}$$

means that

$$\Pi(A - B)\Pi = 0. \tag{98}$$

Below we will often choose the subspace to correspond to low-energy states, so the wave symbol in ≃ is intended to remind us that equality holds in the long wavelength limit.

We saw that the energy spectral density $p_n$ of equation (55) remains invariant under unitary time evolution, so any energy levels for which $p_n = 0$ will never have any physical effect, and the corresponding dimensions of the Hilbert space can simply be ignored as “frozen out”. This remains true even considering observation-related state projection as described in the next subsection. Let us therefore define

$$\Pi = \sum_n \theta(p_n)\,|E_n\rangle\langle E_n|, \tag{99}$$

where θ is the Heaviside step function (θ(x) = 1 if x > 0, vanishing otherwise), i.e., summing only over those energy eigenstates for which the probability $p_n$ is non-zero. Defining new operators in our subspace by

$$\rho' \equiv \Pi\rho\Pi, \tag{100}$$

$$H' \equiv \Pi H \Pi, \tag{101}$$

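equation (99) implies that

$$\rho' = \sum_{mn} \theta(p_m)\theta(p_n)\,|E_m\rangle\langle E_m|\rho|E_n\rangle\langle E_n| = \sum_{mn} |E_m\rangle\langle E_m|\rho|E_n\rangle\langle E_n| = \rho. \tag{103}$$

Here the second equal sign follows from the fact that $|\langle E_m|\rho|E_n\rangle|^2 \le \langle E_m|\rho|E_m\rangle\langle E_n|\rho|E_n\rangle$ (footnote 10), so that the left hand side must vanish whenever either $p_m$ or $p_n$ vanishes; the Heaviside step functions therefore have no effect in equation (103) and can be dropped.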

Although H′ ≠ H, we do have H′ ≃ H, and this means that the time-evolution of ρ can be correctly computed using H′ in place of the full Hamiltonian H:

$$\rho(t) = \Pi\rho(t)\Pi = \Pi e^{iHt}\Pi\,\rho(0)\,\Pi e^{-iHt}\Pi = e^{iH't}\rho(0)e^{-iH't}.$$

10 This last inequality follows because ρ is Hermitean and positive semidefinite, so the determinant must be non-negative for the 2×2 matrix $\langle E_i|\rho|E_j\rangle$ where i and j each take the two values m and n.

The frozen-out part of the Hilbert space is therefore completely unobservable, and we can act as though the subspace is the only Hilbert space that exists, and as if H′ is the true Hamiltonian. By working only with ρ′ and H′ restricted to the subspace, we have also simplified things by reducing the dimensionality of these matrices.
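The claim that H′ reproduces the evolution of ρ can be illustrated with a small numerical experiment. This is a sketch under assumed values: the dimension n = 6, the choice of which energy levels carry non-zero probability, and the random seed are all hypothetical, and the sign convention exp(iHt) ρ exp(−iHt) follows the equation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, used = 6, [0, 2, 3]          # hypothetical: only these energy levels have p_n > 0

# Random Hermitian Hamiltonian and its eigenbasis.
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2
energies, E = np.linalg.eigh(H)  # columns of E are the eigenvectors |E_n>

# Pure state built only from the "used" eigenvectors, so p_n = 0 elsewhere.
coeffs = rng.normal(size=len(used)) + 1j * rng.normal(size=len(used))
psi = E[:, used] @ coeffs
psi /= np.linalg.norm(psi)
rho0 = np.outer(psi, psi.conj())

# Projector of equation (99) and restricted Hamiltonian of equation (101).
Pi = E[:, used] @ E[:, used].conj().T
H_prime = Pi @ H @ Pi

def evolve(hamiltonian, rho, t):
    """Return exp(iHt) rho exp(-iHt) via the eigendecomposition of a Hermitian matrix."""
    w, V = np.linalg.eigh(hamiltonian)
    U = V @ np.diag(np.exp(1j * w * t)) @ V.conj().T
    return U @ rho @ U.conj().T

t = 1.7
difference = float(np.max(np.abs(evolve(H, rho0, t) - evolve(H_prime, rho0, t))))
print("max |rho_H(t) - rho_H'(t)| =", difference)
```

The two Hamiltonians differ as matrices, yet the evolved density matrices agree to rounding error because ρ never leaves the subspace selected by Π.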

Sometimes, H′ can possess more symmetry than H. Sometimes, H′ can be separable even if H is not:

$$H \simeq H' = H_1 \otimes I + I \otimes H_2. \tag{104}$$

To create such a situation for an arbitrary n×n Hamiltonian, where $n = n_1 n_2$, simply pick a state ρ such that the spectral densities $p_k$ vanish for all except $n_1 + n_2 - 1$ energy eigenvectors. This means that in the energy eigenbasis, with the eigenvectors sorted to place these $n_1 + n_2 - 1$ special ones first, ρ is a block-diagonal matrix vanishing outside of the upper left $(n_1 + n_2 - 1) \times (n_1 + n_2 - 1)$ block. Equation (54) shows that ρ(t) will retain this block form for all time, and that changing the energy eigenvalues $E_k$ with $k > n_1 + n_2 - 1$ leaves the time-evolution of ρ unaffected. We can therefore choose these eigenvalues so that H becomes separable. For example, for the case where the Hilbert space dimensionality n = 9, suppose that $p_k$ vanishes for all energies except $E_0, E_1, E_2, E_3, E_4$, and adjust the irrelevant zero-point energy so that $E_0 = 0$. Then define H′ whose 9 eigenvalues are

$$\begin{pmatrix} 0 & E_1 & E_2 \\ E_3 & E_1 + E_3 & E_2 + E_3 \\ E_4 & E_1 + E_4 & E_2 + E_4 \end{pmatrix}. \tag{105}$$

Note that H′ ≃ H, and that although H is generically not separable, H′ is separable, with subsystem Hamiltonians $H'_1 = {\rm diag}\{0, E_1, E_2\}$ and $H'_2 = {\rm diag}\{0, E_3, E_4\}$. Subsystems 1 and 2 will therefore evolve as parallel universes governed by $H'_1$ and $H'_2$, respectively.
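The n = 9 example can be made concrete in a few lines. The sketch below works directly in the energy eigenbasis and uses arbitrary illustrative energies; the numerical values, the random seed and the variable names are assumptions made for the demonstration only.

```python
import numpy as np

E1, E2, E3, E4 = 0.7, 1.9, 0.4, 2.6            # arbitrary illustrative energies
H1 = np.diag([0.0, E1, E2])
H2 = np.diag([0.0, E3, E4])
I3 = np.eye(3)

# Separable H' of equation (104); its diagonal holds the nine entries of (105).
H_prime = np.kron(H1, I3) + np.kron(I3, H2)
spectrum = np.diag(H_prime)

# Positions of the five "used" levels 0, E1, E2, E3, E4 on that diagonal.
used = [0, 3, 6, 1, 2]

# A generic H in the same eigenbasis: same used levels, arbitrary unused ones.
rng = np.random.default_rng(1)
generic = rng.uniform(3.0, 9.0, size=9)
generic[used] = spectrum[used]

# State with support only on the used levels.
psi = np.zeros(9, dtype=complex)
psi[used] = rng.normal(size=5) + 1j * rng.normal(size=5)
psi /= np.linalg.norm(psi)

def evolve(diagonal, state, t):
    """Apply exp(iHt) for a diagonal Hamiltonian to a state vector."""
    return np.exp(1j * diagonal * t) * state

t = 2.3
same = bool(np.allclose(evolve(generic, psi, t), evolve(spectrum, psi, t)))
layout = spectrum.reshape(3, 3).T   # rows follow the layout of equation (105)
print("identical evolution:", same)
print(layout)
```

Only five of the nine eigenvalues are constrained by the state, which is exactly the count $n_1 + n_2 - 1$ needed to fix the two subsystem spectra up to a shared zero point.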

G. Minimizing quantum randomness

When we attempted to maximize the independence for a subsystem above, we implicitly wanted to maximize the ability to predict the subsystem's future state from its present state. The source of unpredictability that we considered was influence from outside the subsystem, from the environment, which caused decoherence and increased subsystem entropy.

Since we are also interested in modeling conscious systems, there is a second independent source of unpredictability that we need to consider, which can occur even if there is no interaction with the environment: “quantum randomness”. If the system begins in a single conscious state and unitarily evolves into a superposition of subjectively distinguishable conscious states, then the observer in the initial state has no way of uniquely predicting her future perceptions.

A comprehensive framework for treating such situations is given in [31], and in the interest of brevity, we will not review it here, merely use the results. To be able to state them as succinctly as possible, let us first introduce notation for a projection process “pr” that is in a sense dual to partial-tracing.

For a Hilbert space that is factored into two parts, we define the following notation. We indicate the tensor product structure by splitting a single index α into an index pair ij. For example, if the Hilbert space is the tensor product of an m-dimensional and an n-dimensional space, then α = n(i − 1) + j, i = 1, ..., m, j = 1, ..., n, α = 1, ..., mn, and if A = B ⊗ C, then

$$A_{\alpha\beta} = A_{ii'jj'} = B_{ij}C_{i'j'}. \tag{106}$$

We define ⋆ as the operation exchanging subsystems 1 and 2:

$$(A^\star)_{ii'jj'} = A_{i'ij'j}. \tag{107}$$

We define $\mathrm{pr}_k A$ as the kth diagonal block of A:

$$(\mathrm{pr}_k A)_{ij} = A_{kikj}.$$

For example, $\mathrm{pr}_1 A$ is the n×n upper left corner of A. As before, $\mathrm{tr}_i A$ denotes the partial trace over the ith subsystem:

$$(\mathrm{tr}_1 A)_{ij} = \sum_k A_{kikj}, \tag{108}$$

$$(\mathrm{tr}_2 A)_{ij} = \sum_k A_{ikjk}. \tag{109}$$

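The following identities are straightforward to verify:

$$\mathrm{tr}_1 A^\star = \mathrm{tr}_2 A, \tag{110}$$

$$\mathrm{tr}_2 A^\star = \mathrm{tr}_1 A, \tag{111}$$

$$\mathrm{tr}_1 A = \sum_k \mathrm{pr}_k A, \tag{112}$$

$$\mathrm{tr}_2 A = \sum_k \mathrm{pr}_k A^\star, \tag{113}$$

$$\mathrm{tr}\,\mathrm{pr}_k A = (\mathrm{tr}_2 A)_{kk}, \tag{114}$$

$$\mathrm{tr}\,\mathrm{pr}_k A^\star = (\mathrm{tr}_1 A)_{kk}. \tag{115}$$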
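Identities (110)-(115) can be spot-checked numerically. The sketch below is one possible encoding, not notation taken from [31]: it stores A as a four-index array with the row pair first and the column pair second, counts k from zero, and takes the exchange operation to swap both the row pair and the column pair. The function names and the dimensions m = 2, n = 3 are illustrative.

```python
import numpy as np

def as_tensor(A, d1, d2):
    """View a (d1*d2) x (d1*d2) matrix as A[i, i', j, j'] with row (i, i') and column (j, j')."""
    return A.reshape(d1, d2, d1, d2)

def tr1(A, d1, d2):
    """Partial trace over subsystem 1, equation (108)."""
    return np.einsum("kikj->ij", as_tensor(A, d1, d2))

def tr2(A, d1, d2):
    """Partial trace over subsystem 2, equation (109)."""
    return np.einsum("ikjk->ij", as_tensor(A, d1, d2))

def pr(A, k, d1, d2):
    """k-th diagonal block of A (k counted from 0 here)."""
    return as_tensor(A, d1, d2)[k, :, k, :]

def star(A, d1, d2):
    """Exchange the two subsystems; the result has factor dimensions (d2, d1)."""
    return as_tensor(A, d1, d2).transpose(1, 0, 3, 2).reshape(d1 * d2, d1 * d2)

m, n = 2, 3
rng = np.random.default_rng(2)
A = rng.normal(size=(m * n, m * n)) + 1j * rng.normal(size=(m * n, m * n))
A_star = star(A, m, n)

checks = {
    "(110)": np.allclose(tr1(A_star, n, m), tr2(A, m, n)),
    "(111)": np.allclose(tr2(A_star, n, m), tr1(A, m, n)),
    "(112)": np.allclose(tr1(A, m, n), sum(pr(A, k, m, n) for k in range(m))),
    "(113)": np.allclose(tr2(A, m, n), sum(pr(A_star, k, n, m) for k in range(n))),
    "(114)": all(np.isclose(np.trace(pr(A, k, m, n)), tr2(A, m, n)[k, k]) for k in range(m)),
    "(115)": all(np.isclose(np.trace(pr(A_star, k, n, m)), tr1(A, m, n)[k, k]) for k in range(n)),
}
print(checks)
```

Using unequal dimensions makes shape mismatches visible immediately, which is a convenient guard against mixing up the two subsystems.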

Let us adopt the framework of [31] and decompose the full Hilbert space into three parts corresponding to the subject (the conscious degrees of freedom of the observer), the object (the external degrees of freedom that the observer is interested in making predictions about) and the environment (all remaining degrees of freedom).

If the subject knows the object-environment density matrix to be ρ, it obtains its density matrix for the object by tracing out the environment:

$$\rho_o = \mathrm{tr}_e\,\rho.$$

If the subject-object density matrix is ρ, then the subject may be in a superposition of having many different perceptions $|s_k\rangle$. Take the $|s_k\rangle$ to form a basis of the subject Hilbert space. The probability that the subject finds itself in the state $|s_k\rangle$ is

$$p_k = (\mathrm{tr}_2\,\rho)_{kk}, \tag{116}$$

and for a subject finding itself in this state $|s_k\rangle$, the object density matrix is

$$\rho_o^{(k)} = \frac{\mathrm{pr}_k\,\rho}{p_k}. \tag{117}$$

If ρ refers to a future subject-object state, and the subject wishes to predict its future knowledge of the object, it takes the weighted average of these density matrices, obtaining

$$\rho_o = \sum_k p_k\,\rho_o^{(k)} = \sum_k \mathrm{pr}_k\,\rho = \mathrm{tr}_s\,\rho,$$

i.e., it traces out itself! (We used the identity equation (112) in the last step.) Note that this simple result is independent of whatever basis is used for the object-space, so all issues related to how various states are perceived become irrelevant.
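The averaging step can be illustrated numerically. In the sketch below the subject is taken to be the first tensor factor, and the dimensions, the random seed and the way the density matrix is sampled are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
ds, do = 3, 2                                   # hypothetical subject and object dimensions

# Random subject-object density matrix: Hermitian, positive semidefinite, unit trace.
M = rng.normal(size=(ds * do, ds * do)) + 1j * rng.normal(size=(ds * do, ds * do))
rho = M @ M.conj().T
rho /= np.trace(rho).real

rho4 = rho.reshape(ds, do, ds, do)              # indices: (subject, object, subject', object')

# Equation (116): probability of each subject state |s_k>.
p = np.array([np.trace(rho4[k, :, k, :]).real for k in range(ds)])

# Equation (117): object state conditioned on the subject being in |s_k>.
conditional = [rho4[k, :, k, :] / p[k] for k in range(ds)]

# Weighted average of the conditional states versus tracing out the subject.
averaged = sum(p[k] * conditional[k] for k in range(ds))
traced = np.einsum("kikj->ij", rho4)

agree = bool(np.allclose(averaged, traced))
print("probabilities:", p, "sum:", p.sum())
print("agree:", agree)
```

Each conditional matrix has unit trace by construction, so the weights $p_k$ carry all of the subject's uncertainty about which perception it will find itself having.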

As proven in [32], any unitary transformation of a separable ρ will increase the entropy of $\mathrm{tr}_1\rho$. This means that the subject’s future knowledge of $\rho_o$ is more uncertain than its present knowledge thereof. However, as proven in [31], the future subject’s knowledge of $\rho_o$ will on average be less uncertain than it presently is, at least if the time-evolution is restricted to be of the measurement type.

The result $\rho_o = \mathrm{tr}_1\,\rho$ also holds if you measure the object and then forget what the outcome was. In this case, you are simply playing the role of an environment, resulting in the exact same partial-trace equation.

In summary, for a conscious system to be able to predict the future state of what it cares about ($\rho_o$) as well as possible, we must minimize uncertainty introduced both by the interactions with the environment (fluctuation, dissipation and decoherence) and by measurement (“quantum randomness”). The future evolution can be better predicted for certain object states than for others, because they are more stable against both of the above-mentioned sources of unpredictability. The utility principle from Table II suggests that it is precisely these most stable and predictable states that conscious observers will perceive. The successful “predictability sieve” idea of Zurek and collaborators [34] involves precisely this idea when the source of unpredictability is environment-induced decoherence, so the utility principle lets us generalize this idea to include the second unpredictability source as well: to minimize apparent quantum randomness, we should pay attention to states whose dynamics let them remain relatively diagonal in the eigenbasis of the subject-object interaction Hamiltonian, so that our future observations of the object are essentially quantum non-demolition measurements.

H. Optimizing autonomy when the state is given

Let us now consider the case where both H and ρ are treated as given, and we want to vary the Hilbert space factorization to attain maximal separability. H and ρ together determine the full time-evolution ρ(t) via the Schrödinger equation, so we seek the unitary transformation U that makes $U\rho(t)U^\dagger$ as factorizable as possible. For a pure initial state, exact factorability is equivalent to $\rho_1(t)$ being pure, with $\|\rho_1\| = 1$ and vanishing linear entropy $S^{\rm lin} = 1 - \|\rho_1(t)\|^2$, so let us minimize the linear entropy averaged over a range of times. As a concrete example, we minimize the function

$$f(U) \equiv 1 - \frac{1}{m}\sum_{i=1}^{m} \|\mathrm{tr}_1\, U\rho(t_i)U^\dagger\|^2, \tag{118}$$

using 9 equispaced times $t_i$ ranging from t = 0 to t = 1, a random 4×4 Hamiltonian H, and a random pure state ρ(0).
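The objective in equation (118) is straightforward to evaluate. The sketch below only computes f(U) for a given U and does not attempt the optimization itself, since the text does not specify the parametrization of U or the minimizer used. The way the random Hamiltonian and state are sampled, the random seed and the function names are assumptions.

```python
import numpy as np

def evolve(H, rho0, t):
    """rho(t) = exp(iHt) rho(0) exp(-iHt), using the eigendecomposition of Hermitian H."""
    w, V = np.linalg.eigh(H)
    Ut = V @ np.diag(np.exp(1j * w * t)) @ V.conj().T
    return Ut @ rho0 @ Ut.conj().T

def tr1(A, d1, d2):
    """Partial trace over the first tensor factor."""
    return np.einsum("kikj->ij", A.reshape(d1, d2, d1, d2))

def f(U, H, rho0, times, d1=2, d2=2):
    """Equation (118): time-averaged linear entropy in the factorization rotated by U."""
    purities = []
    for t in times:
        reduced = tr1(U @ evolve(H, rho0, t) @ U.conj().T, d1, d2)
        purities.append(np.trace(reduced @ reduced).real)   # squared Hilbert-Schmidt norm
    return 1.0 - float(np.mean(purities))

rng = np.random.default_rng(4)
times = np.linspace(0.0, 1.0, 9)

# Random 4x4 Hermitian Hamiltonian and random pure state (one possible sampling choice).
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (M + M.conj().T) / 2
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho0 = np.outer(psi, psi.conj())

f_identity = f(np.eye(4), H, rho0, times)
print("f(I) =", f_identity)

# Sanity check: a separable H acting on a product state never entangles, so f = 0.
h1 = np.diag([0.0, 1.0])
h2 = np.diag([0.3, 2.0])
H_sep = np.kron(h1, np.eye(2)) + np.kron(np.eye(2), h2)
a = np.array([1.0, 1.0j]) / np.sqrt(2)
b = np.array([0.6, 0.8])
product = np.outer(np.kron(a, b), np.kron(a, b).conj())
f_separable = f(np.eye(4), H_sep, product, times)
print("f for separable case =", f_separable)
```

For two qubits the reduced purity lies between 1/2 and 1, so f is bounded between 0 and 1/2; a minimizer over U would be wrapped around this function.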

[Figure 16: norm $\|\rho_1\|$ (vertical axis, 0.2 to 1.0) versus time (horizontal axis, 0 to 10), with curves labeled "New factorization" and "Old factorization".]

FIG. 16: The Hilbert-Schmidt norm $\|\rho_1\|$ is plotted for a random pure-state 2-qubit system when factorizing the Hilbert space in the original basis (black curve) and after a unitary transformation optimized to keep $\rho_1$ as pure as possible for t ≤ 1 (red/grey curve).

The result of numerically solving this optimization problem is shown in Figure 16, and we see that the new factorization keeps the norm $\|\rho_1\|$ visually indistinguishable from unity for the entire time period optimized for. The optimization reduced the average Shannon entropy over this period from S ≈ 1.1 bits to S = 0.0009 bits.

The reason that the optimization is so successful is presumably that, by adjusting $N = n^2 - n_1^2 - n_2^2 = 16 - 4 - 4 = 8$ real parameters (footnote 11) in U, it is able to approximately zero out the first N terms in the Taylor expansion of $S^{\rm lin}(t)$, whose leading terms are given by equations (79)-(81). A series of similar numerical experiments indicated that such excellent separability could generally be found as long as the number of time steps $t_i$ was somewhat smaller than the number of free parameters N but not otherwise, suggesting that separability can be extended over long time periods for large n. However, because we are studying only unitary evolution here, neglecting the important projection effect from the previous section, it is unclear how relevant these results are to our underlying goal. We have therefore not extended these numerical optimizations, which are quite time-consuming, to larger n.

11 There are $n^2$ parameters for U, but transformations within each of the two subspaces have no effect, wasting $n_1^2$ and $n_2^2$ parameters.

V. CONCLUSIONS

In this paper, we have explored two problems that are intimately related. The first problem is that of understanding consciousness as a state of matter, “perceptronium”. We have focused not on solving this problem, but rather on exploring the implications of this viewpoint. Specifically, we have explored five basic principles that may distinguish conscious matter from other physical systems: the information, integration, independence, dynamics and utility principles.

The second one is the physics-from-scratch problem: If the total Hamiltonian H and the total density matrix ρ fully specify our physical world, how do we extract 3D space and the rest of our semiclassical world from nothing more than two Hermitean matrices? Can some of this information be extracted even from H alone, which is fully specified by nothing more than its eigenvalue spectrum? We have focused on a core part of this challenge which we have termed the quantum factorization problem: why do conscious observers like us perceive the particular Hilbert space factorization corresponding to classical space (rather than Fourier space, say), and more generally, why do we perceive the world around us as a dynamic hierarchy of objects that are strongly integrated and relatively independent?

These two problems go hand in hand, because a generic Hamiltonian cannot be decomposed using tensor products, which would correspond to a decomposition of the cosmos into non-interacting parts, so there is some optimal factorization of our universe into integrated and relatively independent parts. Based on Tononi’s work, we might expect that this factorization, or some generalization thereof, is what conscious observers perceive, because an integrated and relatively autonomous information complex is fundamentally what a conscious observer is.

A. Summary of findings

We first explored the integration principle, and found that classical physics allows information to be essentially fully integrated using error-correcting codes, so that any subset containing up to about half the bits can be reconstructed from the remaining bits. Information stored in Hopfield neural networks is naturally error-corrected, but $10^{11}$ neurons support only about 37 bits of integrated information. This leaves us with an integration paradox: why does the information content of our conscious experience appear to be vastly larger than 37 bits? We found that generalizing these results to quantum information exacerbated this integration paradox, allowing no more than about a quarter of a bit of integrated information, and this result applied not only to Hopfield networks of a given size, but to the state of any quantum system of any size. This strongly implies that the integration principle must be supplemented by at least one additional principle.

We next explored the independence principle and the extent to which a Hilbert space factorization can decompose the Hamiltonian H (as opposed to the state ρ) into independent parts. We quantified this using projection operators in the Hilbert-Schmidt vector space where H and ρ are viewed as vectors rather than operators, and conjectured that the best decomposition can always be found in the energy eigenbasis, where H is diagonal. We proved this conjecture for the n = 4 case and found numerical evidence for it being true for all n. This leads to a more pernicious variant of the Quantum Zeno Effect that we termed the Quantum Zeno Paradox: if we decompose our universe into maximally independent objects, then all change grinds to a halt. Since conscious observers clearly do not perceive reality as being static and unchanging, the integration and independence principles must therefore be supplemented by at least one additional principle.

We then explored the dynamics principle, according to which a conscious system has the capacity to not only store information, but also to process it. We found the energy coherence $\delta H \equiv \sqrt{2\,\mathrm{tr}\,\dot\rho^2}$ to be a convenient measure of dynamics: it can be proven to be time-independent, and it reduces to the energy uncertainty ∆H for the special case of pure states. Maximizing dynamics alone gives boring periodic solutions unable to support complex information processing, but reducing δH by merely a modest percentage enables chaotic and complex dynamics that explores the full dimensionality of the Hilbert space. We found that high autonomy (a combination of dynamics and independence) can be attained even if the environment interaction is strong. One class of examples involves the environment effectively performing quantum-non-demolition measurements of the autonomous system, whose internal dynamics causes the non-negligible elements of the density matrix ρ to “slide along the diagonal” in the measured basis, remaining in the low-decoherence subspace. We studied such an example involving a truncated harmonic oscillator coupled to an external spin, and saw that it is easy to find classes of systems whose autonomy grows exponentially with the system size (measured in qubits). Generalized coherent states with Gaussian wavefunctions appeared particularly robust toward interactions with steep/short-range potentials. We found that any given H can also be perfectly decomposed given a suitably chosen ρ that assigns zero amplitude to some energy eigenstates. When optimizing the Hilbert space factorization for H and ρ jointly, it appears possible to make a subsystem history $\rho_1(t)$ close to separable for a long time. However, it is unclear how relevant this is, because the state projection caused by observation also alters $\rho_1$.

B. How does a conscious entity perceive the world?

What are we to make of these findings? We have not solved the quantum factorization problem, but our results have brought it into sharper focus, and highlighted both concrete open sub-problems and various hints and clues from observation about paths forward. Let us first discuss some open problems, then turn to the hints.

For the physics-from-scratch problem of deriving how we perceive our world from merely H, ρ and the Schrödinger equation, there are two possibilities: either the problem is well-posed or it is not. If not, this would be very interesting, implying that some sort of additional structure beyond ρ and H is needed at the fundamental level: some additional mathematical structure encoding properties of space, for instance, which would be surprising given that this appears unnecessary in lattice gauge theory (see Appendix C). Since we have limited our treatment to unitary non-relativistic quantum mechanics, obvious candidates for missing structure relate to relativity and quantum gravity, where the Hamiltonian vanishes, and to mechanisms causing non-unitary wavefunction collapse. Indeed, Penrose and others have speculated that gravity is crucial for a proper understanding of quantum mechanics even on small scales relevant to brains and laboratory experiments, and that it causes non-unitary wavefunction collapse [35]. Yet the Occam’s razor approach is clearly the commonly held view that neither relativistic, gravitational nor non-unitary effects are central to understanding consciousness or how conscious observers perceive their immediate surroundings: astronauts appear to still perceive themselves in a semiclassical 3D space even when they are effectively in a zero-gravity environment, seemingly independently of relativistic effects, Planck-scale spacetime fluctuations, black hole evaporation, cosmic expansion of astronomically distant regions, etc.

If, on the other hand, the physics-from-scratch problem is well-posed, we face crucial unanswered questions related to Hilbert space factorization. Why do we perceive electromagnetic waves as transferring information between different regions of space, rather than as completely independent harmonic oscillators that each stay put in a fixed spatial location? These two viewpoints correspond to factoring the Hilbert space of the electromagnetic field in either real space or Fourier space, which are simply two unitarily equivalent Hilbert space bases. Moreover, how can we perceive a harmonic oscillator as an integrated system when its Hamiltonian can, as reviewed in Appendix B, be separated into completely independent qubits? Why do we perceive a magnetic system described by the 3D Ising model as integrated, when it separates into completely independent qubits after a unitary transformation (footnote 12)? In all three cases, the answer clearly lies not within the system itself (in its internal dynamics $H_1$), but in its interaction $H_3$ with the rest of the world. But $H_3$ involves the factorization problem all over again: whence this distinction between the system itself and the rest of the world, when there are countless other Hilbert space factorizations that mix the two?

C. Open problems

Based on our findings, three specific problems stand in the way of solving the quantum factorization problem and answering these questions, and we will now discuss each of them in turn.

1. Factorization and the chicken-and-egg problem

What should we determine first: the state or the factorization? If we are given a Hilbert space factorization and an environment state, we can use the predictability sieve formalism [34] to find the states of our subsystem that are most robust toward decoherence. In some simple cases, they are eigenstates of the effective interaction Hamiltonian $H^*$ from equation (68). However, to find the best factorization, we need information about the state. A clock is a highly autonomous system if we factor the Hilbert space so that the first factor corresponds to the spatial volume containing the clock, but if the state were different such that the clock were somewhere else, we should factor out a different volume. Moreover, if the state has the clock in a superposition of two macroscopically different locations, then there is no single optimal factorization, but instead a separate one for each branch of the wavefunction. An observer looking at the clock would use the clock position seen to project onto the appropriate branch using equation (117), so the solution to the quantum factorization problem that we should be looking for is not a single unique factorization of the Hilbert space. Rather, we need a criterion for identifying conscious observers, and then a prescription that determines which factorization each of them will perceive.

12 If we write the Ising Hamiltonian as a quadratic function of $\sigma_x$-operators, then it is also quadratic in the annihilation and creation operators and can therefore be diagonalized after a Jordan-Wigner transform [33]. Note that such diagonalization is impossible for the Heisenberg ferromagnet, whose couplings are quadratic in all three Pauli matrices, because $\sigma_z^2$-terms are quartic in the annihilation and creation operators.

2. Factorization and the integration paradox

A second challenge that we have encountered is the extreme separability possible for both H and ρ. In the introduction, we expressed hope that the apparent integration of minds and external objects might trace back to the fact that for generic ρ and H, there is no Hilbert space factorization that makes ρ factorizable or H additively separable. Yet by generalizing Tononi’s ideas to quantum systems, we found that what he terms the “cruelest cut” is very cruel indeed, able to reduce the mutual information in ρ to no more than about 0.25 bits, and typically able to make the interaction Hamiltonian $H_3$ very small as well. We saw in Section IV H that even the combined effects of ρ and H can typically be made close to separable, in the sense that there is a Hilbert space factorization where a subsystem history $\rho_1(t)$ is close to separable for a long time. So why do we nonetheless perceive our universe as being relatively integrated, with abundant information available to us from near and far? Why do we not instead perceive our mind as essentially constituting its own parallel universe, solipsism-style, with merely exponentially small interactions with the outside world? We saw that the origin of this integration paradox is the vastness of the group of unitary transformations that we are minimizing over, whose number of parameters scales like $n^2 = 2^{2b}$ with the number of qubits b and thus grows exponentially with system size (measured in either volume or number of particles).

3. Factorization and the emergence of time

A third challenge involves the emergence of time. Although this is a famously thorny problem in quantum gravity, our results show that it appears even in non-relativistic unitary quantum mechanics. It is intimately linked with our factorization problem, because we are optimizing over all unitary transformations U, and time evolution is simply a one-dimensional subset of these transformations, given by $U = e^{iHt}$. Should the optimal factorization be determined separately at each time, or only once and for all? In the latter case, this would appear to select only one special time when our universe is optimally separable, seemingly contrary to our observations that the laws of physics are time-translation invariant. In the former case, the continuous change in factorization will simply undo time evolution [9], making you feel that time stands still! Observationally, it is obvious that the optimal factorization can change at least somewhat with time, since our designation of objects is temporary: the atoms of a highly autonomous wooden bowling ball rolling down a lane were once dispersed (as CO₂ and H₂O in the air, etc.) and will eventually disperse again.

An obvious way out of this impasse is to bring consciousness back to center-stage as in Section IV G and [26, 31, 32]. Whenever a conscious observer interacts with her environment and gains new information, the state ρ with which she describes her world gets updated according to equation (117), the quantum-mechanical version of Bayes’ theorem [32]. This change in her ρ is non-unitary and therefore evades our timelessness argument above. Because she always perceives herself in a pure state, knowing the state of her mind, the joint state of her and the rest of the world is always separable. It therefore appears that if we can one day solve the quantum factorization problem, then we will find that the emergence of time is linked to the emergence of consciousness: the former cannot be fully understood without the latter.

D. Observational hints and clues

In summary, the quantum factorization problem is both very interesting and very hard. However, as opposed to the hard problem of quantum gravity, say, where we have few if any observational clues to guide us, physics research has produced many valuable hints and clues relevant to the quantum factorization problem. The factorization of the world that we perceive and the quantum states that we find objects in have turned out to be exceptionally unusual and special in various ways, and for each such way that we can identify and quantify, and whose underlying principle we can understand, we will make another important stride towards solving the factorization problem. Let us now discuss the hints that we have identified so far.

1. The universality of the utility principle

The principles that we listed in Table II were for conscious systems. If we shift attention to non-conscious objects, we find that although dynamics, independence and integration still apply in many if not most cases, the utility principle is the only one that universally applies to all of them. For example, a rain drop lacks significant information storage capacity, a boulder lacks dynamics, a cogwheel can lack independence, and a sand pile lacks integration. This universality of the utility principle is hardly surprising, since utility is presumably the reason we evolved consciousness in the first place. This suggests that we examine all other clues below through the lens of utility, to see whether the unusual circumstances in question can be explained via some implication of the utility principle. In other words, if we find that useful consciousness can only exist given certain strict requirements on the quantum factorization, then this could explain why we perceive a factorization satisfying these requirements.


2. ρ is exceptional

The observed state ρ of our universe is exceptional in that it is extremely cold, with most of the Hilbert space frozen out. What principles might require this? Perhaps this is useful for consciousness by allowing relatively stable information storage and by allowing large autonomous systems thanks to the large available dynamic range in length scales (universe/brain/atom/Planck scale)? Our being far from thermal equilibrium, with our 300 K planet dumping heat from our 6000 K sun into our 3 K space, is clearly conducive to dynamics and information processing.

3. H is exceptional

The Hamiltonian H of the standard model of particle physics is of the very special form

$$H = \int H_r(\mathbf{r})\, d^3r, \tag{119}$$

which is seen to be almost additively separable in the spatial basis, and in no other basis. Although equation (119) superficially looks completely separable just as $H = \sum_i H_i$, there is a coupling between infinitesimally close spatial points due to spatial derivatives in the kinetic terms. If we replace the integral by a sum in equation (119) by discretizing space as in lattice gauge theory, we need couplings only between nearest-neighbor points. This is a strong hint of the independence principle at work; all this near-independence gets ruined by a generic unitary transformation, making the factorization corresponding to our 3D physical space highly special; indeed, 3D space and the exact form of equation (119) could presumably be inferred from simply knowing the spectrum of H.
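The claim that a generic change of basis ruins nearest-neighbor structure can be illustrated with a toy model. The sketch below is not the standard-model Hamiltonian: it assumes a classical coupling matrix for a periodic 1D chain (the names `nearest_neighbor_chain`, `mu2` and `gamma2` are hypothetical) and applies a random orthogonal transformation to it.

```python
import numpy as np

def nearest_neighbor_chain(n, mu2=1.0, gamma2=0.5):
    """Coupling matrix of a periodic 1D chain: on-site term plus nearest-neighbor coupling."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = mu2 + 2.0 * gamma2
        A[i, (i + 1) % n] = -gamma2
        A[i, (i - 1) % n] = -gamma2
    return A

def random_orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

def count_couplings(A, tol=1e-9):
    """Number of off-diagonal entries larger than tol in magnitude."""
    off = A - np.diag(np.diag(A))
    return int(np.sum(np.abs(off) > tol))

rng = np.random.default_rng(0)
n = 16
A = nearest_neighbor_chain(n)
Q = random_orthogonal(n, rng)
A_rot = Q.T @ A @ Q                      # same spectrum, different basis

sparse_count = count_couplings(A)        # two neighbors per site
dense_count = count_couplings(A_rot)     # generically almost all pairs coupled
print(sparse_count, dense_count)
```

The spectrum is unchanged by the rotation, yet almost every pair of sites becomes coupled, which is the sense in which the spatial basis is special.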

H from equation (119) is also exceptional in that it contains mainly quadratic, cubic and quartic functions of the fermion and boson fields, which can in turn be expressed linearly or quadratically in terms of qubit raising and lowering operators (see Appendix C). A generic unitary transformation would ruin this simplicity as well, introducing polynomials of enormous degree. What principle might be responsible for this?

4. The ubiquity of autonomy

When discussing the integration paradox above, we worried about factorizations splitting the world into nearly independent parts. If there is a factorization with $H_3 = 0$, then the two subsystems are independent for any state, for all time, and will act as two parallel universes. This means that if the only way to achieve high independence were to make $H_3$ tiny, the integration paradox would indeed be highly problematic. However, we saw in Section IV that this is not at all the case: it is quite easy to achieve high independence for some states, at least temporarily, even when $H_3$ is large. The independence principle therefore does not push us inexorably towards perceiving a more disconnected world than the one we are familiar with. The ease of approximately factoring $\rho_1(t)$ during a significant time period as in Section IV H also appears unlikely to be a problem: as mentioned, our calculation answered the wrong question by studying only unitary evolution, neglecting projection. The take-away hint is thus that observation needs to be taken into account to address this issue properly, just as we argued that it must be taken into account to understand the emergence of time.
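The statement that vanishing interaction keeps the two subsystems independent for all time can be checked numerically. The following is a minimal sketch under stated assumptions: two qubits, randomly drawn Hermitian matrices standing in for the subsystem Hamiltonians, and a random full-space Hermitian matrix used as a stand-in for the interaction term (all names are hypothetical). Purity of the reduced state equal to 1 signals a product state; the sign convention in the exponent does not affect it.

```python
import numpy as np

def random_hermitian(n, rng):
    """Random n x n Hermitian matrix."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

def evolve(H, psi, t):
    """Apply exp(-i H t) to psi using the eigendecomposition of Hermitian H."""
    w, V = np.linalg.eigh(H)
    return V @ (np.exp(-1j * w * t) * (V.conj().T @ psi))

def purity_of_subsystem_1(psi, n1, n2):
    """tr(rho_1^2), where rho_1 is the reduced density matrix of subsystem 1."""
    M = psi.reshape(n1, n2)
    rho1 = M @ M.conj().T
    return float(np.real(np.trace(rho1 @ rho1)))

rng = np.random.default_rng(2)
n1 = n2 = 2
H1, H2 = random_hermitian(n1, rng), random_hermitian(n2, rng)
H3 = random_hermitian(n1 * n2, rng)      # stand-in for an interaction term
H_free = np.kron(H1, np.eye(n2)) + np.kron(np.eye(n1), H2)

a = rng.standard_normal(n1) + 1j * rng.standard_normal(n1)
b = rng.standard_normal(n2) + 1j * rng.standard_normal(n2)
psi0 = np.kron(a / np.linalg.norm(a), b / np.linalg.norm(b))   # product state

purity_independent = purity_of_subsystem_1(evolve(H_free, psi0, 2.0), n1, n2)
purity_coupled = purity_of_subsystem_1(evolve(H_free + H3, psi0, 2.0), n1, n2)
print(purity_independent, purity_coupled)
```

Without the interaction the purity stays at 1 up to rounding; with it, the subsystems generically become entangled and the purity drops.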

5. Decoherence as enemy

Early work on decoherence [20, 21] portrayed it mainly as an enemy, rapidly killing off most quantum states, with only a tiny minority surviving long enough to be observable. For example, a bowling ball gets struck by about $10^{25}$ air molecules each second, and a single strike suffices to ruin any macrosuperposition of the ball's position extending further than about an angstrom, the molecular de Broglie wavelength [21, 36]. The successful predictability sieve idea of Zurek and collaborators [34] states that we will only perceive those quantum states that are most robust towards decoherence, which in the case of macroscopic objects such as bowling balls selects roughly classical states with fairly well-defined positions. The origin of the position basis as special thus traces back to the environmental interactions $H_3$ (with air molecules etc.) probing the position, which in turn traces back to the fact that H from equation (119) is roughly separable in the position basis. In terms of Table II, we can view the predictability sieve as an application of the utility principle, since there is clearly no utility in trying to perceive something that will be irrelevant $10^{-25}$ seconds later. In summary, the hint from this negative view of decoherence is that we should minimize it, either by factoring to minimize $H_3$ itself or by using robust states on which $H_3$ essentially performs quantum non-demolition measurements.

6. Decoherence as friend

Although quantum computer builders still view decoherence as their enemy, more recent work on decoherence has emphasized that it also has a positive side: the Quantum Darwinism framework [27] emphasizes the role of environment interactions $H_3$ as a valuable communication channel, repeatedly copying information about the states of certain systems into the environment¹³, thereby helping explain the emergence of a consensus reality [37]. Quantum Darwinism can also be viewed as an application of the utility principle: it is only useful for us to be aware of things that we can get information about, i.e., about states that have quantum-spammed the environment with redundant copies of themselves. A hint from this positive view of environmental interactions is that we should not try to minimize $H_3$ after all, but should instead reduce decoherence by the second mechanism: using states that are approximate eigenstates of the effective interaction $H^*$ and therefore get abundantly copied into the environment.

Further work on Quantum Darwinism has revealed that such situations are quite exceptional, reaching the following conclusion [38]: “A state selected at random from the Hilbert space of a many-body system is overwhelmingly likely to exhibit highly non-classical correlations. For these typical states, half of the environment must be measured by an observer to determine the state of a given subsystem. The objectivity of classical reality, the fact that multiple observers can agree on the state of a subsystem after measuring just a small fraction of its environment, implies that the correlations found in nature between macroscopic systems and their environments are very exceptional.” This gives a hint that the particular Hilbert space factorization we observe might be very special and unique, so that using the utility principle to insist on the existence of a consensus reality may have large constraining power among the factorizations, perhaps even helping nail down the one we actually observe.

E. Outlook

In summary, the hypothesis that consciousness can be understood as a state of matter leads to fascinating interdisciplinary questions spanning the range from neuroscience to computer science, condensed matter physics and quantum mechanics. Can we find concrete examples of error-correcting codes in the brain? Are there brain-sized non-Hopfield neural networks that support much more than 37 bits of integrated information? Can a deeper understanding of consciousness breathe new life into the century-old quest to understand the emergence of a classical world from quantum mechanics, and can it even help explain how two Hermitian matrices H and ρ lead to the subjective emergence of time? The quests to better understand the internal reality of our mind and the external reality of our universe will hopefully assist one another.

Acknowledgments: The author wishes to thank Christoph Koch, Meia Chita-Tegmark, Hrant Gharibyan, Seth Lloyd, Bill Poirier, Harold Shapiro and Marin Soljacic for helpful discussions. This work was supported by NSF AST-090884 & AST-1105835.

13 Charles Bennett has suggested that Quantum Darwinism would be more aptly named “Quantum Spam”, since the many redundant imprints of the system’s state are normally not further reproduced.

[1] P. Hut, M. Alford, and M. Tegmark, Found. Phys. 36, 765 (2006), physics/0510188.

[2] M. Tegmark, Found. Phys. 11/07, 116 (2007).

[3] G. Tononi, Biol. Bull. 215, 216 (2008), http://www.biolbull.org/content/215/3/216.full

[4]

[5] G. Tononi, Phi: A Voyage from the Brain to the Soul (New York, Pantheon, 2012).

[6] I. Amato, Science 253, 856 (1991).

[7] S. Lloyd, Nature 406, 1-47 (2000).

[8] G. 't Hooft, arXiv:gr-qc/9310026 (1993).

[9] J. Schwindt, arXiv:1210.8447 [quant-ph] (2012).

[10] A. Damasio, Self Comes to Mind: Constructing the Conscious Brain (New York, Vintage, 2010).

[11] R. W. Hamming, The Bell System Technical Journal 24, 2 (1950).

[12] M. Grassl, http://i20smtp.ira.uka.de/home/grassl/codetables/

[13] J. J. Hopfield, Proc. Natl. Acad. Sci. 79, 2554 (1982).

[14] N. J. Joshi, G. Tononi, and C. Koch, PLOS Comp. Bio. 9, e1003111 (2013).

[15] O. Barak et al., Progr. Neurobio. 103, 214 (2013).

[16] K. Yoon et al., Nature Neuroscience 16, 1077 (2013).

[17] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms (Cambridge, Cambridge University Press, 2003).

[18] J. von Neumann, Die mathematischen Grundlagen der Quantenmechanik (Berlin, Springer, 1932).

[19] W. H. Zurek, quant-ph/0111137 (2001).

[20] H. D. Zeh, Found. Phys. 1, 69 (1970).

[21] E. Joos and H. D. Zeh, Z. Phys. B 59, 223 (1985).

[22] W. H. Zurek, S. Habib, and J. P. Paz, PRL 70, 1187 (1993).

[23] D. Giulini, E. Joos, C. Kiefer, J. Kupsch, I. O. Stamatescu, and H. D. Zeh, Decoherence and the Appearance of a Classical World in Quantum Theory (Springer, Berlin, 1996).

[24] W. H. Zurek, Nature Physics 5, 181 (2009).

[25] M. Schlosshauer, Decoherence and the Quantum-To-Classical Transition (Berlin, Springer, 2007).

[26] M. Tegmark, PRE 61, 4194 (2000).

[27] W. H. Zurek, Nature Physics 5, 181 (2009).

[28] E. C. G. Sudarshan and B. Misra, J. Math. Phys. 18, 756 (1977).

[29] W. H. Zurek, S. Habib, and J. P. Paz, PRL 70, 1187 (1993).

[30] M. Tegmark and H. S. Shapiro, Phys. Rev. E 50, 2538 (1994).


[31] H. Gharibyan and M. Tegmark, arXiv:1309.7349 [quant-ph] (2013).

[32] M. Tegmark, PRD 85, 123517 (2012).

[33] Nielsen 2005, http://michaelnielsen.org/blog/archive/notes/fermions_and_jordan_wigner.pdf

[34] D. A. R. Dalvit, J. Dziarmaga, and W. H. Zurek, PRA 72, 062101 (2005).

[35] R. Penrose, The Emperor’s New Mind (Oxford, Oxford Univ. Press, 1989).

[36] M. Tegmark, Found. Phys. Lett. 6, 571 (1993).

[37] M. Tegmark, Our Mathematical Universe: My Quest for the Ultimate Nature of Reality (New York, Knopf, 2014).

[38] C. J. Riedel, W. H. Zurek, and M. Zwolak, New J. Phys. 14, 083010 (2012).

[39] S. Lloyd, Programming the Universe (New York, Knopf, 2006).

[40] Z. Gu and X. Wen, Nucl. Phys. B 863, 90 (2012).

[41] X. Wen, PRD 68, 065003 (2003).

[42] M. A. Levin and X. Wen, RMP 77, 871 (2005).

[43] M. A. Levin and X. Wen, PRB 73, 035122 (2006).

[44] M. Tegmark and L. Yeh, Physica A 202, 342 (1994).

Appendix A: Useful identities involving tensor products

Below is a list of useful identities involving tensor multiplication and partial tracing, many of which are used in the main part of the paper. Although they are all straightforward to prove by writing them out in the index notation of equation (106), I have been unable to find many of them in the literature. The tensor product ⊗ is also known as the Kronecker product.

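\begin{align}
(A\otimes B)\otimes C &= A\otimes(B\otimes C) \tag{A1}\\
A\otimes(B+C) &= A\otimes B + A\otimes C \tag{A2}\\
(B+C)\otimes A &= B\otimes A + C\otimes A \tag{A3}\\
(A\otimes B)^\dagger &= A^\dagger\otimes B^\dagger \tag{A4}\\
(A\otimes B)^{-1} &= A^{-1}\otimes B^{-1} \tag{A5}\\
\mathrm{tr}\,[A\otimes B] &= (\mathrm{tr}\,A)(\mathrm{tr}\,B) \tag{A6}\\
\mathrm{tr}_1[A\otimes B] &= (\mathrm{tr}\,A)\,B \tag{A7}\\
\mathrm{tr}_2[A\otimes B] &= (\mathrm{tr}\,B)\,A \tag{A8}\\
\mathrm{tr}_1[A(B\otimes I)] &= \mathrm{tr}_1[(B\otimes I)A] \tag{A9}\\
\mathrm{tr}_2[A(I\otimes B)] &= \mathrm{tr}_2[(I\otimes B)A] \tag{A10}\\
\mathrm{tr}_1[(I\otimes A)B] &= A\,(\mathrm{tr}_1 B) \tag{A11}\\
\mathrm{tr}_2[(A\otimes I)B] &= A\,(\mathrm{tr}_2 B) \tag{A12}\\
\mathrm{tr}_1[A(I\otimes B)] &= (\mathrm{tr}_1 A)\,B \tag{A13}\\
\mathrm{tr}_2[A(B\otimes I)] &= (\mathrm{tr}_2 A)\,B \tag{A14}\\
\mathrm{tr}_1[A(B\otimes C)] &= \mathrm{tr}_1[A(B\otimes I)]\,C \tag{A15}\\
\mathrm{tr}_2[A(B\otimes C)] &= \mathrm{tr}_2[A(I\otimes C)]\,B \tag{A16}\\
\mathrm{tr}_1[(B\otimes C)A] &= C\,\mathrm{tr}_1[(B\otimes I)A] \tag{A17}\\
\mathrm{tr}_2[(B\otimes C)A] &= B\,\mathrm{tr}_2[(I\otimes C)A] \tag{A18}\\
\mathrm{tr}\,\{[(\mathrm{tr}_2 A)\otimes I]\,B\} &= \mathrm{tr}\,[(\mathrm{tr}_2 A)(\mathrm{tr}_2 B)] \tag{A19}\\
\mathrm{tr}\,\{[I\otimes(\mathrm{tr}_1 A)]\,B\} &= \mathrm{tr}\,[(\mathrm{tr}_1 A)(\mathrm{tr}_1 B)] \tag{A20}\\
(A\otimes B,\, C\otimes D) &= (A,C)\,(B,D) \tag{A21}\\
\|A\otimes B\| &= \|A\|\,\|B\| \tag{A22}
\end{align}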
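Several of these identities are easy to spot-check numerically. The sketch below assumes that tr₁ traces out the first tensor factor and tr₂ the second (consistent with A7 and A8), and that the norm in A22 is the Frobenius (Hilbert-Schmidt) norm; the helper `ptrace` and the matrix names `P`, `Q`, `X`, `Y` are hypothetical. Here `P` acts on subsystem 1, `Q` on subsystem 2, and `X`, `Y` on the full space.

```python
import numpy as np

def ptrace(M, n1, n2, which):
    """Partial trace of an (n1*n2) x (n1*n2) matrix over subsystem `which` (1 or 2)."""
    T = M.reshape(n1, n2, n1, n2)
    if which == 1:
        return np.einsum('ijik->jk', T)   # sum over the subsystem-1 index
    return np.einsum('ijkj->ik', T)       # sum over the subsystem-2 index

rng = np.random.default_rng(1)
n1, n2 = 2, 3
P = rng.standard_normal((n1, n1))
Q = rng.standard_normal((n2, n2))
X = rng.standard_normal((n1 * n2, n1 * n2))
Y = rng.standard_normal((n1 * n2, n1 * n2))
I1, I2 = np.eye(n1), np.eye(n2)

checks = {
    "A6": np.isclose(np.trace(np.kron(P, Q)), np.trace(P) * np.trace(Q)),
    "A7": np.allclose(ptrace(np.kron(P, Q), n1, n2, 1), np.trace(P) * Q),
    "A8": np.allclose(ptrace(np.kron(P, Q), n1, n2, 2), np.trace(Q) * P),
    "A15": np.allclose(ptrace(X @ np.kron(P, Q), n1, n2, 1),
                       ptrace(X @ np.kron(P, I2), n1, n2, 1) @ Q),
    "A16": np.allclose(ptrace(X @ np.kron(P, Q), n1, n2, 2),
                       ptrace(X @ np.kron(I1, Q), n1, n2, 2) @ P),
    "A19": np.isclose(np.trace(np.kron(ptrace(X, n1, n2, 2), I2) @ Y),
                      np.trace(ptrace(X, n1, n2, 2) @ ptrace(Y, n1, n2, 2))),
    "A22": np.isclose(np.linalg.norm(np.kron(P, Q)),
                      np.linalg.norm(P) * np.linalg.norm(Q)),
}
print(checks)
```

A numerical check on random matrices is of course no proof, but it is a quick guard against index or ordering mistakes when transcribing the identities.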

Identities A11-A14 are seen to be special cases of identities A15-A18. If we define the superoperators $T_1$ and $T_2$ by

$$T_1 A \equiv \frac{1}{n_1}\, I\otimes(\mathrm{tr}_1 A), \tag{A23}$$

$$T_2 A \equiv \frac{1}{n_2}\, (\mathrm{tr}_2 A)\otimes I, \tag{A24}$$

then identities A19-A20 imply that they are self-adjoint:

$$(T_1 A, B) = (A, T_1 B), \qquad (T_2 A, B) = (A, T_2 B).$$

They are also projection operators, since they satisfy $T_1^2 = T_1$ and $T_2^2 = T_2$.
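The projection and self-adjointness properties can be verified on random complex matrices. This sketch assumes the Hilbert-Schmidt inner product $(A,B) = \mathrm{tr}\,A^\dagger B$ and the partial-trace convention of A7 and A8; the function names are hypothetical.

```python
import numpy as np

def ptrace(M, n1, n2, which):
    """Partial trace over subsystem `which` (1 or 2) of an (n1*n2) x (n1*n2) matrix."""
    T = M.reshape(n1, n2, n1, n2)
    return np.einsum('ijik->jk', T) if which == 1 else np.einsum('ijkj->ik', T)

def T1(X, n1, n2):
    """Superoperator of equation (A23): (1/n1) I (x) tr_1 X."""
    return np.kron(np.eye(n1), ptrace(X, n1, n2, 1)) / n1

def T2(X, n1, n2):
    """Superoperator of equation (A24): (1/n2) tr_2 X (x) I."""
    return np.kron(ptrace(X, n1, n2, 2), np.eye(n2)) / n2

def hs_inner(X, Y):
    """Hilbert-Schmidt inner product tr(X^dagger Y)."""
    return np.trace(X.conj().T @ Y)

rng = np.random.default_rng(4)
n1, n2 = 2, 3
shape = (n1 * n2, n1 * n2)
X = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
Y = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

idempotent_1 = np.allclose(T1(T1(X, n1, n2), n1, n2), T1(X, n1, n2))
idempotent_2 = np.allclose(T2(T2(X, n1, n2), n1, n2), T2(X, n1, n2))
self_adjoint_1 = np.isclose(hs_inner(T1(X, n1, n2), Y), hs_inner(X, T1(Y, n1, n2)))
self_adjoint_2 = np.isclose(hs_inner(T2(X, n1, n2), Y), hs_inner(X, T2(Y, n1, n2)))
print(idempotent_1, idempotent_2, self_adjoint_1, self_adjoint_2)
```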

Appendix B: Factorization of Harmonic oscillator into uncoupled qubits

If the Hilbert space dimensionality $n = 2^b$ for some integer b, then the truncated harmonic oscillator Hamiltonian of equation (60) can be decomposed into b independent qubits: in the energy eigenbasis,

$$H = \sum_{j=0}^{b-1} H_j, \qquad H_j = 2^j \begin{pmatrix} \tfrac12 & 0 \\ 0 & -\tfrac12 \end{pmatrix}_j = 2^{j-1}\sigma^z_j, \tag{B1}$$

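where the subscripts j indicate that an operator acts only on the jth qubit, leaving the others unaffected. For example, for b = 3 qubits,

$$H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}\otimes I\otimes I + I\otimes\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\otimes I + I\otimes I\otimes\begin{pmatrix} \tfrac12 & 0 \\ 0 & -\tfrac12 \end{pmatrix}
= \begin{pmatrix}
-\tfrac72 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -\tfrac52 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -\tfrac32 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -\tfrac12 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \tfrac32 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \tfrac52 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \tfrac72
\end{pmatrix}, \tag{B2}$$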

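in agreement with equation (60). This factorization corresponds to the standard binary representation of integers, which is more clearly seen when adding back the trace $(n-1)/2 = (2^b-1)/2$:

$$H + \frac72 = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}\otimes I\otimes I + I\otimes\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}\otimes I + I\otimes I\otimes\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 5 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 6 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 7
\end{pmatrix}. \tag{B3}$$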
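The b = 3 decomposition can be reproduced with Kronecker products. The sketch below builds the sum of single-qubit terms from equation (B1) with the most significant qubit as the leftmost factor; the function name `qubit_oscillator` is hypothetical. One caveat on conventions: with NumPy's `np.kron` ordering and the up state as the first basis vector of each qubit, the diagonal comes out listed from +7/2 down to −7/2, so the check compares the sorted spectrum with the evenly spaced oscillator levels rather than the order of the entries.

```python
import numpy as np

def qubit_oscillator(b):
    """Sum over qubits of 2^(j-1) sigma^z_j, most significant qubit leftmost (equation B1)."""
    sz = np.diag([1.0, -1.0])
    n = 2 ** b
    H = np.zeros((n, n))
    for pos in range(b):              # pos = 0 is the leftmost tensor factor
        j = b - 1 - pos               # binary significance of this qubit
        factors = [np.eye(2)] * b
        factors[pos] = 2.0 ** (j - 1) * sz
        term = factors[0]
        for f in factors[1:]:
            term = np.kron(term, f)
        H += term
    return H

b = 3
H = qubit_oscillator(b)
levels = np.sort(np.diag(H))                     # evenly spaced, from -7/2 to 7/2
shifted = levels + (2 ** b - 1) / 2              # adding back the trace term gives 0..7
is_diagonal = np.allclose(H, np.diag(np.diag(H)))
print(levels, shifted, is_diagonal)
```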

Here we use the ordering convention that the most significant qubit goes to the left. If we write k as

$$k = \sum_{j=0}^{b-1} k_j 2^j,$$

where $k_j$ are the binary digits of k and take values 0 or 1, then the energy eigenstates can be written

$$|E_k\rangle = \bigotimes_{j=0}^{b-1} (\sigma^\dagger)^{k_j} |0\rangle, \tag{B4}$$

where $|0\rangle$ is the ground state (all b qubits in the down state), the creation operator

$$\sigma^\dagger = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$

raises a qubit from the down state to the up state, and $(\sigma^\dagger)^0$ is meant to be interpreted as the identity matrix I. For example, since the binary representation of 6 is “110”, we have

$$|E_6\rangle = \sigma^\dagger\otimes\sigma^\dagger\otimes I\,|0\rangle = |110\rangle,$$

the state where the first two qubits are up and the last one is down. Since $(\sigma^\dagger)^{k_j}\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ is an eigenvector of $\sigma^z$ with eigenvalue $(2k_j - 1)$, i.e., +1 for spin up and −1 for spin down, equations (B1) and (B4) give $H|E_k\rangle = E_k|E_k\rangle$, where

$$E_k = \sum_{j=0}^{b-1} 2^{j-1}(2k_j - 1) = k - \frac{2^b - 1}{2},$$

in agreement with equation (60).

The standard textbook harmonic oscillator corresponds to the limit b → ∞, which remains completely separable. In practice, a number of qubits b = 200 is large enough to be experimentally indistinguishable from b = ∞ for describing any harmonic oscillator ever encountered in nature, since it corresponds to a dynamic range of $2^{200} \sim 10^{60}$, the ratio between the largest and smallest potentially measurable energies (the Planck energy versus the energy of a photon with wavelength equal to the diameter of our observable universe). So far, we have never measured any physical quantity to better than 17 significant digits, corresponding to 56 bits.
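The closed form for the energies and the two order-of-magnitude figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `energy_from_bits` and the choice b = 10 for the exhaustive check are illustrative assumptions.

```python
import math

def energy_from_bits(k, b):
    """E_k = sum_j 2^(j-1) (2 k_j - 1), with k_j the binary digits of k."""
    return sum(2.0 ** (j - 1) * (2 * ((k >> j) & 1) - 1) for j in range(b))

b = 10
formula_holds = all(
    energy_from_bits(k, b) == k - (2 ** b - 1) / 2 for k in range(2 ** b)
)

decades_for_200_qubits = 200 * math.log10(2)   # orders of magnitude spanned by 2^200
bits_for_17_digits = 17 * math.log2(10)        # bits needed for 17 significant digits
print(formula_holds, decades_for_200_qubits, bits_for_17_digits)
```

The sum splits as $\sum_j k_j 2^j - \sum_j 2^{j-1} = k - (2^b-1)/2$, which is what the exhaustive loop confirms.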

Appendix C: Emergent space and particles from nothing but qubits

Throughout the main body of our paper, we have limited our discussion to a Hilbert space of finite dimensionality n, often interpreting it as b qubits with $n = 2^b$. On the other hand, textbook quantum mechanics usually sets n = ∞ and contains plenty of structure additional to merely H and ρ, such as a continuous space and various fermion and boson fields. The purpose of this appendix is to briefly review how the latter picture might emerge from the former. An introduction to this “it’s all qubits” approach by one of its pioneers, Seth Lloyd, is given in [39], and an up-to-date technical review can be found in [40].

As motivation for this emergence approach, note that a large number of quasiparticles have been observed such as phonons, holes, magnons, rotons, plasmons and polarons, which are known not to be fundamental particles, but instead mere excitations in some underlying substrate. This raises the question of whether our standard model particles may be quasiparticles as well. It has been shown that this is indeed a possibility for photons, electrons and quarks [41–43], and perhaps even for gravitons [40], with the substrate being nothing more than a set of qubits without any space or other additional structure.

In Appendix B, we saw how to build a harmonic oscillator out of infinitely many qubits, and that a truncated harmonic oscillator built from merely 200 qubits is experimentally indistinguishable from an infinite-dimensional one. We will casually refer to such a qubit collection describing a truncated harmonic oscillator as a “qubyte”, even if the number of qubits it contains is not precisely 8. As long as our universe is cold enough that the very highest energy level is never excited, a qubyte will behave identically to a true harmonic oscillator, and can be used to define position and momentum operators obeying the usual canonical commutation relations.
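The role of the highest energy level can be made concrete with a small numerical sketch. It assumes units with ħ = m = ω = 1 and the usual ladder-operator definitions of position and momentum (these conventions and the names below are assumptions for illustration): in a truncated n-level oscillator, the commutator [q, p] equals i on every level except the topmost one.

```python
import numpy as np

def truncated_lowering(n):
    """Lowering operator a on the lowest n oscillator levels: a|k> = sqrt(k)|k-1>."""
    return np.diag(np.sqrt(np.arange(1, n)), k=1)

n = 8
a = truncated_lowering(n)
q = (a + a.T) / np.sqrt(2)            # position operator
p = (a - a.T) / (1j * np.sqrt(2))     # momentum operator
comm = q @ p - p @ q                  # would equal i * identity without truncation
deviation = comm - 1j * np.eye(n)     # vanishes except in the highest-level entry
print(np.round(deviation.diagonal(), 12))
```

The only nonzero entry of `deviation` sits on the highest level, so states with no support there cannot distinguish the truncated commutator from the canonical one.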

To see how space can emerge from qubits alone, consider a large set of coupled truncated harmonic oscillators (qubytes), whose position operators $q_{\mathbf r}$ and momentum operators $p_{\mathbf r}$ are labeled by an index $\mathbf r = (i, j, k)$ consisting of a triplet of integers; $\mathbf r$ has no a priori meaning or interpretation whatsoever except as a record-keeping device used to specify the Hamiltonian. Grouping these operators into vectors p and q, we choose the Hamiltonian

$$H = \frac12 |\mathbf p|^2 + \frac12\, \mathbf q^t A\, \mathbf q, \tag{C1}$$

where the coupling matrix A is translationally invariant, i.e., $A_{\mathbf r\mathbf r'} = a_{\mathbf r'-\mathbf r}$, depending only on the difference $\mathbf r'-\mathbf r$ between two index vectors. For simplicity, let us treat the lattice of index vectors $\mathbf r$ as infinite, so that A is diagonalized by a 3D Fourier transform. (Alternatively, we can take the lattice to be finite and the matrix A to be circulant, in which case A is again diagonalized by a Fourier transform; this will lead to the emergence of a toroidal space.)
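The finite, circulant case is easy to check directly. The following sketch uses a hypothetical 1D chain of n sites with an on-site term and nearest-neighbor coupling (the values of `mu2` and `gamma2` are arbitrary assumptions) and compares the eigenvalues of A with the discrete Fourier transform of its first row.

```python
import numpy as np

def circulant_coupling(n, mu2, gamma2):
    """A[r, r'] = a[(r' - r) mod n] for a periodic 1D chain with nearest-neighbor coupling."""
    a = np.zeros(n)
    a[0] = mu2 + 2.0 * gamma2
    a[1] = a[-1] = -gamma2
    idx = (np.arange(n)[None, :] - np.arange(n)[:, None]) % n
    return a[idx], a

n, mu2, gamma2 = 12, 0.3, 1.0
A, a = circulant_coupling(n, mu2, gamma2)
eig_direct = np.sort(np.linalg.eigvalsh(A))
# np.fft.fft computes sum_r a_r exp(-2 pi i m r / n), i.e. wave numbers 2 pi m / n
eig_fourier = np.sort(np.fft.fft(a).real)
print(np.max(np.abs(eig_direct - eig_fourier)))
```

Because the coupling depends only on the index difference, the Fourier modes are the eigenvectors and the transform of the coupling sequence gives the eigenvalues; the periodic wrap-around is what makes the emergent space toroidal.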

Fourier transforming our qubyte lattice preserves the canonical commutation relations and corresponds to a unitary transformation that decomposes H into independent harmonic oscillators. As in [44], the frequency of the oscillator corresponding to wave vector κ is

$$\omega(\boldsymbol\kappa)^2 = \sum_{\mathbf r} a_{\mathbf r}\, e^{-i\boldsymbol\kappa\cdot\mathbf r}. \tag{C2}$$

For example, consider the simple case where each oscillator has a self-coupling µ and is only coupled to its 6 nearest neighbors by a coupling γ: $a_{1,0,0} = a_{-1,0,0} = a_{0,1,0} = a_{0,-1,0} = a_{0,0,1} = a_{0,0,-1} = -\gamma^2$, $a_{0,0,0} = \mu^2 + 6\gamma^2$. Then

$$\omega(\boldsymbol\kappa)^2 = \mu^2 + 4\gamma^2\left(\sin^2\frac{\kappa_x}{2} + \sin^2\frac{\kappa_y}{2} + \sin^2\frac{\kappa_z}{2}\right), \tag{C3}$$

where $\kappa_x$, $\kappa_y$ and $\kappa_z$ lie in the interval [−π, π]. If we were to interpret the lattice points as existing in a three-dimensional space with separation a between neighboring lattice points, then the physical wave vector k would be given by

$$\mathbf k = \frac{\boldsymbol\kappa}{a}. \tag{C4}$$
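The step from the Fourier sum (C2) to the closed form (C3) uses $1 - \cos\kappa = 2\sin^2(\kappa/2)$ for each axis. As a sanity check, the sketch below evaluates both expressions at random wave vectors for the nearest-neighbor couplings given above; the numerical values of `mu` and `gamma` and the function names are arbitrary assumptions.

```python
import numpy as np

def omega2_from_couplings(kappa, couplings):
    """Equation (C2): sum over offsets r of a_r exp(-i kappa . r)."""
    total = sum(a_r * np.exp(-1j * np.dot(kappa, r)) for r, a_r in couplings.items())
    return total.real

def omega2_closed_form(kappa, mu, gamma):
    """Equation (C3)."""
    return mu**2 + 4 * gamma**2 * np.sum(np.sin(np.asarray(kappa) / 2) ** 2)

mu, gamma = 0.2, 1.5
couplings = {(0, 0, 0): mu**2 + 6 * gamma**2}   # self-coupling term
for axis in range(3):
    for sign in (+1, -1):
        r = [0, 0, 0]
        r[axis] = sign
        couplings[tuple(r)] = -gamma**2          # six nearest neighbors

rng = np.random.default_rng(3)
kappas = rng.uniform(-np.pi, np.pi, size=(5, 3))
max_error = max(
    abs(omega2_from_couplings(k, couplings) - omega2_closed_form(k, mu, gamma))
    for k in kappas
)
print(max_error)
```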

Let us now consider a state ρ where all modes except long-wavelength ones with $|\boldsymbol\kappa| \ll 1$ are frozen out, in the spirit of our own relatively cold universe. Using the l symbol from Section IV F, we then have H l H′, where H′ is a Hamiltonian with the isotropic dispersion relation

$$\omega^2 = \mu^2 + \gamma^2\left(\kappa_x^2 + \kappa_y^2 + \kappa_z^2\right) = \mu^2 + \gamma^2\kappa^2, \tag{C5}$$

i.e., where the discreteness effects are absent. Comparing this with the standard dispersion relation for a relativistic particle,

$$\omega^2 = \mu^2 + (ck)^2, \tag{C6}$$

where c is the speed of light, we see that the two agree if the lattice spacing is

$$a = \frac{c}{\gamma}. \tag{C7}$$

For example, if the lattice spacing is the Planck length, then the coupling strength γ is the inverse Planck time. In summary, this Hilbert space built out of qubytes, with no structure whatsoever except for the Hamiltonian H, is physically indistinguishable from a system with quantum particles (scalar bosons of mass µ) propagating in a continuous 3D space with the same translational and rotational symmetry that we normally associate with infinite Hilbert spaces. So not only did space emerge, but continuous symmetries not inherent in the original qubit Hamiltonian emerged as well. The 3D structure of space emerged from the pattern of couplings between the qubits: if they had been presented in a random order, the graph of which qubits were coupled could have been analyzed to conclude that everything could be simplified into a 3D rectangular lattice with nearest-neighbor couplings.
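The emergence of rotational symmetry at long wavelengths can be seen numerically by comparing the lattice dispersion relation (C3) along two different directions at the same |κ|. The sketch below uses arbitrary illustrative values of `mu` and `gamma`; the Planck length and Planck time are approximate standard values entered by hand, so the last check only holds to the quoted precision.

```python
import numpy as np

def omega2_lattice(kappa, mu, gamma):
    """Equation (C3): nearest-neighbor lattice dispersion relation."""
    return mu**2 + 4 * gamma**2 * np.sum(np.sin(np.asarray(kappa) / 2) ** 2)

def omega2_continuum(kappa, mu, gamma):
    """Equation (C5): isotropic long-wavelength limit."""
    return mu**2 + gamma**2 * np.sum(np.asarray(kappa) ** 2)

mu, gamma = 0.1, 1.0
along_axis = np.array([1.0, 0.0, 0.0])
along_diagonal = np.ones(3) / np.sqrt(3)

def anisotropy(kmag):
    """Relative difference of omega^2 between two directions at the same |kappa|."""
    w_axis = omega2_lattice(kmag * along_axis, mu, gamma)
    w_diag = omega2_lattice(kmag * along_diagonal, mu, gamma)
    return abs(w_axis - w_diag) / w_axis

small, large = anisotropy(0.01), anisotropy(2.0)
kappa_small = 0.01 * along_diagonal
lattice_vs_continuum = abs(
    omega2_lattice(kappa_small, mu, gamma) / omega2_continuum(kappa_small, mu, gamma) - 1
)

# Equation (C7) with a equal to the Planck length (approximate SI values)
C_LIGHT = 299_792_458.0       # m/s
PLANCK_LENGTH = 1.616e-35     # m
PLANCK_TIME = 5.391e-44       # s
gamma_planck = C_LIGHT / PLANCK_LENGTH
print(small, large, lattice_vs_continuum, gamma_planck * PLANCK_TIME)
```

The direction dependence is negligible for |κ| ≪ 1 and becomes substantial only when the wavelength approaches the lattice spacing, which is the sense in which the continuous rotational symmetry is emergent.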

Adding polarization to build photons and other vector particles is straightforward. Building simple fermion fields using qubit lattices is analogous as well, except that a unitary Jordan-Wigner transform is required for converting the qubits to fermions. Details on how to build photons, electrons, quarks and perhaps even gravitons are given in [40–43]. Lattice gauge theory works similarly, except that here, the underlying finite-dimensional Hilbert space is viewed not as the actual truth but as a numerically tractable approximation to the presumed true infinite-dimensional Hilbert space of quantum field theory.
