
Quantum Wasserstein GANs
Shouvanik Chakrabarti∗, Yiming Huang∗, Tongyang Li, Soheil Feizi, and Xiaodi Wu

Department of Computer Science, Institute for Advanced Computer Studies, and Joint Center for Quantum Information and Computer Science, University of Maryland


arXiv:1911.00111 · NeurIPS 2019

Abstract

We propose the first design of quantum Wasserstein Generative Adversarial Networks (WGANs), which has been shown to improve the robustness and the scalability of the adversarial training of quantum generative models even on noisy quantum hardware. Specifically, we propose a definition of the Wasserstein semimetric between distributions over quantum data. We also demonstrate how to turn the quantum Wasserstein semimetric into a concrete design of quantum WGANs that can be efficiently implemented on quantum machines. We use our quantum WGAN to generate a 3-qubit quantum circuit of 50 gates that well approximates a 3-qubit 1-d Hamiltonian simulation circuit that requires over 10k gates using standard techniques.

Motivation

The first quantum computers will be small-scale and noisy systems. Their practical applications will include the training of parameterized quantum circuits, which are networks of quantum gates controlled by classical parameters.

These circuits can be used as a parameterized representation of functions, so-called quantum neural networks, which can be applied to classical supervised learning models [3] or to construct generative models. Quantum generative models can be used to characterize unknown physical processes or quantum algorithms.

[Images: IBM and Google quantum hardware]

Quantum Generative Adversarial Networks [5] mimic the adversarial structure of classical GANs [4] to train a quantum circuit to generate an unknown quantum state.

Formulating classical GANs using the Wasserstein distance [1] has been observed to improve their training. We improve the training of quantum GANs by formulating them with a quantum divergence based on the Wasserstein distance.

Quantum vs Classical Distributions

- Quantum distributions over a space Γ are described by a density operator, a positive semi-definite matrix over a space X ≅ ℂ^Γ with trace 1.

- Generalization of a classical random process: classical distributions are characterized by diagonal density operators.

- A quantum distribution over a system with components X, Y is described by a quantum operator in the Kronecker product space X ⊗ Y. The marginal distributions are obtained via the partial traces Tr_X, Tr_Y.

- Different quantum distributions can provide the same distribution of classical outcomes, e.g., the uniform classical distribution (1/2)·[[1, 0], [0, 1]] and the quantum state (1/2)·[[1, 1], [1, 1]].
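For concreteness, here is a small NumPy sketch (added in this transcript, not part of the poster) checking that these two operators give identical computational-basis probabilities yet are distinguishable by another measurement; the last lines also illustrate the partial trace from the bullet above.

```python
import numpy as np

# Classical uniform distribution over {0, 1}: a diagonal density operator.
rho_classical = 0.5 * np.array([[1, 0],
                                [0, 1]], dtype=complex)

# The pure state |+> = (|0> + |1>)/sqrt(2): a non-diagonal density operator.
rho_quantum = 0.5 * np.array([[1, 1],
                              [1, 1]], dtype=complex)

# Computational-basis outcome probabilities are the diagonal entries: identical.
print(np.real(np.diag(rho_classical)))          # [0.5 0.5]
print(np.real(np.diag(rho_quantum)))            # [0.5 0.5]

# A Hadamard-basis measurement distinguishes them.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
print(np.real(np.diag(H @ rho_classical @ H)))  # [0.5 0.5]
print(np.real(np.diag(H @ rho_quantum @ H)))    # [1. 0.]

# Partial trace over the second subsystem of a product state recovers the first marginal.
rho_xy = np.kron(rho_quantum, rho_classical)
rho_x = rho_xy.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
print(np.allclose(rho_x, rho_quantum))          # True
```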

Wasserstein Distance

- Classical Wasserstein Distance: the minimum cost of transporting a probability distribution to another, given the cost c(x, y) of transporting unit mass between given points.

- The transport plan is encoded as a joint distribution whose marginals are the two input distributions. Given input distributions P, Q,

  W(P, Q) = min_π ∫ c(x, y) π(x, y) dx dy,
  s.t. ∫ π(x, y) dy = P(x), ∫ π(x, y) dx = Q(y), π(x, y) ≥ 0.

- Can be reformulated in matrix form (a numerical sketch follows this list) as

  W(P, Q) = min_π Tr(πC),
  s.t. π ∈ Pos(X ⊗ Y), Tr_X(π) = diag(Q), Tr_Y(π) = diag(P),

  where C = diag(c(x, y)).

- Advantages over other distances: if an input distribution is generated by applying a continuous parameterized function, the Wasserstein distance is continuous in the parameters, unlike measures such as the KL-divergence and JS-divergence.

- Regularization using the relative entropy [6] between π and P ⊗ Q makes the optimization problem strongly convex and differentiable in the generator parameters.

- Wasserstein GANs have been formulated using the Wasserstein distance with cost ‖x − y‖₁.
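As a hypothetical worked example of the matrix form above, the sketch below solves the discrete transport problem as a linear program with SciPy, using the ‖x − y‖₁ cost from the last bullet on a three-point space; the helper `wasserstein` and the toy distributions are choices made here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein(P, Q, C):
    """W(P, Q) = min_pi sum_{x,y} C[x, y] * pi[x, y]
    s.t. row sums of pi equal P, column sums equal Q, pi >= 0."""
    n, m = len(P), len(Q)
    # Decision variable: pi flattened row-major, pi[x, y] -> index x*m + y.
    A_eq = np.zeros((n + m, n * m))
    for x in range(n):                 # sum_y pi[x, y] = P[x]
        A_eq[x, x * m:(x + 1) * m] = 1.0
    for y in range(m):                 # sum_x pi[x, y] = Q[y]
        A_eq[n + y, y::m] = 1.0
    b_eq = np.concatenate([P, Q])
    res = linprog(C.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# Toy example: three points on a line with cost |x - y|.
points = np.array([0.0, 1.0, 2.0])
C = np.abs(points[:, None] - points[None, :])
P = np.array([1.0, 0.0, 0.0])   # all mass at x = 0
Q = np.array([0.0, 0.0, 1.0])   # all mass at x = 2
print(wasserstein(P, Q, C))     # 2.0: move unit mass a distance of 2
```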

Quantum Wasserstein Semi-metric

- Motivation: define a divergence between quantum states resembling the Wasserstein distance.

- To allow for quantum states, relax the requirement that P, Q be diagonal (a numerical check of this definition appears after this list):

  qW(P, Q) = min_π Tr(πC),
  s.t. π ∈ Pos(X ⊗ Y), Tr_X(π) = Q, Tr_Y(π) = P.

- An arbitrarily chosen cost matrix C will not preserve qW(P, P) = 0 for all quantum states. In particular, no classical cost (diagonal C) can work.

- To ensure that qW(P, P) = 0 for all quantum states, C is fixed to be (1/2)(I_{X⊗Y} − SWAP), where the SWAP operator swaps the subsystems X, Y of a composite system (and leaves P ⊗ P unchanged).

- Symmetric and positive, but does not satisfy the triangle inequality.

- The quantum relative entropy λ Tr(π log π − π log(P ⊗ Q)) is added to the primal, which makes the optimization problem strongly convex and a differentiable function of the input distributions.
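To make the definition concrete, the following brute-force classical check poses the (unregularized) semimetric as a semidefinite program in CVXPY with the SCS solver. This is only a sketch of the definition for small dimensions, not the efficient quantum-hardware procedure the paper develops; the helper name `qW` and the single-qubit test states are chosen here for illustration.

```python
import numpy as np
import cvxpy as cp

def qW(P, Q):
    """Unregularized quantum Wasserstein semimetric between d x d states P, Q."""
    d = P.shape[0]
    I = np.eye(d)
    # SWAP exchanges the two subsystems: SWAP |x>|y> = |y>|x>.
    SWAP = np.zeros((d * d, d * d))
    for x in range(d):
        for y in range(d):
            SWAP[x * d + y, y * d + x] = 1.0
    C = 0.5 * (np.eye(d * d) - SWAP)

    pi = cp.Variable((d * d, d * d), hermitian=True)
    # Partial traces written as sums of Kraus-style sandwiches.
    Tr_Y = sum(np.kron(I, e.reshape(1, d)) @ pi @ np.kron(I, e.reshape(d, 1))
               for e in np.eye(d))
    Tr_X = sum(np.kron(e.reshape(1, d), I) @ pi @ np.kron(e.reshape(d, 1), I)
               for e in np.eye(d))
    constraints = [pi >> 0, Tr_X == Q, Tr_Y == P]
    prob = cp.Problem(cp.Minimize(cp.real(cp.trace(C @ pi))), constraints)
    prob.solve(solver=cp.SCS)
    return prob.value

ket0 = np.array([[1.0], [0.0]])
ket_plus = np.array([[1.0], [1.0]]) / np.sqrt(2)
rho0 = ket0 @ ket0.conj().T
rho_plus = ket_plus @ ket_plus.conj().T

print(qW(rho0, rho0))      # ≈ 0
print(qW(rho0, rho_plus))  # ≈ 0.25
```

For the two pure test states the only admissible coupling is the product state, so the second value reduces to (1/2)(1 − Tr(ρ₀ρ₊)) = 1/4.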

Quantum Wasserstein GAN

- The adversarial optimization problem for a GAN is formulated using the dual form of the semimetric with an entropic regularizer to remove the hard constraints.

  min_G max_{φ,ψ}  E_real[ψ] − E_fake[φ] − E_real⊗fake[ξ_R]

- The fake state is generated using a parameterized quantum circuit G. We use a well-studied model for parameterized circuits [3].

- The discriminator φ, ψ and regularizer ξ_R are quantities measured on the real and fake states. These measurements can also be implemented using quantum circuits or as a linear combination of simpler measurements.
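The loop below is a deliberately simplified, purely classical single-qubit toy (a NumPy simulation written for this transcript, not the authors' implementation) that mimics the alternating min–max structure: the generator is a single RY rotation, the discriminator is a linear combination of Pauli observables, a single observable stands in for both ψ and φ, and weight clipping replaces the regularizer ξ_R.

```python
import numpy as np

# Pauli matrices used to parameterize the discriminator observable.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [X, Y, Z]

def generator_state(theta):
    """Density matrix of RY(theta)|0>, a one-parameter 'fake' state."""
    ket = np.array([[np.cos(theta / 2)], [np.sin(theta / 2)]], dtype=complex)
    return ket @ ket.conj().T

def expval(rho, obs):
    """Expectation value Tr(rho * obs) of an observable."""
    return float(np.real(np.trace(rho @ obs)))

rng = np.random.default_rng(0)
rho_real = generator_state(2.1)        # unknown target state to be learned
theta = rng.normal()                   # generator parameter
w = np.zeros(3)                        # discriminator parameters
eta_g, eta_d, eps = 0.02, 2.0, 1e-4

for step in range(500):
    rho_fake = generator_state(theta)

    # Discriminator: gradient ascent on E_real[phi_w] - E_fake[phi_w],
    # with weight clipping standing in for the regularizer xi_R.
    grad_w = np.array([expval(rho_real, P) - expval(rho_fake, P) for P in PAULIS])
    w = np.clip(w + eta_d * grad_w, -1.0, 1.0)

    # Generator: descend the same loss via a central finite difference.
    phi = sum(wi * P for wi, P in zip(w, PAULIS))
    loss = lambda t: expval(rho_real, phi) - expval(generator_state(t), phi)
    theta -= eta_g * (loss(theta + eps) - loss(theta - eps)) / (2 * eps)

fidelity = float(np.real(np.trace(rho_real @ generator_state(theta))))
print(f"fidelity after training: {fidelity:.4f}")  # typically close to 1 for this toy
```

On quantum hardware the expectation values would instead be estimated from measurements on the real and generated states, and gradients obtained from circuit-based gradient evaluation rather than finite differences.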

Structure of quantum Wasserstein GAN

- The system is scalable: the loss function and gradients can be evaluated efficiently on a quantum computer (using a number of basic gates polynomial in the number of parameters).

Numerical Evaluation

[Plots: 2-qubit pure state, 4-qubit pure state, 8-qubit pure state, 2-qubit mixed state, 3-qubit mixed state, and learning with Gaussian noise]

Learning unknown quantum states (pure and mixed). Parameters are randomly initialized from a normal distribution. The quality of learning is measured by the fidelity (1 indicates perfect learning).

Compressing Quantum Circuits

The QWGAN can be used to approximate complex quantum circuits with smaller circuits. A smaller circuit is trained to approximate the Choi state, which encodes the action of a quantum circuit. Closeness in the Choi states of two circuits results in the closeness of the outputs of the circuits on average. We show that the quantum Hamiltonian simulation circuit for the 1-d 3-qubit Heisenberg model in [2] can be approximated by a circuit of 52 gates with an average output fidelity over 0.9999 and a worst-case error 0.15, while the best-known circuit based on the product formula needs ∼11900 gates with a worst-case error 0.001.
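As a small illustration of the Choi-state encoding (a NumPy sketch added here; the unitaries and function names are illustrative, not the compression pipeline itself): the Choi state of a circuit is obtained by applying it to one half of a maximally entangled state, and the fidelity between the Choi states of two circuits tracks their average-case agreement.

```python
import numpy as np

def choi_ket(U):
    """Choi state (as a ket) of a unitary U: (U ⊗ I) applied to the
    maximally entangled state |Phi> = (1/sqrt(d)) sum_i |i>|i>."""
    d = U.shape[0]
    phi = np.eye(d, dtype=complex).reshape(d * d, 1) / np.sqrt(d)
    return np.kron(U, np.eye(d)) @ phi

def choi_fidelity(U, V):
    """Fidelity between the (pure) Choi states of two unitaries."""
    overlap = (choi_ket(U).conj().T @ choi_ket(V)).item()
    return float(np.abs(overlap) ** 2)

# Toy example: a rotation and a slightly perturbed approximation of it.
def RZ(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

print(choi_fidelity(RZ(0.3), RZ(0.3)))          # 1.0
print(choi_fidelity(RZ(0.3), RZ(0.3 + 0.05)))   # slightly below 1
```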

References

[1] Martin Arjovsky, Soumith Chintala, and Léon Bottou, Wasserstein generative adversarial networks, Proceedings of the 34th International Conference on Machine Learning, pp. 214–223, 2017, arXiv:1701.07875.

[2] Andrew M. Childs, Dmitri Maslov, Yunseong Nam, Neil J. Ross, and Yuan Su, Toward the first quantum simulation with quantum speedup, Proceedings of the National Academy of Sciences 115 (2018), no. 38, 9456–9461, arXiv:1711.10980.

[3] Edward Farhi and Hartmut Neven, Classification with quantum neural networks on near term processors, 2018, arXiv:1802.06002.

[4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, Generative adversarial nets, Advances in Neural Information Processing Systems 27, pp. 2672–2680, 2014, arXiv:1406.2661.

[5] Seth Lloyd and Christian Weedbrook, Quantum generative adversarial learning, Physical Review Letters 121 (2018), 040502, arXiv:1804.09139.

[6] Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, and Jason D. Lee, On the convergence and robustness of training GANs with regularized optimal transport, Advances in Neural Information Processing Systems 31, pp. 7091–7101, Curran Associates, Inc., 2018, arXiv:1802.08249.