
Quantum Machine Intelligence (2020) 2:0 https://doi.org/10.1007/s42484-020-00019-5

RESEARCH ARTICLE

Quantum-classical reinforcement learning for decoding noisy classical parity information

Daniel K. Park1 · Jonghun Park1 · June-Koo Kevin Rhee1

Received: 1 October 2019 / Accepted: 12 May 2020
© Springer Nature Switzerland AG 2020

Abstract Learning a hidden parity function from noisy data, known as learning parity with noise (LPN), is an example of intelligent behavior that aims to generalize a concept based on noisy examples. The solution to LPN immediately leads to decoding a random binary linear code in the presence of classification noise. This problem is thought to be intractable classically, but can be solved efficiently if a quantum oracle can be queried. However, in practice, a learner is likely to obtain classical data. In this work, we show that a naive application of the quantum LPN algorithm to classical data encoded in an equal superposition state requires an exponential sample complexity. We then propose a quantum-classical reinforcement learning algorithm to solve the LPN problem for classical data and demonstrate a significant reduction in the sample complexity compared with the naive approach. Simulations with a hidden bit string of length up to 12 show that the quantum-classical reinforcement learning performs better than known classical algorithms when the sample complexity, run time, and robustness to classical noise are collectively considered. Our algorithm is robust to any noise in the quantum circuit that effectively appears as Pauli errors on the final state.

Keywords Machine learning · Learning parity with noise · Quantum-classical hybrid algorithm · Reinforcement learning

1 Introduction

Recent discoveries of quantum algorithms for machine learning and artificial intelligence have gained much attention, and stimulated further exploration of quantum technologies for applications in complex data analysis. A type of machine learning considered in this work is related to the ability to construct a general concept based on examples that contain errors. This task can be formulated in the probably approximately correct (PAC) framework (Valiant 1984) in which a learner constructs a hypothesis h with high probability based on a training set of input–output pairs such that h(x) agrees with f(x) on a large fraction of the inputs. In this context, important metrics for characterizing the learnability are the sample complexity and the time complexity, which correspond to the minimum number of examples required to reach the goal and the run time of the algorithm, respectively.

Daniel K. Park
dkp.quantum@gmail.com

June-Koo Kevin Rhee
rhee.jk@kaist.ac.kr

1 School of Electrical Engineering and ITRC of Quantum Computing for AI, KAIST, 34141, Daejeon, South Korea


A famous example of such tasks is the problem of learning a hidden Boolean function that outputs a binary inner product of an input and a hidden bit string of length n by making queries to an oracle that draws an input uniformly at random. This problem can also be tackled by making queries to a quantum oracle that produces all possible input–output pairs in an equal superposition state. Learning the parity function from a noiseless oracle is easy in both classical and quantum cases. When the outcomes from an oracle are altered by noise, learning from a classical oracle becomes intractable while the quantum version remains efficient (Cross et al. 2015). This problem is also known as learning parity with noise (LPN). The LPN problem is equivalent to decoding a random linear code in the presence of noise (Lyubashevsky 2005), and several cryptographic applications have been suggested based on the hardness of this problem and its generalizations (Regev 2005; Pietrzak 2012). Furthermore, the robustness of the quantum learning against noise opens up possibilities for achieving a quantum advantage with near-term quantum devices without relying on quantum error correction. However, the existence of a quantum oracle for solving a specific problem is highly hypothetical. In practice, a learner often has to learn from a classical data set.


The ability to exhibit the advantage of quantum learning, especially when training examples are classical, remains an interesting and important open problem.

In this work, we show that a naive application of the quantum LPN algorithm to classical data requires an exponential amount of examples (i.e., training samples) or computing resources, thereby nullifying the quantum advantage. We then propose a quantum-classical hybrid algorithm based on the reinforcement learning framework for solving the LPN problem in the absence of the quantum oracle. The proposed algorithm uses noisy classical samples to prepare an input quantum state that is compatible with the original quantum LPN algorithm. Based on the outcome of the quantum algorithm, a reward is classically evaluated and an action is chosen by a greedy algorithm to update the quantum state in the next learning cycle. Numerical calculations show that the required number of samples and run time can be significantly reduced by the reinforcement learning algorithm compared with the naive approach. Furthermore, simulations show that in the regime of small n, the quantum-classical hybrid algorithm performs comparably with or better than the classical algorithm that performs the best in this regime in terms of the sample complexity. Simulation results also suggest that our algorithm is more robust to noise than the classical algorithms. Our algorithm is expected to improve the run time of the classical algorithms, provided that an efficient means to update a quantum state with classical data, such as quantum random access memory, exists. Another notable feature of our algorithm is the robustness to noise that effectively appears as depolarizing errors on the final state of the quantum circuit before measurement.

Our algorithm also fits within the framework of variational quantum algorithms, a classical-quantum hybrid approach that has been developed as a promising avenue to demonstrate the quantum advantage with noisy intermediate-scale quantum (NISQ) devices. Variational quantum algorithms have been used with success to find near-optimal solutions for various important problems in quantum chemistry (Peruzzo et al. 2014; McClean et al. 2016; Romero et al. 2017; Kandala et al. 2017; Moll et al. 2018), quantum machine learning (Otterbach et al. 2017; Mitarai et al. 2018; Schuld et al. 2018; Dallaire-Demers and Killoran 2018; Fosel et al. 2018; Zoufal et al. 2019; Romero and Aspuru-Guzik 2019), and quantum control (Li et al. 2017). A unique challenge of the problem considered in this work is that the algorithm must find the exact and unique solution, i.e., the hidden bit string, with high probability. Thus, our work serves as an intriguing example that utilizes the concept of the variational method for finding the exact solution of a problem.

The remainder of the paper is organized as follows. Section 2 reviews LPN. Section 3 shows that in the absence of the quantum oracle, the naive application of the quantum algorithm to classical data results in an exponential complexity. In Section 4, we present a quantum-classical hybrid algorithm based on reinforcement learning for solving the LPN problem. Numerical calculations in Section 4.2 demonstrate that both sample and time complexities of the hybrid algorithm are significantly reduced compared with the naive application of the quantum LPN algorithm. Section 4.3 compares the performance of our algorithm and known classical algorithms via simulations. Section 5 discusses the resilience to depolarizing errors on the final state, and Section 6 concludes.

Fig. 1 A pictorial representation of the LPN problem

2 Learning parity with noise

The goal of the parity learning problem is to find a hidden n-bit string s ∈ {0, 1}^n by making queries to an example oracle that returns a training data pair that consists of a uniformly random input x ∈ {0, 1}^n and an output of a Boolean function:

f(x) = x · s mod 2.  (1)

A noisy oracle outputs (x, f(x) ⊕ e), where e ∈ {0, 1} has the Bernoulli distribution with parameter η, i.e., Pr(e = 1) = η, and η < 1/2 (Angluin and Laird 1988; Blum et al. 2003; Lyubashevsky 2005) (Fig. 1).
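For concreteness, a noisy classical example oracle of this kind can be sketched in a few lines of Python. This is an illustrative sketch only; the function and variable names (lpn_example, eta, rng) are ours, not the paper's.

```python
import numpy as np

def lpn_example(s, eta, rng):
    """Return one noisy LPN example (x, f(x) XOR e) for the hidden string s.

    x is drawn uniformly at random from {0,1}^n and the label is flipped
    with probability eta (Bernoulli classification noise), as defined above.
    """
    n = len(s)
    x = rng.integers(0, 2, size=n)          # uniformly random input
    parity = int(np.dot(x, s) % 2)          # f(x) = x . s mod 2
    e = int(rng.binomial(1, eta))           # classification noise bit
    return x, parity ^ e

rng = np.random.default_rng(0)
s = np.array([1, 0, 1, 1])                  # hidden bit string (n = 4)
samples = [lpn_example(s, eta=0.1, rng=rng) for _ in range(5)]
```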

In the quantum LPN algorithm introduced in Cross et al. (2015), a quantum oracle implements a unitary transformation on the computational basis states and returns the equal superposition of |x⟩|f(x)⟩ for all possible inputs x. By applying Hadamard gates to all qubits at the query output, the learner acquires an entangled state:

(1/√2) ( |0⟩^{⊗n} |0⟩ + |s⟩ |1⟩ ).  (2)

Thus, whenever the label (last) qubit is 1 (which occurs with probability 1/2), measuring the data (first n) qubits in their computational bases reveals s. Note that this algorithm is very similar to the Bernstein-Vazirani (BV) algorithm (Bernstein and Vazirani 1997), except that in the BV problem the learner can choose an example in each query and the input state of the label qubit is prepared in |−⟩. In the quantum case, since all example data are queried in superposition, the ability to choose an example is irrelevant. On the other hand, the quantum LPN algorithm requires the extra post-selection step since the input of the label qubit is prepared in |0⟩.


Fig. 2 The quantum circuit for learning parity with noise introduced in Cross et al. (2015). Hadamard operations (H) prepare the equal superposition of all possible input states. The dotted box represents the quantum oracle that encodes the hidden parity function, and is realized using controlled-NOT gates between the query register (control) and label (target) qubits. The hidden bit string in this example is s = 101...0. Before measurement, all qubits experience independent depolarizing noise denoted by D_η with a noise rate η < 1/2

A noisy quantum oracle can be modeled with the local depolarizing channel acting independently on all n + 1 qubits at the oracle's output with a known constant noise rate of η < 1/2. The quantum circuit for solving the LPN problem is depicted in Fig. 2. In this example, s is 101...0, and it is encoded via a series of controlled-NOT (c-NOT) gates targeting the label qubit controlled by the data qubits. The shaded area in the figure represents the quantum oracle whose structure is hidden from the learner.
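The noiseless version of this circuit can be checked with a brute-force statevector calculation for small n. The sketch below is ours (plain NumPy, illustrative names): it prepares the equal superposition of |x⟩|f(x)⟩, applies Hadamards to every qubit, and post-selects the label qubit on 1, after which the data register holds s with probability 1/2, as in Eq. (2).

```python
import numpy as np
from functools import reduce

n = 4
s = np.array([1, 0, 1, 0])                     # hidden bit string

# Oracle output: equal superposition of |x>|x.s mod 2> over all inputs x.
psi = np.zeros(2 ** (n + 1))
for x in range(2 ** n):
    bits = [(x >> (n - 1 - i)) & 1 for i in range(n)]
    label = int(np.dot(bits, s) % 2)
    psi[(x << 1) | label] = 1.0                # data qubits first, label qubit last
psi /= np.linalg.norm(psi)

# Hadamard gate on every qubit.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
psi = reduce(np.kron, [H] * (n + 1)) @ psi

# Keep only outcomes with label qubit = 1 and read out the data register.
probs = psi ** 2
label1 = {format(idx >> 1, f'0{n}b'): p for idx, p in enumerate(probs)
          if (idx & 1) == 1 and p > 1e-12}
print(label1)   # expected: {'1010': 0.5}, i.e., the data register reveals s
```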

Learning the hidden parity function from noiseless examples is efficient for both classical and quantum oracles. However, in the presence of noise, the best-known classical algorithms have superpolynomial complexities (Angluin and Laird 1988; Blum et al. 2003; Lyubashevsky 2005; Levieil and Fouque 2006), while the quantum learning based on the bit-wise majority vote remains efficient (Cross et al. 2015). The query and time complexities of the LPN problem for classical and quantum oracles are summarized in Table 1.

The advantage of having a quantum oracle for solving an LPN problem was demonstrated experimentally with superconducting qubits in Riste et al. (2017). Furthermore, a quantum advantage can be demonstrated even when all query register qubits are fully depolarized by using deterministic quantum computation with one qubit (Knill and Laflamme 1998; Park et al. 2018).

In the following sections, we discuss quantum techniques to solve the LPN problem in the absence of the aforementioned quantum oracle, using classical samples that are given uniformly at random and independently. The general strategy considered in this work is to prepare a specific quantum state based on M classical noisy training samples, and apply the measurement scheme developed in Cross et al. (2015). The measurement outcome of the query qubits in the computational basis, s_M, is used to construct a hypothesis function. The goal of our algorithms is to minimize M with which the probability to guess the correct hidden bit string is greater than 2/3, i.e.:

Pr(s_M = s) ≥ 2/3.  (3)

Then by repeating the algorithm a constant number of times and taking a majority vote, s can be found with high probability. Note that this is not a strictly necessary condition, as the majority vote can find the correct answer efficiently with high probability for a single-run success probability γ ≥ 1/2 + 1/δ as long as δ is at most poly(n), since the algorithm is to be repeated O(δ²) times. However, Eq. (3) is a sufficient condition to solve the problem.
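The boosting step can be made concrete with a short calculation. The sketch below (ours, for illustration only) evaluates the probability that a majority vote over k independent repetitions returns s when each repetition succeeds with probability γ > 1/2.

```python
from math import comb

def majority_success(gamma, k):
    """Probability that more than half of k independent runs return s."""
    return sum(comb(k, j) * gamma**j * (1 - gamma)**(k - j)
               for j in range(k // 2 + 1, k + 1))

print(majority_success(2/3, 15))   # already above 0.9 for a modest number of repetitions
```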

3 Naive application of quantum algorithm to classical data

3.1 Learning from a sparse set of training samples

Given a set of examples (x_j, f′(x_j)) where j = 1, . . . , M < 2^n, x_j is drawn uniformly at random and independently, and f′(x_j) = f(x_j) ⊕ e_j is the noisy output as described before with the error rate being η < 1/2 for all j, a naive way to apply the quantum LPN algorithm is to create a quantum state:

|Ψ⟩ = (1/√M) Σ_{j=1}^{M} |x_j⟩ |f′(x_j)⟩,  (4)

and treat it as the output of the quantum oracle. Then, as in the quantum LPN algorithm, single-qubit Hadamard gates are applied to all qubits and the label qubit is measured in the computational basis (see Fig. 2).

Table 1 Summary of the query (or sample) and time complexities of various LPN algorithms reported in previous works

Reference                                            Oracle     Queries (samples)     Time
Angluin and Laird (AL) (Angluin and Laird 1988)      Classical  O(n)                  O(2^n)
Blum, Kalai and Wasserman (BKW) (Blum et al. 2003)   Classical  2^{O(n/log n)}        2^{O(n/log n)}
Lyubashevsky (L) (Lyubashevsky 2005)                 Classical  O(n^{1+ε})            2^{O(n/log log n)}
Cross, Smith, and Smolin (CSS) (Cross et al. 2015)   Quantum    O(log n)              O(n)

Here, n is the length of the hidden bit string s, i.e., s ∈ {0, 1}^n, and ε > 0 is some constant


The measurement outcome of 1, which occurs with the probability of 1/2, is post-selected to leave the query register qubits in:

|ψ_1⟩ = (1/√(M 2^n)) Σ_{j=1}^{M} [ (−1)^{e_j} |s⟩ + Σ_{y≠0^n} (−1)^{x_j·y ⊕ e_j} |s ⊕ y⟩ ].  (5)

From the above state, the probability to guess s correctly is:

Pr(s_M = s) = (1/(M 2^n)) ( Σ_{j=1}^{M} (−1)^{e_j} )² ≤ M/2^n,  (6)

which is exponentially small in n. However, this result also implies that even when the quantum oracle outputs only a fraction of all possible examples as an equal superposition state, and the noise does not act coherently on all x_j as in Cross et al. (2015) and Riste et al. (2017), the LPN problem can still be solved. As a side note, since x_j is drawn uniformly at random, x_j · y = 0 with probability 1/2 and x_j · y = 1 with probability 1/2 for any y ≠ 0^n. Thus, the probability to measure an incorrect bit string s ⊕ y (see the second term in Eq. 5) depends on the error distribution that determines e_j. For instance, the worst case occurs when the error bits e_j coincide with x_j · y ∀j. In this case, the probability to obtain the incorrect answer is M/2^n. Thus, in the worst case scenario, the naive application of the quantum algorithm to a limited number of classical samples cannot solve the LPN problem.
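The exponential suppression of Eq. (6) can be checked numerically for small n. The sketch below is ours (illustrative names and parameters): it prepares the state of Eq. (4) from M distinct noiseless random samples, applies Hadamards, post-selects the label qubit on 1, and reports the conditional probability of reading out s, which equals M/2^n in the noiseless case.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(1)
n, M = 8, 16
s = rng.integers(0, 2, size=n)

# Draw M distinct noiseless samples and load them into the state of Eq. (4).
xs = rng.choice(2 ** n, size=M, replace=False)
psi = np.zeros(2 ** (n + 1))
for x in xs:
    bits = [(int(x) >> (n - 1 - i)) & 1 for i in range(n)]
    label = int(np.dot(bits, s) % 2)
    psi[(int(x) << 1) | label] = 1.0
psi /= np.linalg.norm(psi)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
psi = reduce(np.kron, [H] * (n + 1)) @ psi
probs = psi ** 2

s_index = int(''.join(map(str, s)), 2)
p_label1 = probs[1::2].sum()                        # post-selection probability (1/2 here)
p_s_given_1 = probs[(s_index << 1) | 1] / p_label1  # compare with M / 2**n
print(p_s_given_1, M / 2 ** n)
```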

3.2 Data span

In order to reduce the number of queries, linearly independent data can be added to generate artificial data, which can be used in creating the state in the form of Eq. 4. However, this not only requires classical pre-processing, but also can increase the error probability of the generated parity bit. For example, from two data pairs, (x_1, f′(x_1)) and (x_2, f′(x_2)) with linearly independent inputs, a new data pair (x_3 = x_1 ⊕ x_2, f(x_3) = f′(x_1) ⊕ f′(x_2)) can be created. Since f′(x_1) ⊕ f′(x_2) = (x_1 ⊕ x_2) · s ⊕ (e_1 ⊕ e_2) mod 2, the error probability of the artificial parity bit is:

η′ = Pr(e_1 ⊕ e_2 = 1) = 2η(1 − η) = [1 − (1 − 2η)²]/2.  (7)

The above equation can be generalized for a new data pair generated from d linearly independent data as follows:

η′ = [1 − (1 − 2η)^d]/2.  (8)

For d > 1 and 0 < η < 1/2, [1 − (1 − 2η)^d]/2 > η. Therefore, data span always increases the error rate in addition to increasing the time complexity for pre-processing the data. However, when η is sufficiently small so that η′ also remains reasonably small, one may consider using the data span trick in order to reduce the sample complexity.
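For a concrete sense of this trade-off, the short calculation below evaluates Eq. (8) for a few values of d; it is a direct transcription of the formula (function name ours), not code from the paper.

```python
def span_error(eta, d):
    """Error rate of a parity bit built by XORing d independent noisy labels, Eq. (8)."""
    return (1 - (1 - 2 * eta) ** d) / 2

for d in (1, 2, 4, 8):
    print(d, span_error(0.05, d))
# d = 1 keeps eta = 0.05; by d = 8 the effective error rate already exceeds 0.28
```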

3.3 Generation of artificial data with a parity guess function

The next strategy we consider is to generate missing data by guessing the parity function and to design an iterative algorithm that aims to improve the accuracy of the guess. In this approach, the rate of accuracy improvement with respect to the number of queries determines the efficiency of the algorithm.

A brief description of the iterative LPN (I-LPN) algorithm is as follows. First, a full data set of 2^n examples is encoded as an equal quantum superposition state using M real data and 2^n − M artificial data generated by a parity guess function. The quantum state can be prepared by guessing the quantum oracle of the quantum LPN algorithm, and inserting the output state of the oracle as an input to quantum random access memory (QRAM) (Giovannetti et al. 2008a, b; Hong et al. 2012; Arunachalam et al. 2015; Park et al. 2019) to update its entries according to real data. The circuit-based QRAM introduced in Park et al. (2019) can use flip-register-flop processes to update an output of a guessed quantum oracle with real data using a number of steps that increases at least linearly with the number of samples. More explicitly, given a guessed parity string s, the quantum LPN algorithm can create the state Σ_{j=1}^{2^n} |x_j⟩|g(x_j)⟩, where g(x_j) = x_j · s mod 2. Then, this state becomes an input to the QRAM circuit introduced in Park et al. (2019), for which |x_j⟩ is encoded in the bus qubits and the guessed parity bit is encoded in the register qubit. The controlled-register process in Park et al. (2019) is applied to flip the register qubit if the real data disagrees with the guessed parity, i.e., if g(x_j) ≠ f′(x_j). After the state preparation, the usual quantum LPN protocol that consists of applying Hadamard gates, projective measurements, and the post-selection outputs an n-bit string in the register qubits. This string is used to construct a new parity guess function in the next iteration, for which a new sample is also acquired.


The learner can also repeat the measurement procedure for guessing a new parity function without querying a new sample. This iteration is referred to as an epoch. A more detailed description of the algorithm is given in Algorithm 1.
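Algorithm 1 is not reproduced in this transcript, but its control flow can be summarized in a few lines. The sketch below is a classical-simulation caricature of I-LPN for small n (all names and the brute-force statevector shortcut are ours; the actual algorithm prepares the state via the QRAM circuit described above): each round fills the missing labels with the current guess, samples a new guess from the post-selected measurement distribution, and queries one more classical sample.

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def parity(x, s):
    """Inner product x . s mod 2 for integer-encoded bit strings."""
    return bin(x & s).count('1') % 2

def measure_guess(h, n, rng):
    """Prepare the state of Eq. (9) from the label list h, apply Hadamards,
    post-select the label qubit on 1, and sample the data register once."""
    psi = np.zeros(2 ** (n + 1))
    for x in range(2 ** n):
        psi[(x << 1) | h[x]] = 1.0
    psi /= np.linalg.norm(psi)
    probs = (reduce(np.kron, [H] * (n + 1)) @ psi) ** 2
    p1 = probs[1::2]                          # outcomes with label qubit = 1
    return int(rng.choice(2 ** n, p=p1 / p1.sum()))

rng = np.random.default_rng(2)
n, eta = 5, 0.1
s = int(rng.integers(0, 2 ** n))              # hidden string, integer-encoded
guess, data = 0, {}                           # initial guess s0 = 0...0
for M in range(1, 2 ** n + 1):                # one new classical sample per round
    x = int(rng.integers(0, 2 ** n))
    data[x] = parity(x, s) ^ int(rng.binomial(1, eta))
    h = [data.get(z, parity(z, guess)) for z in range(2 ** n)]
    guess = measure_guess(h, n, rng)          # parity guess for the next round
print(format(guess, f'0{n}b'), format(s, f'0{n}b'))  # the guess often matches s for such small n
```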

Now, we analyze the performance of I-LPN. In the iterative algorithm, the time complexity is dominated by the state preparation step. Since the quantum oracle implementation and the QRAM process given M training samples require O(n) and O(M) run times, respectively, and usually M > n, the run time is linearly related to the sample complexity. Hence, we focus on the estimation of the sample complexity. The I-LPN algorithm uses the aforementioned procedure to prepare a quantum state:

|Ψ⟩ = (1/√(2^n)) Σ_{j=1}^{2^n} |x_j⟩ |h(x_j)⟩,  (9)

where

h(x_j) = f′(x_j) = x_j · s ⊕ e_j mod 2   if j ≤ M,
h(x_j) = g(x_j) = x_j · s_{M−1} mod 2    if j > M,  (10)

and s_{M−1} is the parity guess function from the previous round. Now, let ε_j be a Bernoulli random variable that is 0 if h(x_j) = f(x_j) and 1 otherwise. In other words, the weight of the string ε := ε_1 ε_2 . . . ε_{2^n}, denoted by w(ε), is the number of different bits between h_{1:2^n} and f_{1:2^n}, where ℓ_{i:j} denotes the binary string ℓ(x_i) ℓ(x_{i+1}) . . . ℓ(x_{j−1}) ℓ(x_j). With this definition, we can write:

h(x_j) = x_j · s ⊕ ε_j,  (11)

where

ε_j = e_j                    if j ≤ M,
ε_j = x_j · (s ⊕ s_{M−1})    if j > M.  (12)

The post-selected state is:

|ψ_1⟩ = (1/2^n) Σ_{j=1}^{2^n} [ (−1)^{ε_j} |s⟩ + Σ_{y≠0^n} (−1)^{x_j·y ⊕ ε_j} |s ⊕ y⟩ ].  (13)

The probability to obtain s from the projective measurement in the computational basis is:

Pr(s_M = s) = (1/2^{2n}) ( Σ_{j=1}^{2^n} (−1)^{ε_j} )² = (1 − 2 r(M))²,  (14)

where r(M) = w(ε)/2^n is the error probability in estimating f from h given M noisy data. From Eq. 12, one can see that if s_{M−1} = s, then ε_j = 0 ∀j > M, and only the errors in the real samples yield non-zero values in ε. However, if s_{M−1} ≠ s, then since x_j is chosen uniformly at random, ε_j = 0 for 1/2 of the set of inputs x_j for j > M on average. With this, the expectation value of the weight of ε can be calculated as:

E[w(ε)] = Pr(s_{M−1} = s) · Mη + [1 − Pr(s_{M−1} = s)] · [Mη + (2^n − M)/2].  (15)

We start the algorithm with an initial sample (x_1, f′(x_1)), and the initial guess s_0 = 0^n. Then, the initial error probability is:

r(1) = (1/2^n) · (η/2^n) + (1 − 1/2^n) · [η + (2^n − 1)/2]/2^n.  (16)

The first term takes into consideration that s_0 = 0^n is the right answer with the probability 1/2^n. In this case, only the real data can carry incorrect parity bits, with a probability η. The second term indicates that when s_0 is incorrect, 1/2 of the 2^n − 1 guessed parity bits are wrong on average. Using the above equations, the error probability in the subsequent rounds up to M samples can be written recursively as:

r(M) = (1 − 2 r(M−1))² · (Mη/2^n) + [1 − (1 − 2 r(M−1))²] · [Mη + (2^n − M)/2]/2^n.  (17)

For brevity, we denote η_0 = Mη/2^n and η_1 = [Mη + (2^n − M)/2]/2^n throughout the manuscript.


We plot the success probability as a function of the number of samples for various values of n and η in Fig. 3. If one increases the number of epochs, the fidelity curve simply converges faster to the one for n → ∞. Thus, the number of samples needed to achieve the desired success probability is exponential in n.
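The curves of Fig. 3 follow from iterating this recursion. A minimal sketch, assuming the recursion as written in Eqs. (16) and (17) above (reconstructed here from the surrounding text) and a single epoch per round, is:

```python
def success_probability(n, eta, M_max):
    """Iterate the recursion of Eqs. (16)-(17) (as reconstructed above, epoch = 1)
    and return the success probability (1 - 2 r(M))^2 for M = 2..M_max."""
    N = 2 ** n
    r = (1 / N) * (eta / N) + (1 - 1 / N) * ((eta + (N - 1) / 2) / N)   # Eq. (16)
    out = []
    for M in range(2, M_max + 1):
        eta0 = M * eta / N                          # previous guess was s
        eta1 = (M * eta + (N - M) / 2) / N          # previous guess was wrong
        p_prev = (1 - 2 * r) ** 2                   # probability the previous guess was s, Eq. (14)
        r = p_prev * eta0 + (1 - p_prev) * eta1     # Eq. (17)
        out.append((1 - 2 * r) ** 2)
    return out

# smallest M reaching the 2/3 threshold of Eq. (3) for n = 10, eta = 0
probs = success_probability(n=10, eta=0.0, M_max=2 ** 10)
print(next(M for M, p in enumerate(probs, start=2) if p >= 2 / 3))
```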

In the following section, we show that the introduction of a simple policy for updating the parity guess function, as done in reinforcement learning, significantly enhances the learning performance.

4 Reinforcement learning

4.1 Greedy algorithm

To improve the performance of the iterative algorithm introduced in the previous section, we use the concepts of reinforcement learning, such as state, reward, policy, and action. The key addition to the previous iterative algorithm is the use of a greedy algorithm, which always exploits current knowledge to maximize the immediate reward, as the policy to make an action. We refer to this algorithm as reinforcement-learning parity with noise (R-LPN).

The underlying idea of R-LPN can be described as follows. The state in each iteration is the guessed bit string after performing the usual quantum LPN algorithm. The reward is determined by the Hamming distance between the parity bits generated by the guess and the parity bits of the real data. At the Mth query, the learner obtains M guessed bit strings as well as M reward values. The greedy algorithm then selects the guessed bit string that maximizes the reward, and uses it to construct the guessed quantum oracle of the next iteration.

Fig. 3 Success probability of the iterative LPN algorithm with respect to the number of samples, in which the parity guess function is obtained from the quantum LPN circuit and used in the subsequent round. Curves are shown for n = 5, 10, and 25 with η = 0, and for n = 25 with η = 0.05 and 0.1, together with the 2/3 threshold; the required number of samples grows as n increases

Our algorithm can be viewed as a variational quantum algorithm, as the guessed quantum oracle can be parameterized with controlled-NOT gates and is updated in each iteration. The detailed description of the R-LPN algorithm is provided below, and a schematic representation of the algorithm is shown in Fig. 4.
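A compact sketch of the greedy policy (classical bookkeeping only; function names are ours) keeps every guessed string seen so far, scores each one against the real noisy parity bits, and promotes the best-scoring string to the next guessed oracle. The paper only specifies that the reward is determined by the Hamming distance; encoding it as the negative distance, as below, is one natural choice.

```python
import numpy as np

def parity_bits(candidate, xs):
    """Parity bits x . candidate mod 2 for every queried input x (integer-encoded)."""
    return np.array([bin(x & candidate).count('1') % 2 for x in xs])

def greedy_action(candidates, xs, labels):
    """Reward = -(Hamming distance to the real noisy labels); pick the best candidate."""
    rewards = [-int(np.sum(parity_bits(c, xs) != labels)) for c in candidates]
    return candidates[int(np.argmax(rewards))]

# toy usage: three guessed strings scored against four real samples (n = 4)
xs = [0b1010, 0b0111, 0b1100, 0b0011]
labels = np.array([1, 0, 1, 1])
print(format(greedy_action([0b1011, 0b0101, 0b1110], xs, labels), '04b'))
```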

4.2 Numerical analysis

We analyze the performance of R-LPN by numerically calculating the error probability, i.e., the probability to measure s_M ≠ s in the round with M samples, similar to the recursive calculation shown in Section 3.3. The algorithm uses parity guess functions from all measurements up to the present round, i.e., s_1, . . . , s_M. To construct the recursive formula, we consider two situations. First, the set of parity guess functions does not contain the answer, i.e., s ∉ {s_1, . . . , s_M}. This occurs with the probability p_M = Π_{j=1}^{M−1} [1 − (1 − 2 r(j))²], where r(j) is the error probability at the jth round, and (1 − 2 r(j))² corresponds to the probability to obtain s (see Eq. 14).


Fig. 4 Schematic of the quantum-classical hybrid algorithm for solving the learning parity with noise problem, explained by using the terminologies of reinforcement learning

When the parity guess function is wrong, the probability to measure the wrong hidden bit string in the given round is η_1, as explained in the previous section.

When s ∈ {s_1, . . . , s_M}, we further consider two situations. First, the M parity bits in the training examples are error free, which occurs with a probability of (1 − η)^M. In this case, given M linearly independent examples, which can be produced with the probability Π_{k=0}^{M−1} (1 − 2^{k−M}) > 1/4 for any M > 1, there are ⌈2^{n−M}⌉ choices out of all possible parity guess functions s_M ∈ {0, 1}^n that can generate the same parity bit string as f_{1:M}. Note that when x_j is the all-zero string, f(x_j) = 0 for any s. Thus, we exclude this example when calculating the Hamming distance between the guessed parity bits and the true parity bits. We define c_M = (⌈2^{n−M}⌉ − 1)/(2^n − 1) as the probability to pick a wrong parity guess function that produces the same parity bits as s among the 2^n − 1 possible bit strings. Since there are only M parity guess functions, the probability to obtain an incorrect parity function is actually less than c_M. However, we use c_M to make a reasonable estimation. The incorrectly guessed parity function produces (2^n − M)/2 errors in the artificial parity bits when averaged over uniformly random input.

If the true M-bit parity string, f′_{1:M}, contains errors, then the probability to obtain a wrong parity guess function can be estimated as β_M = Σ_{k=1}^{⌈Mη⌉} (M choose k) c_M. This means that, for simplicity, there are up to ⌈Mη⌉ (nearest integer to Mη) errors in the true parity bit string. Combining all the cases considered above, the error probability, i.e., the probability to obtain an incorrect parity guess function in the round with M examples, can be estimated as:

r(M) = (1 − p_{M−1}) { (1 − η)^M c_M [(2^n − M)/2]/2^n + [1 − (1 − η)^M] [η_0 (1 − β_M) + η_1 β_M] } + p_{M−1} η_1,  (18)

where the initial error probability, r(1), is given in Eq. 16. In the R-LPN algorithm, the time complexity is again dominated by state preparation, for which the number of steps increases at least linearly with the number of samples, as mentioned in the previous section. The computation time for calculating the Hamming distance between M guessed parity bit strings and the actual parity bit string is O(nM²). Since these computation times depend on M, we focus on estimating the sample cost. Using the above equation, the number of samples required for achieving Pr(s_M = s) ≥ 2/3, denoted by M_{2/3}, can be calculated numerically. Figure 5 shows M_{2/3} as a function of n for several values of the error probability, η = {0, 0.1, 0.2}. For each error rate, the number of epochs is given as 1, n, and n². When n = 40, there are 2^{40} ≈ 10^{12} possibilities for s. But even in the presence of a relatively high error probability of 20%, having only about 10^4 examples suffices to solve the problem. Figure 5 also suggests that the number of samples can be further reduced by increasing the number of epochs.

We compare M_{2/3} of I-LPN and R-LPN as a function of n for several values of the error probability, η = {0, 0.1, 0.2}, in Fig. 6.

Fig. 5 Number of samples required for achieving Pr(s_M = s) ≥ 2/3, denoted by M_{2/3}, as a function of the length of the hidden bit string, n, for various error rates, η = {0, 0.1, 0.2}. For each error rate, the number of epochs is also varied among 1, n, and n². The number of samples needed increases (decreases) as the error rate (number of epochs) increases


Fig. 6 Ratio between the numbers of samples required for achieving Pr(s_M = s) ≥ 2/3 in I-LPN, denoted by M^I_{2/3}, and in R-LPN, denoted by M^R_{2/3}, as a function of the length of the hidden bit string, n, for various error rates, η = {0, 0.1, 0.2}. For all calculations, the number of epochs is n²

In this comparison, the number of epochs is n². The result shows that R-LPN reduces the sample complexity by several orders of magnitude already when n is only about 15, and this improvement continues to grow as n increases. When n is about 15 to 30, the curves qualitatively suggest that R-LPN enhances the sample complexity exponentially in n. However, our analysis does not provide a definitive conclusion about the rate of improvement in the asymptotic limit.

4.3 Simulation

We use simulations in MATLAB to verify the performance of the R-LPN algorithm, and to compare it with the known classical methods listed in Table 1. Each iteration starts with the quantum state of the form shown in step 7 of Algorithm 2, using classical data that are generated uniformly at random. The simulation then proceeds by following the subsequent steps in Algorithm 2. For a fixed value of s, all simulations are repeated 200 times to average over the set of examples drawn uniformly at random.

4.3.1 Data filtering

All simulations used an additional pre-processing step, which we refer to as data filtering, as an optional attempt to filter out erroneous examples and improve the success probability. Data filtering counts the number of occurrences of example pairs, (x_j, f′(x_j)), denoted by o_j. Then, an example with a label k that appears less than some fraction of max_j(o_j) times (i.e., o_k < w max_j(o_j)/2) is discarded. An intuitive motivation behind this procedure is that since examples are randomly drawn from a uniform distribution, the same data can be drawn multiple times, and erroneous samples are queried less frequently than correct samples for η < 1/2. The optimal choice of the filtering coefficient, w, depends on the error rate. For example, when η = 0, such data filtering is not desired since all data are error free. However, in our simulations, we assume that η is unknown, and we used w = 0.4√(d_H(g^M_{1:M}, f′_{1:M})) + 0.8, which was empirically found to perform well overall for various values of η. This choice means that the level of data filtering increases as the number of samples, and hence the success probability, increases.
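A literal transcription of this filtering rule (with w supplied as a parameter; the helper names are ours) looks as follows:

```python
from collections import Counter

def filter_examples(examples, w):
    """Discard example pairs whose occurrence count is below w * max_count / 2."""
    counts = Counter(examples)
    threshold = w * max(counts.values()) / 2
    return [ex for ex in examples if counts[ex] >= threshold]

# toy usage: the pair ((1, 0, 1), 1) was drawn 3 times, a lone erroneous pair only once
examples = [((1, 0, 1), 1)] * 3 + [((1, 0, 1), 0)] + [((0, 1, 1), 1)] * 2
print(filter_examples(examples, w=1.0))   # the single erroneous pair is discarded
```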

4.3.2 Results

We first simulate an R-LPN algorithm with a slight modification, which is intended to save the memory and time cost of storing all M sets of guessed parity bits to calculate their Hamming distances with respect to the real parity bits. Namely, only the guessed parity bits from the previous iteration are kept in the memory, and the greedy algorithm updates the parity guess function for the next round by only comparing the rewards given by the present and the previous guesses. We compare the performance of this modified algorithm to the originally proposed R-LPN by analyzing the success probability with respect to the number of samples via simulations. The simulation results are depicted in Fig. 7 for n = 6, 7, and 8, and show that the modified R-LPN does not introduce any considerable change in the sample complexity, especially around the region where Pr(s_M = s) ≈ 2/3. Since the modified R-LPN algorithm uses only the current state for making an action in the subsequent round, the simulation results of this algorithm are denoted by Markov in the figure legend.

Hereinafter, all simulations use the modified R-LPN, since it performs similarly to the original version in terms of the sample complexity while the memory and time cost for calculating the rewards do not increase with M.
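In code, this memory-one ("Markov") variant only needs to keep two reward values around. A minimal sketch (names ours), reusing the negative-Hamming-distance reward convention from the earlier sketch:

```python
def markov_greedy_update(prev_guess, prev_reward, new_guess, new_reward):
    """Keep whichever of the previous and the newly measured guess has the
    larger reward (i.e., the smaller Hamming distance to the real parity bits)."""
    if new_reward > prev_reward:
        return new_guess, new_reward
    return prev_guess, prev_reward

# toy usage: rewards are negative Hamming distances
print(markov_greedy_update(0b0101, -1, 0b1011, -2))   # -> (5, -1), the previous guess is kept
```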

Figure 8 shows the simulation results for the number of training samples required for various LPN algorithms to succeed, as a function of n for several values of the error rate, η. The number of epochs is n² in all simulations in this figure. The simulation results show that the R-LPN algorithm performs better than the known classical algorithms, AL (Angluin and Laird 1988) and BKW (Blum et al. 2003) (see Table 1), in the regime of n ≤ 12. In this regime, the algorithm by Lyubashevsky (denoted by L) (Lyubashevsky 2005) consumes the least amount of samples among the classical methods. When η = 0, R-LPN appears to perform slightly worse than L. However, R-LPN becomes advantageous for learning in the presence of noise, especially as n increases.


Fig. 7 The plots show the probability to find a hidden bit string s in R-LPN algorithms as a function of the normalized number of samples, M/2^n. Dotted lines represent the simulation results of the R-LPN algorithm described in Algorithm 2, and solid lines represent the simulation results of the modified R-LPN algorithm that keeps only the parity guess function from the previous round, labelled as Markov in the legend. Simulations are performed for (a) η = 0, (b) η = 0.1, and (c) η = 0.2, and for n = 6, 7, and 8. The number of epochs is 30 in all simulations in this figure

As summarized in Table 1, the run time of L is subexponentially greater than its sample complexity. The run time of BKW is comparable with its sample complexity, and the run time of AL increases exponentially with respect to n. However, the run time of R-LPN is expected to be comparable with its sample complexity. Thus, we expect the R-LPN algorithm to provide faster learning compared with the classical algorithms. It is important to note that although the sample complexity of the AL algorithm is O(n) (see Table 1), and it is hence expected to perform the best in terms of the sample complexity for large n, the AL algorithm is not necessarily more favorable than other algorithms in the limit of large n due to its exponential run time.

The R-LPN algorithm is also more resilient to noise, as demonstrated in Fig. 9. The simulation results show the number of samples needed for the aforementioned LPN algorithms to succeed as a function of the error rate, η, for various n. The number of epochs in all simulations in this figure is n². The R-LPN algorithm requires fewer samples than AL and BKW in all instances in the simulations. The R-LPN and L algorithms perform similarly for small n, but one can see that R-LPN prevails as n increases to 10. From this trend, we speculate that the advantage of R-LPN over the classical algorithms in terms of the robustness to classical noise can become greater as the problem size increases.

Note that by increasing the number of epochs, the number of required samples can be further reduced, at the cost of increasing the run time.

4.3.3 ε-Greedy algorithm

We also tested an ε-greedy algorithm as the policy for making an action via simulation with n ≤ 12. Here, the s_M that maximizes the reward is used to guess the quantum oracle with a probability of 1 − ε, and a randomly guessed n-bit string is chosen as s_M with a probability of ε. The simulation shows that the ε-greedy algorithm does not provide any noticeable improvement.

Fig. 8 Simulation results for the number of samples required for an LPN algorithm to succeed as a function of n for (a) η = 0, (b) η = 0.1, and (c) η = 0.2. Curves without symbols represent the simulation results for the R-LPN algorithm of this work. Simulation results of the known classical methods listed in Table 1 are also plotted and indicated by squares for AL, triangles for BKW, and circles for L. The number of epochs is n² in all simulations in this figure



Fig. 9 Simulation results for the number of samples required for an LPN algorithm to succeed as a function of η for (a) n = 6, (b) n = 8, and (c) n = 10. Curves without symbols represent the simulation results for the R-LPN algorithm. Simulation results of the known classical methods listed in Table 1 are also plotted and indicated by squares for AL, triangles for BKW, and circles for L. The number of epochs is n² in all simulations in this figure



5 Robustness to Pauli errors

Without loss of generality, we assume that the eigenstates of the σ_z operator constitute the computational basis. Then, since the R-LPN algorithm performs the measurement in the σ_z basis, it is not affected by any error that effectively appears as an unwanted phase rotation at the end of the quantum circuit.

According to Cross et al. (2015), when single-qubit bit-flip errors occur on the final state independently with a probability η_x, the bit-wise majority vote on k post-selected bit strings gives an estimate of s whose error probability can be bounded and made arbitrarily small by increasing k. When the R-LPN algorithm is completed, it outputs the same final state as in Cross et al. (2015) with high probability. Hence, the above result can be directly applied to our algorithm for bit-flip errors on the final state. In this case, the algorithm needs to perform the bit-wise majority vote at each cycle of querying a sample, increasing the total run time. Therefore, quantum noise in the R-LPN algorithm that effectively accumulates on the final state as bit-flip errors with an error rate of η_x < 1/2 only increases the time complexity by a factor of O(log(n) poly(1/(1/2 − η_x))), while the sample complexity remains the same.
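The bit-wise majority vote itself is simple to state in code; the sketch below is ours (not from Cross et al.) and combines k post-selected bit strings column by column:

```python
import numpy as np

def bitwise_majority(strings):
    """Majority vote in each bit position over a list of equal-length bit strings."""
    arr = np.array(strings)
    return (arr.sum(axis=0) * 2 > arr.shape[0]).astype(int)   # ties (even k) resolve to 0

# toy usage: three noisy readouts of s = 1011, two of them with one flipped bit
print(bitwise_majority([[1, 0, 1, 1], [1, 1, 1, 1], [0, 0, 1, 1]]))   # -> [1 0 1 1]
```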

6 Conclusion

The previously known quantum speedup in the learning parity with noise problem diminishes in the absence of the quantum oracle that provides, upon a query, a quantum state containing all possible examples in superposition. We developed a quantum-classical hybrid algorithm for solving the LPN problem with classical examples. The LPN problem is particularly challenging as it requires the exact solution to be found. Our work demonstrates that the concept of variational quantum algorithms can be extended to solving such problems. The reinforcement learning significantly reduces both the sample and the time cost of the quantum LPN algorithm in the absence of the quantum oracle. Simulations in the regime of small problem size, i.e., n ≤ 12, show that our algorithm performs comparably with or better than the classical algorithm that performs the best in this regime in terms of the sample complexity. The sample cost can be further reduced by increasing the number of epochs, at the cost of increasing the run time. In terms of the vulnerability to noise, our algorithm performs better than the classical algorithms in this regime. Furthermore, the run time can be reduced substantially if an efficient procedure for updating the quantum state is available.

The ability to utilize quantum mechanical properties to enhance existing classical methods for learning from classical data is a significant milestone toward practical quantum learning. In particular, whether the known advantage of oracle-based quantum algorithms can be retained in the absence of the quantum oracle is an interesting open problem. We showed that for the LPN problem, a quantum advantage can be achieved with the integration of classical reinforcement learning.

Our results motivate future works to employ similar strategies for known oracle-based quantum algorithms in order to extend their applicability to classical data. For example, extending the idea of the quantum-classical reinforcement learning to the learning with errors problem (Grilo et al. 2019) would be an interesting future work. This work only considered classical noise and a simple quantum noise model, and detailed studies on the effects of actual experimental quantum errors remain as future work.


Complexity analysis of the proposed reinforcement learning algorithm in the asymptotic limit also remains an interesting future work.

Acknowledgments We thank Suhwang Jeong and Jeongseok Ha for stimulating discussions.

Funding information This work was supported by the Ministry of Science and ICT, Korea, under an ITRC Program, IITP-2019-2018-0-01402, and by the National Research Foundation of Korea (NRF-2019R1I1A1A01050161).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

References

Valiant LG (1984) A Theory of the Learnable. Commun ACM 27(11):1134. https://doi.org/10.1145/1968.1972

Cross AW, Smith G, Smolin J (2015) Quantum learning robust against noise. Phys Rev A 92:012327. https://doi.org/10.1103/PhysRevA.92.012327

Lyubashevsky V (2005) The parity problem in the presence of noise, decoding random linear codes, and the subset sum problem. Lect Notes Comput Sci 3624:378–389. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11538462_32

Regev O (2005) On Lattices, Learning with Errors, Random Linear Codes, and Cryptography. In: Proceedings of the 37th annual ACM symposium on theory of computing. ACM, New York, pp 84–93. https://doi.org/10.1145/1060590.1060603

Pietrzak K (2012) Cryptography from Learning Parity with Noise. In: Bielikova M, Friedrich G, Gottlob G, Katzenbeisser S, Turan G (eds) SOFSEM 2012: Theory and practice of computer science. Springer, Berlin, pp 99–114

Peruzzo A, McClean J, Shadbolt P, Yung MH, Zhou XQ, Love PJ, Aspuru-Guzik A, O'Brien JL (2014) A variational eigenvalue solver on a photonic quantum processor. Nat Commun 5:4213. https://doi.org/10.1038/ncomms5213

McClean JR, Romero J, Babbush R, Aspuru-Guzik A (2016) The theory of variational hybrid quantum-classical algorithms. New J Phys 18(2):023023. https://doi.org/10.1088/1367-2630/18/2/023023

Romero J, Olson JP, Aspuru-Guzik A (2017) Quantum autoencoders for efficient compression of quantum data. Quantum Sci Technol 2(4):045001. https://doi.org/10.1088/2058-9565/aa8072

Kandala A, Mezzacapo A, Temme K, Takita M, Brink M, Chow JM, Gambetta JM (2017) Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549:242. https://doi.org/10.1038/nature23879

Moll N, Barkoutsos P, Bishop LS, Chow JM, Cross A, Egger DJ, Filipp S, Fuhrer A, Gambetta JM, Ganzhorn M, Kandala A, Mezzacapo A, Muller P, Riess W, Salis G, Smolin J, Tavernelli I, Temme K (2018) Quantum optimization using variational algorithms on near-term quantum devices. Quantum Sci Technol 3(3):030503. https://doi.org/10.1088/2058-9565/aab822

Otterbach JS, Manenti R, Alidoust N, Bestwick A, Block M, Bloom B, Caldwell S, Didier N, Schuyler Fried E, Hong S, Karalekas P, Osborn CB, Papageorge A, Peterson EC, Prawiroatmodjo G, Rubin N, Ryan CA, Scarabelli D, Scheer M, Sete EA, Sivarajah P, Smith RS, Staley A, Tezak N, Zeng WJ, Hudson A, Johnson BR, Reagor M, da Silva MP, Rigetti C (2017) Unsupervised Machine Learning on a Hybrid Quantum Computer. arXiv:1712.05771

Mitarai K, Negoro M, Kitagawa M, Fujii K (2018) Quantum circuit learning. Phys Rev A 98:032309. https://doi.org/10.1103/PhysRevA.98.032309

Schuld M, Bocharov A, Svore K, Wiebe N (2018) Circuit-centric quantum classifiers. arXiv:1804.00633

Dallaire-Demers PL, Killoran N (2018) Quantum generative adversarial networks. Phys Rev A 98:012324. https://doi.org/10.1103/PhysRevA.98.012324

Fosel T, Tighineanu P, Weiss T, Marquardt F (2018) Reinforcement Learning with Neural Networks for Quantum Feedback. Phys Rev X 8:031084. https://doi.org/10.1103/PhysRevX.8.031084

Zoufal C, Lucchi A, Woerner S (2019) Quantum Generative Adversarial Networks for Learning and Loading Random Distributions. arXiv:1904.00043

Romero J, Aspuru-Guzik A (2019) Variational quantum generators: Generative adversarial quantum machine learning for continuous distributions. arXiv:1901.00848

Li J, Yang X, Peng X, Sun CP (2017) Hybrid Quantum-Classical Approach to Quantum Optimal Control. Phys Rev Lett 118:150503. https://doi.org/10.1103/PhysRevLett.118.150503

Angluin D, Laird P (1988) Learning From Noisy Examples. Mach Learn 2(4):343. https://doi.org/10.1023/A:1022873112823

Blum A, Kalai A, Wasserman H (2003) Noise-tolerant Learning, the Parity Problem, and the Statistical Query Model. J ACM 50(4):506. https://doi.org/10.1145/792538.792543

Bernstein E, Vazirani U (1997) Quantum Complexity Theory. SIAM J Comput 26(5):1411. https://doi.org/10.1137/S0097539796300921

Levieil E, Fouque PA (2006) An improved LPN algorithm. Lect Notes Comput Sci 4116:348–359. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11832072_24

Riste D, da Silva MP, Ryan CA, Cross AW, Corcoles AD, Smolin JA, Gambetta JM, Chow JM, Johnson BR (2017) Demonstration of quantum advantage in machine learning. npj Quantum Inf 3(1):16. https://doi.org/10.1038/s41534-017-0017-3

Knill E, Laflamme R (1998) Power of One Bit of Quantum Information. Phys Rev Lett 81:5672. https://doi.org/10.1103/PhysRevLett.81.5672

Park DK, Rhee JKK, Lee S (2018) Noise-tolerant parity learning with one quantum bit. Phys Rev A 97:032327. https://doi.org/10.1103/PhysRevA.97.032327

Giovannetti V, Lloyd S, Maccone L (2008a) Quantum Random Access Memory. Phys Rev Lett 100:160501. https://doi.org/10.1103/PhysRevLett.100.160501

Giovannetti V, Lloyd S, Maccone L (2008b) Architectures for a quantum random access memory. Phys Rev A 78:052310. https://doi.org/10.1103/PhysRevA.78.052310

Hong FY, Xiang Y, Zhu ZY, Jiang Lz, Wu LN (2012) Robust quantum random access memory. Phys Rev A 86:010306. https://doi.org/10.1103/PhysRevA.86.010306

Arunachalam S, Gheorghiu V, Jochym-O'Connor T, Mosca M, Srinivasan PV (2015) On the robustness of bucket brigade quantum RAM. New J Phys 17(12):123010. http://stacks.iop.org/1367-2630/17/i=12/a=123010

Park DK, Petruccione F, Rhee JKK (2019) Circuit-Based Quantum Random Access Memory for Classical Data. Sci Rep 9(1):3949. https://doi.org/10.1038/s41598-019-40439-3

Grilo AB, Kerenidis I, Zijlstra T (2019) Learning-with-errors problem is easy with quantum samples. Phys Rev A 99:032314. https://doi.org/10.1103/PhysRevA.99.032314

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.