DEEPCODER: LEARNING TO WRITE PROGRAMS - · PDF fileindependently—solving many synthesis...

20
Published as a conference paper at ICLR 2017 D EEP C ODER :L EARNING TO WRITE P ROGRAMS Matej Balog * Department of Engineering University of Cambridge Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow Microsoft Research ABSTRACT We develop a first line of attack for solving programming competition-style prob- lems from input-output examples using deep learning. The approach is to train a neural network to predict properties of the program that generated the outputs from the inputs. We use the neural network’s predictions to augment search techniques from the programming languages community, including enumerative search and an SMT-based solver. Empirically,we show that our approach leads to an order of magnitude speedup over the strong non-augmented baselines and a Recurrent Neural Network approach, and that we are able to solve problems of difficulty comparable to the simplest problems on programming competition websites. 1 I NTRODUCTION A dream of artificial intelligence is to build systems that can write computer programs. Recently, there has been much interest in program-like neural network models (Graves et al., 2014; Weston et al., 2015; Kurach et al., 2015; Joulin & Mikolov, 2015; Grefenstette et al., 2015; Sukhbaatar et al., 2015; Neelakantan et al., 2016; Kaiser & Sutskever, 2016; Reed & de Freitas, 2016; Zaremba et al., 2016; Graves et al., 2016), but none of these can write programs; that is, they do not generate human-readable source code. Only very recently, Riedel et al. (2016); Bunel et al. (2016); Gaunt et al. (2016) explored the use of gradient descent to induce source code from input-output examples via differentiable interpreters, and Ling et al. (2016) explored the generation of source code from unstructured text descriptions. However, Gaunt et al. (2016) showed that differentiable interpreter- based program induction is inferior to discrete search-based techniques used by the programming languages community. We are then left with the question of how to make progress on program induction using machine learning techniques. In this work, we propose two main ideas: (1) learn to induce programs; that is, use a corpus of program induction problems to learn strategies that generalize across problems, and (2) integrate neural network architectures with search-based techniques rather than replace them. In more detail, we can contrast our approach to existing work on differentiable interpreters. In dif- ferentiable interpreters, the idea is to define a differentiable mapping from source code and inputs to outputs. After observing inputs and outputs, gradient descent can be used to search for a pro- gram that matches the input-output examples. This approach leverages gradient-based optimization, which has proven powerful for training neural networks, but each synthesis problem is still solved independently—solving many synthesis problems does not help to solve the next problem. We argue that machine learning can provide significant value towards solving Inductive Program Synthesis (IPS) by re-casting the problem as a big data problem. We show that training a neural network on a large number of generated IPS problems to predict cues from the problem description can help a search-based technique. In this work, we focus on predicting an order on the program space and show how to use it to guide search-based techniques that are common in the programming languages community. This approach has three desirable properties: first, we transform a difficult search problem into a supervised learning problem; second, we soften the effect of failures of the neural network by searching over program space rather than relying on a single prediction; and third, the neural network’s predictions are used to guide existing program synthesis systems, allowing us to use and improve on the best solvers from the programming languages community. Empirically, we * Also affiliated with Max-Planck Institute for Intelligent Systems, T ¨ ubingen, Germany. Work done while author was an intern at Microsoft Research. 1

Transcript of DEEPCODER: LEARNING TO WRITE PROGRAMS - · PDF fileindependently—solving many synthesis...

Published as a conference paper at ICLR 2017

DEEPCODER: LEARNING TO WRITE PROGRAMS

Matej Balog∗Department of EngineeringUniversity of Cambridge

Alexander L. Gaunt, Marc Brockschmidt,Sebastian Nowozin, Daniel TarlowMicrosoft Research

ABSTRACT

We develop a first line of attack for solving programming competition-style prob-lems from input-output examples using deep learning. The approach is to train aneural network to predict properties of the program that generated the outputs fromthe inputs. We use the neural network’s predictions to augment search techniquesfrom the programming languages community, including enumerative search andan SMT-based solver. Empirically, we show that our approach leads to an orderof magnitude speedup over the strong non-augmented baselines and a RecurrentNeural Network approach, and that we are able to solve problems of difficultycomparable to the simplest problems on programming competition websites.

1 INTRODUCTION

A dream of artificial intelligence is to build systems that can write computer programs. Recently,there has been much interest in program-like neural network models (Graves et al., 2014; Westonet al., 2015; Kurach et al., 2015; Joulin & Mikolov, 2015; Grefenstette et al., 2015; Sukhbaataret al., 2015; Neelakantan et al., 2016; Kaiser & Sutskever, 2016; Reed & de Freitas, 2016; Zarembaet al., 2016; Graves et al., 2016), but none of these can write programs; that is, they do not generatehuman-readable source code. Only very recently, Riedel et al. (2016); Bunel et al. (2016); Gauntet al. (2016) explored the use of gradient descent to induce source code from input-output examplesvia differentiable interpreters, and Ling et al. (2016) explored the generation of source code fromunstructured text descriptions. However, Gaunt et al. (2016) showed that differentiable interpreter-based program induction is inferior to discrete search-based techniques used by the programminglanguages community. We are then left with the question of how to make progress on programinduction using machine learning techniques.

In this work, we propose two main ideas: (1) learn to induce programs; that is, use a corpus ofprogram induction problems to learn strategies that generalize across problems, and (2) integrateneural network architectures with search-based techniques rather than replace them.

In more detail, we can contrast our approach to existing work on differentiable interpreters. In dif-ferentiable interpreters, the idea is to define a differentiable mapping from source code and inputsto outputs. After observing inputs and outputs, gradient descent can be used to search for a pro-gram that matches the input-output examples. This approach leverages gradient-based optimization,which has proven powerful for training neural networks, but each synthesis problem is still solvedindependently—solving many synthesis problems does not help to solve the next problem.

We argue that machine learning can provide significant value towards solving Inductive ProgramSynthesis (IPS) by re-casting the problem as a big data problem. We show that training a neuralnetwork on a large number of generated IPS problems to predict cues from the problem descriptioncan help a search-based technique. In this work, we focus on predicting an order on the programspace and show how to use it to guide search-based techniques that are common in the programminglanguages community. This approach has three desirable properties: first, we transform a difficultsearch problem into a supervised learning problem; second, we soften the effect of failures of theneural network by searching over program space rather than relying on a single prediction; and third,the neural network’s predictions are used to guide existing program synthesis systems, allowing us touse and improve on the best solvers from the programming languages community. Empirically, we

∗Also affiliated with Max-Planck Institute for Intelligent Systems, Tubingen, Germany. Work done whileauthor was an intern at Microsoft Research.

1

Published as a conference paper at ICLR 2017

show orders-of-magnitude improvements over optimized standard search techniques and a RecurrentNeural Network-based approach to the problem.

In summary, we define and instantiate a framework for using deep learning for program synthesisproblems like ones appearing on programming competition websites. Our concrete contributions are:

1. defining a programming language that is expressive enough to include real-world program-ming problems while being high-level enough to be predictable from input-output examples;

2. models for mapping sets of input-output examples to program properties; and3. experiments that show an order of magnitude speedup over standard program synthesis

techniques, which makes this approach feasible for solving problems of similar difficulty asthe simplest problems that appear on programming competition websites.

2 BACKGROUND ON INDUCTIVE PROGRAM SYNTHESIS

We begin by providing background on Inductive Program Synthesis, including a brief overview ofhow it is typically formulated and solved in the programming languages community.

The Inductive Program Synthesis (IPS) problem is the following: given input-output examples,produce a program that has behavior consistent with the examples.

Building an IPS system requires solving two problems. First, the search problem: to find consistentprograms we need to search over a suitable set of possible programs. We need to define the set(i.e., the program space) and search procedure. Second, the ranking problem: if there are multipleprograms consistent with the input-output examples, which one do we return? Both of these problemsare dependent on the specifics of the problem formulation. Thus, the first important decision informulating an approach to program synthesis is the choice of a Domain Specific Language.

Domain Specific Languages (DSLs). DSLs are programming languages that are suitable for aspecialized domain but are more restrictive than full-featured programming languages. For example,one might disallow loops or other control flow, and only allow string data types and a small number ofprimitive operations like concatenation. Most of program synthesis research focuses on synthesizingprograms in DSLs, because full-featured languages like C++ enlarge the search space and complicatesynthesis. Restricted DSLs can also enable more efficient special-purpose search algorithms. Forexample, if a DSL only allows concatenations of substrings of an input string, a dynamic program-ming algorithm can efficiently search over all possible programs (Polozov & Gulwani, 2015). Thechoice of DSL also affects the difficulty of the ranking problem. For example, in a DSL without ifstatements, the same algorithm is applied to all inputs, reducing the number of programs consistentwith any set of input-output examples, and thus the ranking problem becomes easier. Of course, therestrictiveness of the chosen DSL also determines which problems the system can solve at all.

Search Techniques. There are many techniques for searching for programs consistent with input-output examples. Perhaps the simplest approach is to define a grammar and then enumerate allderivations of the grammar, checking each one for consistency with the examples. This approachcan be combined with pruning based on types and other logical reasoning (Feser et al., 2015). Whilesimple, these approaches can be implemented efficiently, and they can be surprisingly effective.

In restricted domains such as the concatenation example discussed above, special-purpose algorithmscan be used. FlashMeta (Polozov & Gulwani, 2015) describes a framework for DSLs which allowdecomposition of the search problem, e.g., where the production of an output string from an inputstring can be reduced to finding a program for producing the first part of the output and concatenatingit with a program for producing the latter part of the output string.

Another class of systems is based on Satisfiability Modulo Theories (SMT) solving. SMT combinesSAT-style search with theories like arithmetic and inequalities, with the benefit that theory-dependentsubproblems can be handled by special-purpose solvers. For example, a special-purpose solver caneasily find integers x, y such that x < y and y < −100 hold, whereas an enumeration strategy mayneed to consider many values before satisfying the constraints. Many program synthesis enginesbased on SMT solvers exist, e.g., Sketch (Solar-Lezama, 2008) and Brahma (Gulwani et al., 2011).They convert the semantics of a DSL into a set of constraints between variables representing the

2

Published as a conference paper at ICLR 2017

program and the input-output values, and then call an SMT solver to find a satisfying setting ofthe program variables. This approach shines when special-purpose reasoning can be leveraged, butcomplex DSLs can lead to very large constraint problems where constructing and manipulating theconstraints can be a lot slower than an enumerative approach.

Finally, stochastic local search can be employed to search over program space, and there is a longhistory of applying genetic algorithms to this problem. One of the most successful recent examplesis the STOKE super-optimization system (Schkufza et al., 2016), which uses stochastic local searchto find assembly programs that have the same semantics as an input program but execute faster.

Ranking. While we focus on the search problem in this work, we briefly mention the rankingproblem here. A popular choice for ranking is to choose the shortest program consistent with input-output examples (Gulwani, 2016). A more sophisticated approach is employed by FlashFill (Singh& Gulwani, 2015). It works in a manner similar to max-margin structured prediction, where knownground truth programs are given, and the learning task is to assign scores to programs such that theground truth programs score higher than other programs that satisfy the input-output specification.

3 LEARNING INDUCTIVE PROGRAM SYNTHESIS (LIPS)

In this section we outline the general approach that we follow in this work, which we call LearningInductive Program Synthesis (LIPS). The details of our instantiation of LIPS appear in Sect. 4. Thecomponents of LIPS are (1) a DSL specification, (2) a data-generation procedure, (3) a machine learn-ing model that maps from input-output examples to program attributes, and (4) a search procedurethat searches program space in an order guided by the model from (3). The framework is related tothe formulation of Menon et al. (2013); the relationship and key differences are discussed in Sect. 6.

(1) DSL and Attributes. The choice of DSL is important in LIPS, just as it is in any programsynthesis system. It should be expressive enough to capture the problems that we wish to solve, butrestricted as much as possible to limit the difficulty of the search. In LIPS we additionally specifyan attribute function A that maps programs P of the DSL to finite attribute vectors a = A(P ).(Attribute vectors of different programs need not have equal length.) Attributes serve as the linkbetween the machine learning and the search component of LIPS: the machine learning modelpredicts a distribution q(a | E), where E is the set of input-output examples, and the search procedureaims to search over programs P as ordered by q(A(P ) | E). Thus an attribute is useful if it is bothpredictable from input-output examples, and if conditioning on its value significantly reduces theeffective size of the search space.

Possible attributes are the (perhaps position-dependent) presence or absence of high-level functions(e.g., does the program contain or end in a call to SORT). Other possible attributes include controlflow templates (e.g., the number of loops and conditionals). In the extreme case, one may set Ato the identity function, in which case the attribute is equivalent to the program; however, in ourexperiments we find that performance is improved by choosing a more abstract attribute function.

(2) Data Generation. Step 2 is to generate a dataset ((P (n),a(n), E(n)))Nn=1 of programs P (n) inthe chosen DSL, their attributes a(n), and accompanying input-output examples E(n). Different ap-proaches are possible, ranging from enumerating valid programs in the DSL and pruning, to traininga more sophisticated generative model of programs in the DSL. The key in the LIPS formulation isto ensure that it is feasible to generate a large dataset (ideally millions of programs).

(3) Machine Learning Model. The machine learning problem is to learn a distribution of at-tributes given input-output examples, q(a | E). There is freedom to explore a large space of models,so long as the input component can encode E , and the output is a proper distribution over attributes(e.g., if attributes are a fixed-size binary vector, then a neural network with independent sigmoidoutputs is appropriate; if attributes are variable size, then a recurrent neural network output could beused). Attributes are observed at training time, so training can use a maximum likelihood objective.

(4) Search. The aim of the search component is to interface with an existing solver, using thepredicted q(a | E) to guide the search. We describe specific approaches in the next section.

3

Published as a conference paper at ICLR 2017

4 DEEPCODER

Here we describe DeepCoder, our instantiation of LIPS including a choice of DSL, a data generationstrategy, models for encoding input-output sets, and algorithms for searching over program space.

4.1 DOMAIN SPECIFIC LANGUAGE AND ATTRIBUTES

We consider binary attributes indicating the presence or absence of high-level functions in the targetprogram. To make this effective, the chosen DSL needs to contain constructs that are not so low-levelthat they all appear in the vast majority of programs, but at the same time should be common enoughso that predicting their occurrence from input-output examples can be learned successfully.

Following this observation, our DSL is loosely inspired by query languages such as SQL or LINQ,where high-level functions are used in sequence to manipulate data. A program in our DSL is asequence of function calls, where the result of each call initializes a fresh variable that is either asingleton integer or an integer array. Functions can be applied to any of the inputs or previouslycomputed (intermediate) variables. The output of the program is the return value of the last functioncall, i.e., the last variable. See Fig. 1 for an example program of length T = 4 in our DSL.

a← [int]b← FILTER (<0) ac← MAP (*4) bd← SORT ce← REVERSE d

An input-output example:Input:[-17, -3, 4, 11, 0, -5, -9, 13, 6, 6, -8, 11]Output:[-12, -20, -32, -36, -68]

Figure 1: An example program in our DSL that takes a single integer array as its input.

Overall, our DSL contains the first-order functions HEAD, LAST, TAKE, DROP, ACCESS, MINIMUM,MAXIMUM, REVERSE, SORT, SUM, and the higher-order functions MAP, FILTER, COUNT, ZIP-WITH, SCANL1. Higher-order functions require suitable lambda functions for their behavior to befully specified: for MAP our DSL provides lambdas (+1), (-1), (*2), (/2), (*(-1)), (**2),(*3), (/3), (*4), (/4); for FILTER and COUNT there are predicates (>0), (<0), (%2==0),(%2==1) and for ZIPWITH and SCANL1 the DSL provides lambdas (+), (-), (*), MIN, MAX.A description of the semantics of all functions is provided in Appendix F.

Note that while the language only allows linear control flow, many of its functions do performbranching and looping internally (e.g., SORT, COUNT, ...). Examples of more sophisticated programsexpressible in our DSL, which were inspired by the simplest problems appearing on programmingcompetition websites, are shown in Appendix A.

4.2 DATA GENERATION

To generate a dataset, we enumerate programs in the DSL, heuristically pruning away those witheasily detectable issues such as a redundant variable whose value does not affect the program output,or, more generally, existence of a shorter equivalent program (equivalence can be overapproximatedby identical behavior on randomly or carefully chosen inputs). To generate valid inputs for a program,we enforce a constraint on the output value bounding integers to some predetermined range, and thenpropagate these constraints backward through the program to obtain a range of valid values for eachinput. If one of these ranges is empty, we discard the program. Otherwise, input-output pairs can begenerated by picking inputs from the pre-computed valid ranges and executing the program to obtainthe output values. The binary attribute vectors are easily computed from the program source codes.

4.3 MACHINE LEARNING MODEL

Observe how the input-output data in Fig. 1 is informative of the functions appearing in the program:the values in the output are all negative, divisible by 4, they are sorted in decreasing order, and theyhappen to be multiples of numbers appearing in the input. Our aim is to learn to recognize suchpatterns in the input-output examples, and to leverage them to predict the presence or absence of

4

Published as a conference paper at ICLR 2017

individual functions. We employ neural networks to model and learn the mapping from input-outputexamples to attributes. We can think of these networks as consisting of two parts:

1. an encoder: a differentiable mapping from a set of M input-output examples generated bya single program to a latent real-valued vector, and

2. a decoder: a differentiable mapping from the latent vector representing a set of M input-output examples to predictions of the ground truth program’s attributes.

For the encoder we use a simple feed-forward architecture. First, we represent the input and outputtypes (singleton or array) by a one-hot-encoding, and we pad the inputs and outputs to a maximumlength L with a special NULL value. Second, each integer in the inputs and in the output is mappedto a learned embedding vector of size E = 20. (The range of integers is restricted to a finite rangeand each embedding is parametrized individually.) Third, for each input-output example separately,we concatenate the embeddings of the input types, the inputs, the output type, and the output into asingle (fixed-length) vector, and pass this vector through H = 3 hidden layers containing K = 256sigmoid units each. The third hidden layer thus provides an encoding of each individual input-outputexample. Finally, for input-output examples in a set generated from the same program, we pool theserepresentations together by simple arithmetic averaging. See Appendix C for more details.

The advantage of this encoder lies in its simplicity, and we found it reasonably easy to train. Adisadvantage is that it requires an upper bound L on the length of arrays appearing in the input andoutput. We confirmed that the chosen encoder architecture is sensible in that it performs empiricallyat least as well as an RNN encoder, a natural baseline, which may however be more difficult to train.

DeepCoder learns to predict presence or absence of individual functions of the DSL. We shall seethis can already be exploited by various search techniques to large computational gains. We use adecoder that pre-multiplies the encoding of input-output examples by a learned C×K matrix, whereC = 34 is the number of functions in our DSL (higher-order functions and lambdas are predictedindependently), and treats the resulting C numbers as log-unnormalized probabilities (logits) of eachfunction appearing in the source code. Fig. 2 shows the predictions a trained neural network madefrom 5 input-output examples for the program shown in Fig. 1.

(+1

)

(-1

)

(*2

)

(/2

)

(*-1

)

(**2

)

(*3

)

(/3

)

(*4

)

(/4

)

(>0

)

(>0

)

(%2

==

1)

(%2

==

0)

HEA

D

LAST

MA

P

FILT

ER

SO

RT

REV

ER

SE

TA

KE

DR

OP

AC

CESS

ZIP

WIT

H

SC

AN

L1

+ - * MIN

MA

X

CO

UN

T

MIN

IMU

M

MA

XIM

UM

SU

M

.0 .0 .1 .0 .0 .0 .0 .0 1.0 .0 .0 1.0 .0 .2 .0 .0 1.0 1.0 1.0 .7 .0 .1 .0 .4 .0 .0 .1 .0 .2 .1 .0 .0 .0 .0

Figure 2: Neural network predicts the probability of each function appearing in the source code.

4.4 SEARCH

One of the central ideas of this work is to use a neural network to guide the search for a programconsistent with a set of input-output examples instead of directly predicting the entire source code.This section briefly describes the search techniques and how they integrate the predicted attributes.

Depth-first search (DFS). We use an optimized version of DFS to search over programs with agiven maximum length T (see Appendix D for details). When the search procedure extends a partialprogram by a new function, it has to try the functions in the DSL in some order. At this point DFScan opt to consider the functions as ordered by their predicted probabilities from the neural network.

“Sort and add” enumeration. A stronger way of utilizing the predicted probabilities of functionsin an enumerative search procedure is to use a Sort and add scheme, which maintains a set of activefunctions and performs DFS with the active function set only. Whenever the search fails, the nextmost probable function (or several) are added to the active set and the search restarts with this largeractive set. Note that this scheme has the deficiency of potentially re-exploring some parts of thesearch space several times, which could be avoided by a more sophisticated search procedure.

Sketch. Sketch (Solar-Lezama, 2008) is a successful SMT-based program synthesis tool from theprogramming languages research community. While its main use case is to synthesize programs

5

Published as a conference paper at ICLR 2017

by filling in “holes” in incomplete source code so as to match specified requirements, it is flexibleenough for our use case as well. The function in each step and its arguments can be treated asthe “holes”, and the requirement to be satisfied is consistency with the provided set of input-outputexamples. Sketch can utilize the neural network predictions in a Sort and add scheme as describedabove, as the possibilities for each function hole can be restricted to the current active set.

λ2. λ2 (Feser et al., 2015) is a program synthesis tool from the programming languages communitythat combines enumerative search with deduction to prune the search space. It is designed to infersmall functional programs for data structure manipulation from input-output examples, by combiningfunctions from a provided library. λ2 can be used in our framework using a Sort and add scheme asdescribed above by choosing the library of functions according to the neural network predictions.

4.5 TRAINING LOSS FUNCTION

We use the negative cross entropy loss to train the neural network described in Sect. 4.3, so that itspredictions about each function can be interpreted as marginal probabilities. The LIPS frameworkdictates learning q(a | E), the joint distribution of all attributes a given the input-output examples,and it is not clear a priori how much DeepCoder loses by ignoring correlations between functions.However, under the simplifying assumption that the runtime of searching for a program of length Twith C functions made available to a search routine is proportional to CT , the following result forSort and add procedures shows that their runtime can be optimized using marginal probabilities.Lemma 1. For any fixed program length T , the expected total runtime of a Sort and add searchscheme can be upper bounded by a quantity that is minimized by adding the functions in the order ofdecreasing true marginal probabilities.

Proof. Predicting source code functions from input-output examples can be seen as a multi-labelclassification problem, where each set of input-output examples is associated with a set of relevantlabels (functions appearing in the ground truth source code). Dembczynski et al. (2010) showedthat in multi-label classification under a so-called Rank loss, it is Bayes optimal to rank the labelsaccording to their marginal probabilities. If the runtime of search with C functions is proportionalto CT , the total runtime of a Sort and add procedure can be monotonically transformed so that it isupper bounded by this Rank loss. See Appendix E for more details.

5 EXPERIMENTS

In this section we report results from two categories of experiments. Our main experiments (Sect. 5.1)show that the LIPS framework can lead to significant performance gains in solving IPS by demon-strating such gains with DeepCoder. In Sect. 5.2 we illustrate the robustness of the method bydemonstrating a strong kind of generalization ability across programs of different lengths.

5.1 DEEPCODER COMPARED TO BASELINES

We trained a neural network as described in Sect. 4.3 to predict used functions from input-outputexamples and constructed a test set of P = 500 programs, guaranteed to be semantically disjoint fromall programs on which the neural network was trained (similarly to the equivalence check describedin Sect. 4.2, we have ensured that all test programs behave differently from all programs used duringtraining on at least one input). For each test program we generated M = 5 input-output examplesinvolving integers of magnitudes up to 256, passed the examples to the trained neural network, andfed the obtained predictions to the search procedures from Sect. 4.4. We also considered a RNN-baseddecoder generating programs using beam search (see Sect. 5.3 for details). To evaluate DeepCoder,we then recorded the time the search procedures needed to find a program consistent with the Minput-output examples. As a baseline, we also ran all search procedures using a simple prior asfunction probabilities, computed from their global incidence in the program corpus.

In the first, smaller-scale experiment (program search space size ∼ 2 × 106) we trained the neuralnetwork on programs of length T = 3, and the test programs were of the same length. Table 1 showsthe per-task timeout required such that a solution could be found for given proportions of the testtasks (in time less than or equal to the timeout). For example, in a hypothetical test set with 4 tasks

6

Published as a conference paper at ICLR 2017

Table 1: Search speedups on programs of length T = 3 due to using neural network predictions.

Timeout needed DFS Enumeration λ2 Sketch Beamto solve 20% 40% 60% 20% 40% 60% 20% 40% 60% 20% 40% 20%

Baseline 41ms 126ms 314ms 80ms 335ms 861ms 18.9s 49.6s 84.2s >103s >103s >103sDeepCoder 2.7ms 33ms 110ms 1.3ms 6.1ms 27ms 0.23s 0.52s 13.5s 2.13s 455s 292s

Speedup 15.2× 3.9× 2.9× 62.2× 54.6× 31.5× 80.4× 94.6× 6.2× >467× >2.2× >3.4×

and runtimes of 3s, 2s, 1s, 4s, the timeout required to solve 50% of tasks would be 2s. More detailedexperimental results are discussed in Appendix B.

In the main experiment, we tackled a large-scale problem of searching for programs consistent withinput-output examples generated from programs of length T = 5 (search space size on the order of1010), supported by a neural network trained with programs of shorter length T = 4. Here, we onlyconsider P = 100 programs for reasons of computational efficiency, after having verified that thisdoes not significantly affect the results in Table 1. The table in Fig. 3a shows significant speedupsfor DFS, Sort and add enumeration, and λ2 with Sort and add enumeration, the search techniquescapable of solving the search problem in reasonable time frames. Note that Sort and add enumerationwithout the neural network (using prior probabilities of functions) exceeded the 104 second timeoutin two cases, so the relative speedups shown are crude lower bounds.

Timeout needed DFS Enumeration λ2

to solve 20% 40% 60% 20% 40% 60% 20%

Baseline 163s 2887s 6832s 8181s >104s >104s 463sDeepCoder 24s 514s 2654s 9s 264s 4640s 48s

Speedup 6.8× 5.6× 2.6× 907× >37× >2× 9.6×

(a)

1 2 3 4 5

Length of test programs Ttest

100

101

102

103

Speedup

1

2

3

4

Ttrain :

none

(b)

Figure 3: Search speedups on programs of length T = 5 and influence of length of training programs.

We hypothesize that the substantially larger performance gains on Sort and add schemes as comparedto gains on DFS can be explained by the fact that the choice of attribute function (predicting presenceof functions anywhere in the program) and learning objective of the neural network are better matchedto the Sort and add schemes. Indeed, a more appropriate attribute function for DFS would be onethat is more informative of the functions appearing early in the program, since exploring an incorrectfirst function is costly with DFS. On the other hand, the discussion in Sect. 4.5 provides theoreticalindication that ignoring the correlations between functions is not cataclysmic for Sort and addenumeration, since a Rank loss that upper bounds the Sort and add runtime can still be minimized.

In Appendix G we analyse the performance of the neural networks used in these experiments, byinvestigating which attributes (program instructions) tend to be difficult to distinguish from eachother.

5.2 GENERALIZATION ACROSS PROGRAM LENGTHS

To investigate the encoder’s generalization ability across programs of different lengths, we traineda network to predict used functions from input-output examples that were generated from programsof length Ttrain ∈ {1, . . . , 4}. We then used each of these networks to predict functions on 5 test setscontaining input-output examples generated from programs of lengths Ttest ∈ {1, . . . , 5}, respectively.The test programs of a given length T were semantically disjoint from all training programs of thesame length T and also from all training and test programs of shorter lengths T ′ < T .

For each of the combinations of Ttrain and Ttest, Sort and add enumerative search was run both withand without using the neural network’s predictions (in the latter case using prior probabilities) untilit solved 20% of the test set tasks. Fig. 3b shows the relative speedup of the solver having access topredictions from the trained neural networks. These results indicate that the neural networks are ableto generalize beyond programs of the same length that they were trained on. This is partly due to the

7

Published as a conference paper at ICLR 2017

search procedure on top of their predictions, which has the opportunity to correct for the presence offunctions that the neural network failed to predict. Note that a sequence-to-sequence model trainedon programs of a fixed length could not be expected to exhibit this kind of generalization ability.

5.3 ALTERNATIVE MODELS

Encoder We evaluated replacing the feed-forward architecture encoder (Sect. 4.3) with an RNN, anatural baseline. Using a GRU-based RNN we were able to achieve results almost as good as usingthe feed-forward architecture, but found the RNN encoder more difficult to train.

Decoder We also considered a purely neural network-based approach, where an RNN decoderis trained to predict the entire program token-by-token. We combined this with our feed-forwardencoder by initializing the RNN using the pooled final layer of the encoder. We found it substantiallymore difficult to train an RNN decoder as compared to the independent binary classifiers employedabove. Beam search was used to explore likely programs predicted by the RNN, but it only lead to asolution comparable with the other techniques when searching for programs of lengths T ≤ 2, wherethe search space size is very small (on the order of 103). Note that using an RNN for both the encoderand decoder corresponds to a standard sequence-to-sequence model. However, we do do not rule outthat a more sophisticated RNN decoder or training procedure could be possibly more successful.

6 RELATED WORK

Machine Learning for Inductive Program Synthesis. There is relatively little work on usingmachine learning for programming by example. The most closely related work is that of Menonet al. (2013), in which a hand-coded set of features of input-output examples are used as “clues.”When a clue appears in the input-output examples (e.g., the output is a permutation of the input),it reweights the probabilities of productions in a probabilistic context free grammar by a learnedamount. This work shares the idea of learning to guide the search over program space conditional oninput-output examples. One difference is in the domains. Menon et al. (2013) operate on short stringmanipulation programs, where it is arguably easier to hand-code features to recognize patterns in theinput-output examples (e.g., if the outputs are always permutations or substrings of the input). Ourwork shows that there are strong cues in patterns in input-output examples in the domain of numbersand lists. However, the main difference is the scale. Menon et al. (2013) learns from a small (280examples), manually-constructed dataset, which limits the capacity of the machine learning modelthat can be trained. Thus, it forces the machine learning component to be relatively simple. Indeed,Menon et al. (2013) use a log-linear model and rely on hand-constructed features. LIPS automaticallygenerates training data, which yields datasets with millions of programs and enables high-capacitydeep learning models to be brought to bear on the problem.

Learning Representations of Program State. Piech et al. (2015) propose to learn joint embed-dings of program states and programs to automatically extend teacher feedback to many similarprograms in the MOOC setting. This work is similar in that it considers embedding program states,but the domain is different, and it otherwise specifically focuses on syntactic differences betweensemantically equivalent programs to provide stylistic feedback. Li et al. (2016) use graph neuralnetworks (GNNs) to predict logical descriptions from program states, focusing on data structureshapes instead of numerical and list data. Such GNNs may be a suitable architecture to encode statesappearing when extending our DSL to handle more complex data structures.

Learning to Infer. Very recently, Alemi et al. (2016) used neural sequence models in tandem withan automated theorem prover. Similar to our Sort and Add strategy, a neural network componentis trained to select premises that the theorem prover can use to prove a theorem. A recent exten-sion (Loos et al., 2017) is similar to our DFS enumeration strategy and uses a neural network to guidethe proof search at intermediate steps. The main differences are in the domains, and that they trainon an existing corpus of theorems. More broadly, if we view a DSL as defining a model and searchas a form of inference algorithm, then there is a large body of work on using discriminatively-trainedmodels to aid inference in generative models. Examples include Dayan et al. (1995); Kingma &Welling (2014); Shotton et al. (2013); Stuhlmuller et al. (2013); Heess et al. (2013); Jampani et al.(2015).

8

Published as a conference paper at ICLR 2017

7 DISCUSSION AND FUTURE WORK

We have presented a framework for improving IPS systems by using neural networks to translate cuesin input-output examples to guidance over where to search in program space. Our empirical resultsshow that for many programs, this technique improves the runtime of a wide range of IPS baselinesby 1-3 orders. We have found several problems in real online programming challenges that can besolved with a program in our language, which validates the relevance of the class of problems thatwe have studied in this work. In sum, this suggests that we have made significant progress towardsbeing able to solve programming competition problems, and the machine learning component playsan important role in making it tractable.

There remain some limitations, however. First, the programs we can synthesize are only the simplestproblems on programming competition websites and are simpler than most competition problems.Many problems require more complex algorithmic solutions like dynamic programming and search,which are currently beyond our reach. Our chosen DSL currently cannot express solutions to manyproblems. To do so, it would need to be extended by adding more primitives and allow for moreflexibility in program constructs (such as allowing loops). Second, we currently use five input-outputexamples with relatively large integer values (up to 256 in magnitude), which are probably moreinformative than typical (smaller) examples. While we remain optimistic about LIPS’s applicability asthe DSL becomes more complex and the input-output examples become less informative, it remainsto be seen what the magnitude of these effects are as we move towards solving large subsets ofprogramming competition problems.

We foresee many extensions of DeepCoder. We are most interested in better data generation pro-cedures by using generative models of source code, and to incorporate natural language problemdescriptions to lessen the information burden required from input-output examples. In sum, Deep-Coder represents a promising direction forward, and we are optimistic about the future prospects ofusing machine learning to synthesize programs.

ACKNOWLEDGMENTS

The authors would like to express their gratitude to Rishabh Singh and Jack Feser for their valuableguidance and help on using the Sketch and λ2 program synthesis systems.

REFERENCES

Alex A. Alemi, Francois Chollet, Geoffrey Irving, Christian Szegedy, and Josef Urban. DeepMath -deep sequence models for premise selection. In Proocedings of the 29th Conference on Advancesin Neural Information Processing Systems (NIPS), 2016.

Rudy R Bunel, Alban Desmaison, Pawan K Mudigonda, Pushmeet Kohli, and Philip Torr. Adaptiveneural compilation. In Proceedings of the 29th Conference on Advances in Neural InformationProcessing Systems (NIPS), 2016.

Peter Dayan, Geoffrey E Hinton, Radford M Neal, and Richard S Zemel. The Helmholtz machine.Neural computation, 7(5):889–904, 1995.

Krzysztof Dembczynski, Willem Waegeman, Weiwei Cheng, and Eyke Hullermeier. On label de-pendence and loss minimization in multi-label classification. Machine Learning, 88(1):5–45,2012.

Krzysztof J. Dembczynski, Weiwei Cheng, and Eyke Hllermeier. Bayes optimal multilabel classifi-cation via probabilistic classifier chains. In Proceedings of the 27th International Conference onMachine Learning (ICML), 2010.

John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing data structure transformations frominput-output examples. In Proceedings of the 36th ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation (PLDI), 2015.

Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, JonathanTaylor, and Daniel Tarlow. Terpret: A probabilistic programming language for program induction.CoRR, abs/1608.04428, 2016. URL http://arxiv.org/abs/1608.04428.

9

Published as a conference paper at ICLR 2017

Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines. CoRR, abs/1410.5401, 2014.URL http://arxiv.org/abs/1410.5401.

Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwinska, Sergio Gomez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou,et al. Hybrid computing using a neural network with dynamic external memory. Nature, 2016.

Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, and Phil Blunsom. Learning totransduce with unbounded memory. In Proceedings of the 28th Conference on Advances in NeuralInformation Processing Systems (NIPS), 2015.

Sumit Gulwani. Programming by examples: Applications, algorithms, and ambiguity resolution. InProceedings of the 8th International Joint Conference on Automated Reasoning (IJCAR), 2016.

Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. Synthesis of loop-freeprograms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI), 2011.

Nicolas Heess, Daniel Tarlow, and John Winn. Learning to pass expectation propagation messages.In Proceedings of the 26th Conference on Advances in Neural Information Processing Systems(NIPS), 2013.

Varun Jampani, Sebastian Nowozin, Matthew Loper, and Peter V Gehler. The informed sampler: Adiscriminative approach to Bayesian inference in generative computer vision models. ComputerVision and Image Understanding, 136:32–44, 2015.

Armand Joulin and Tomas Mikolov. Inferring algorithmic patterns with stack-augmented recurrentnets. In Proceedings of the 28th Conference on Advances in Neural Information ProcessingSystems (NIPS), 2015.

Łukasz Kaiser and Ilya Sutskever. Neural GPUs learn algorithms. In Proceedings of the 4th Interna-tional Conference on Learning Representations, 2016.

Diederik P Kingma and Max Welling. Stochastic gradient VB and the variational auto-encoder. InProceedings of the 2nd International Conference on Learning Representations (ICLR), 2014.

Karol Kurach, Marcin Andrychowicz, and Ilya Sutskever. Neural random-access machines. InProceedings of the 4th International Conference on Learning Representations 2016, 2015.

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. Gated graph sequence neuralnetworks. In Proceedings of the 4th International Conference on Learning Representations (ICLR),2016.

Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, Andrew Senior, FuminWang, and Phil Blunsom. Latent predictor networks for code generation. In Proceedings of the54th Annual Meeting of the Association for Computational Linguistics, 2016.

Sarah M. Loos, Geoffrey Irving, Christian Szegedy, and Cezary Kaliszyk. Deep network guidedproof search. CoRR, abs/1701.06972, 2017. URL http://arxiv.org/abs/1701.06972.

Aditya Krishna Menon, Omer Tamuz, Sumit Gulwani, Butler W Lampson, and Adam Kalai. Amachine learning framework for programming by example. In Proceedings of the InternationalConference on Machine Learning (ICML), 2013.

Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. Neural programmer: Inducing latent pro-grams with gradient descent. In Proceedings of the 4th International Conference on LearningRepresentations (ICLR), 2016.

Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran Sahami, and Leonidas J.Guibas. Learning program embeddings to propagate feedback on student code. In Proceedings ofthe 32nd International Conference on Machine Learning (ICML), 2015.

Oleksandr Polozov and Sumit Gulwani. FlashMeta: a framework for inductive program synthe-sis. In Proceedings of the International Conference on Object-Oriented Programming, Systems,Languages, and Applications (OOPSLA), 2015.

10

Published as a conference paper at ICLR 2017

Scott E. Reed and Nando de Freitas. Neural programmer-interpreters. In Proceedings of the 4thInternational Conference on Learning Representations (ICLR), 2016.

Sebastian Riedel, Matko Bosnjak, and Tim Rocktaschel. Programming with a differentiable forthinterpreter. CoRR, abs/1605.06640, 2016. URL http://arxiv.org/abs/1605.06640.

Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic program optimization. Commununicationsof the ACM, 59(2):114–122, 2016.

Jamie Shotton, Toby Sharp, Alex Kipman, Andrew Fitzgibbon, Mark Finocchio, Andrew Blake, MatCook, and Richard Moore. Real-time human pose recognition in parts from single depth images.Communications of the ACM, 56(1):116–124, 2013.

Rishabh Singh and Sumit Gulwani. Predicting a correct program in programming by example. InProceedings of the 27th Conference on Computer Aided Verification (CAV), 2015.

Armando Solar-Lezama. Program Synthesis By Sketching. PhD thesis, EECS Dept., UC Berkeley,2008.

Andreas Stuhlmuller, Jessica Taylor, and Noah D. Goodman. Learning stochastic inverses. In Pro-ceedings of the 26th Conference on Advances in Neural Information Processing Systems (NIPS),2013.

Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. End-to-end memory networks.In Proceedings of the 28th Conference on Advances in Neural Information Processing Systems(NIPS), 2015.

Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. In Proceedings of the 3rdInternational Conference on Learning Representations (ICLR), 2015.

Wojciech Zaremba, Tomas Mikolov, Armand Joulin, and Rob Fergus. Learning simple algorithmsfrom examples. In Proceedings of the 33nd International Conference on Machine Learning(ICML), 2016.

11

Published as a conference paper at ICLR 2017

A EXAMPLE PROGRAMS

This section shows example programs in our Domain Specific Language (DSL), together with input-output examples and short descriptions. These programs have been inspired by simple tasks appearingon real programming competition websites, and are meant to illustrate the expressive power of ourDSL.

Program 0:k← intb← [int]c← SORT bd← TAKE k ce← SUM d

Input-output example:Input:2, [3 5 4 7 5]Output:[7]

Description:A new shop near you is selling n paintings.You have k < n friends and you wouldlike to buy each of your friends a paintingfrom the shop. Return the minimal amountof money you will need to spend.

Program 1:w← [int]t← [int]c← MAP (*3) wd← ZIPWITH (+) c te← MAXIMUM d

Input-output example:Input:[6 2 4 7 9],[5 3 6 1 0]Output:27

Description:In soccer leagues, match winners areawarded 3 points, losers 0 points, and bothteams get 1 point in the case of a tie. Com-pute the number of points awarded to thewinner of a league given two arrays w, t ofthe same length, where w[i] (resp. t[i]) is thenumber of times team i won (resp. tied).

Program 2:a← [int]b← [int]c← ZIPWITH (-) b ad← COUNT (>0) c

Input-output example:Input:[6 2 4 7 9],[5 3 2 1 0]Output:4

Description:Alice and Bob are comparing their results ina recent exam. Given their marks per ques-tion as two arrays a and b, count on howmany questions Alice got more points thanBob.

Program 3:h← [int]b← SCANL1 MIN hc← ZIPWITH (-) h bd← FILTER (>0) ce← SUM d

Input-output example:Input:[8 5 7 2 5]Output:5

Description:Perditia is very peculiar about her gardenand wants that the trees standing in a row areall of non-increasing heights. Given the treeheights in centimeters in order of the row asan array h, compute how many centimetersshe needs to trim the trees in total.

Program 4:x← [int]y← [int]c← SORT xd← SORT ye← REVERSE df← ZIPWITH (*) d eg← SUM f

Input-output example:Input:[7 3 8 2 5],[2 8 9 1 3]Output:79

Description:Xavier and Yasmine are laying sticks to formnon-overlapping rectangles on the ground.They both have fixed sets of pairs of sticksof certain lengths (represented as arrays xand y of numbers). Xavier only lays sticksparallel to the x axis, and Yasmine lays sticksonly parallel to y axis. Compute the areatheir rectangles will cover at least.

Program 5:a← [int]b← REVERSE ac← ZIPWITH MIN a b

Input-output example:Input:[3 7 5 2 8]Output:[3 2 5 2 3]

Description:A sequence called Billy is looking into themirror, wondering how much weight it couldlose by replacing any of its elements by theirmirror images. Given a description of Billyas an array b of length n, return an array cof minimal sum where each element c[i] iseither b[i] or its mirror image b[n− i− 1].

12

Published as a conference paper at ICLR 2017

Program 6:t← [int]p← [int]c← MAP (-1) td← MAP (-1) pe← ZIPWITH (+) c df← MINIMUM e

IO example:Input:[4 8 11 2],[2 3 4 1]Output:1

Description:Umberto has a large collection of ties and match-ing pocket squares—too large, his wife says—and heneeds to sell one pair. Given their values as arrays tand p, assuming that he sells the cheapest pair, andselling costs 2, how much will he lose from the sale?

Program 7:s← [int]p← [int]c← SCANL1 (+) pd← ZIPWITH (*) s ce← SUM d

IO example:Input:[4 7 2 3],[2 1 3 1]Output:62

Description:Zack always promised his n friends to buy themcandy, but never did. Now he won the lotteryand counts how often and how much candy hepromised to his friends, obtaining arrays p (num-ber of promises) and s (number of promised sweets).He announces that to repay them, he will buys[1]+s[2]+...+s[n] pieces of candy for thefirst p[1] days, then s[2]+s[3]+...+s[n] forp[2] days, and so on, until he has fulfilled allpromises. How much candy will he buy in total?

Program 8:s← [int]b← REVERSE sc← ZIPWITH (-) b sd← FILTER (>0) ce← SUM d

IO example:Input:[1 2 4 5 7]Output:9

Description:Vivian loves rearranging things. Most of all, whenshe sees a row of heaps, she wants to make sure thateach heap has more items than the one to its left. Sheis also obsessed with efficiency, so always moves theleast possible number of items. Her dad really dislikesif she changes the size of heaps, so she only movessingle items between them, making sure that the set ofsizes of the heaps is the same as at the start; they areonly in a different order. When you come in, you seeheaps of sizes (of course, sizes strictly monotonicallyincreasing) s[0], s[1], ... s[n]. What isthe maximal number of items that Vivian could havemoved?

Fig. 4 shows the predictions made by a neural network trained on programs of length T = 4 thatwere ensured to be semantically disjoint from all 9 example programs shown in this section. For eachtask, the neural network was provided with 5 input-output examples.

(+1)

(-1)

(*2)

(/2)

(*-1

)

(**2

)

(*3)

(/3)

(*4)

(/4)

(>0)

(<0)

(%2=

=1)

(%2=

=0)

HEA

D

LAST

MA

P

FILT

ER

SO

RT

REV

ER

SE

TA

KE

DR

OP

AC

CESS

ZIP

WIT

H

SC

AN

L1

+ - * MIN

MA

X

CO

UN

T

MIN

IMU

M

MA

XIM

UM

SU

M

0: SORT b | TAKE a c | SUM d

1: MAP (*3) a | ZIPWITH + b c | MAXIMUM d

2: ZIPWITH - b a | COUNT (>0) c

3: SCANL1 MIN a | ZIPWITH - a b | FILTER (>0) c | SUM d

4: SORT a | SORT b | REVERSE d | ZIPWITH * d e | SUM f

5: REVERSE a | ZIPWITH MIN a b

6: MAP (-1) a | MAP (-1) b | ZIPWITH + c d | MINIMUM e

7: SCANL1 + b | ZIPWITH * a c | SUM d

8: REVERSE a | ZIPWITH - b a | FILTER (>0) c | SUM d

.0 .2 .0 .1 .4 .0 .0 .2 .0 .1 .0 .2 .1 .0 .1 .0 .3 .4 .2 .1 .5 .2 .2 .6 .5 .2 .4 .0 .9 .1 .0 .1 .0 1.0

.1 .1 .1 .1 .0 .0 1.0 .0 .1 .0 .2 .1 .1 .1 .0 .3 1.0 .2 .1 .1 .0 .0 .1 1.0 .0 .6 .6 .0 .1 .1 .2 .0 .9 .0

.1 .2 .0 .1 .0 .0 .0 .1 .0 .1 .2 .2 .3 .3 .0 .0 .6 .0 .1 .1 .0 .0 .0 1.0 .3 .4 .5 .0 .5 .5 1.0 .0 .0 .0

.3 .1 .1 .1 .1 .0 .0 .0 .0 .0 .1 .0 .0 .0 .0 .0 .6 .2 .1 .1 .0 .0 .0 1.0 .3 .3 .3 .1 .2 .7 .0 .0 .1 1.0

.0 .0 .1 .4 .1 .4 .0 .0 .2 .0 .0 .2 .0 .2 .1 .2 .9 .2 .1 .0 .0 .0 .4 .6 .2 .2 .3 .3 .4 .1 .2 .4 .0 .4

.2 .2 .0 .2 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .9 .0 .0 1.0 .0 .0 .0 1.0 .0 .2 .0 .0 1.0 .1 .0 .0 .0 .0

.1 .1 .0 .0 .0 .0 .0 .0 .0 .0 .0 .2 .2 .2 .7 .0 .3 .3 .1 .0 .0 .0 .0 1.0 .1 .9 .1 .0 .7 .2 .1 .8 .0 .0

.0 .0 .0 .0 .0 .1 .0 .0 .1 .0 .1 .1 .1 .1 .0 .1 .4 .1 .0 .0 .0 .0 .0 1.0 .8 .5 .4 1.0 .1 .0 .2 .0 .1 .7

.2 .1 .0 .1 .1 .0 .0 .1 .0 .1 .1 .1 .1 .0 .0 .0 .5 .5 .1 .0 .0 .0 .0 1.0 .4 .4 .5 .0 .3 .6 .0 .0 .1 1.0

Figure 4: Predictions of a neural network on the 9 example programs described in this section.Numbers in squares would ideally be close to 1 (function is present in the ground truth source code),whereas all other numbers should ideally be close to 0 (function is not needed).

B EXPERIMENTAL RESULTS

Results presented in Sect. 5.1 showcased the computational speedups obtained from the LIPS frame-work (using DeepCoder), as opposed to solving each program synthesis problem with only the

13

Published as a conference paper at ICLR 2017

information about global incidence of functions in source code available. For completeness, here weshow plots of raw computation times of each search procedure to solve a given number of problems.

Fig. 5 shows the computation times of DFS, of Enumerative search with a Sort and add scheme, ofthe λ2 and Sketch solvers with a Sort and add scheme, and of Beam search, when searching for aprogram consistent with input-output examples generated from P = 500 different test programs oflength T = 3. As discussed in Sect. 5.1, these test programs were ensured to be semantically disjointfrom all programs used to train the neural networks, as well as from all programs of shorter length(as discussed in Sect. 4.2).

10-4 10-3 10-2 10-1 100 101 102 103

Solver computation time [s]

0

100

200

300

400

500

Pro

gra

ms

solv

ed DFS: using neural network

DFS: using prior order

L2: Sort and add using neural network

L2: Sort and add in prior order

Enumeration: Sort and add using neural network

Enumeration: Sort and add in prior order

Beam search

Sketch: Sort and add using neural network

Sketch: Sort and add in prior order

Figure 5: Number of test problems solved versus computation time.

The “steps” in the results for Beam search are due to our search strategy, which doubles the size ofthe considered beam until reaching the timeout (of 1000 seconds) and thus steps occur wheneverthe search for a beam of size 2k is finished. For λ2, we observed that no solution for a given set ofallowed functions was ever found after about 5 seconds (on the benchmark machines), but that λ2continued to search. Hence, we introduced a hard timeout after 6 seconds for all but the last iterationsof our Sort and add scheme.

Fig. 6 shows the computation times of DFS, Enumerative search with a Sort and add scheme, andλ2 with a Sort and add scheme when searching for programs consistent with input-output examplesgenerated from P = 100 different test programs of length T = 5. The neural network was trainedon programs of length T = 4.

10-4 10-3 10-2 10-1 100 101 102 103 104

Solver computation time [s]

0

20

40

60

80

100

Pro

gra

ms

solv

ed DFS: using neural network

DFS: using prior order

L2: Sort and add using neural network

L2: Sort and add in prior order

Enumeration: Sort and add using neural network

Enumeration: Sort and add in prior order

Figure 6: Number of test problems solved versus computation time.

C THE NEURAL NETWORK

As briefly described in Sect. 4.3, we used the following simple feed-forward architecture encoder:

• For each input-output example in the set generated from a single ground truth program:

– Pad arrays appearing in the inputs and in the output to a maximum length L = 20 witha special NULL value.

– Represent the type (singleton integer or integer array) of each input and of the outputusing a one-hot-encoding vector. Embed each integer in the valid integer range (−256to 255) using a learned embedding into E = 20 dimensional space. Also learn anembedding for the padding NULL value.

14

Published as a conference paper at ICLR 2017

– Concatenate the representations of the input types, the embeddings of integers in theinputs, the representation of the output type, and the embeddings of integers in theoutput into a single (fixed-length) vector.

– Pass this vector through H = 3 hidden layers containing K = 256 sigmoid units each.

• Pool the last hidden layer encodings of each input-output example together by simple arith-metic averaging.

Fig. 7 shows a schematic drawing of this encoder architecture, together with the decoder that performsindependent binary classification for each function in the DSL, indicating whether or not it appearsin the ground truth source code.

Inputs 1 Outputs 1

…Program State

State Embeddings

Hiddens 1

Hiddens 2

Hiddens 3

Pooled

Inputs 5 Outputs 5

Final ActivationsSigmoids

Attribute Predictions

Figure 7: Schematic representation of our feed-forward encoder, and the decoder.

While DeepCoder learns to embed integers into a E = 20 dimensional space, we built the system upgradually, starting with a E = 2 dimensional space and only training on programs of length T = 1.Such a small scale setting allowed easier investigation of the workings of the neural network, andindeed Fig. 8 below shows a learned embedding of integers in R2. The figure demonstrates thatthe network has learnt the concepts of number magnitude, sign (positive or negative) and evenness,presumably due to FILTER (>0), FILTER (<0), FILTER (%2==0) and FILTER (%2==1) all beingamong the programs on which the network was trained.

D DEPTH-FIRST SEARCH

We use an optimized C++ implementation of depth-first search (DFS) to search over programs witha given maximum length T . In depth-first search, we start by choosing the first function (and itsarguments) of a potential solution program, and then recursively consider all ways of filling in therest of the program (up to length T ), before moving on to a next choice of first instruction (if asolution has not yet been found).

A program is considered a solution if it is consistent with all M = 5 provided input-output examples.Note that this requires evaluating all candidate programs on theM inputs and checking the results forequality with the provided M respective outputs. Our implementation of DFS exploits the sequentialstructure of programs in our DSL by caching the results of evaluating all prefixes of the currentlyconsidered program on the example inputs, thus allowing efficient reuse of computation betweencandidate programs with common prefixes.

This allows us to explore the search space at roughly the speed of ∼ 3× 106 programs per second.

15

Published as a conference paper at ICLR 2017

First embedding dimension φ1(n)

Seco

nd e

mbeddin

g d

imensi

on φ

2(n

)

-256-255

-7

-6-5

-4-3

-2 -1

012

34

56

7

254255

Null

even positive numberseven negative numberodd positive numbersodd negative numberszeroNull (padding value)

Figure 8: A learned embedding of integers {−256,−255, . . . ,−1, 0, 1, . . . , 255} in R2. The colorintensity corresponds to the magnitude of the embedded integer.

When the search procedure extends a partial program by a new function, it has to try the functionsin the DSL in some order. At this point DFS can opt to consider the functions as ordered by theirpredicted probabilities from the neural network. The probability of a function consisting of a higher-order function and a lambda is taken to be the minimum of the probabilities of the two constituentfunctions.

E TRAINING LOSS FUNCTION

In Sect. 4.5 we outlined a justification for using marginal probabilities of individual functions asa sensible intermediate representation to provide a solver employing a Sort and add scheme (weconsidered Enumerative search and the Sketch solver with this scheme). Here we provide a moredetailed discussion.

Predicting program components from input-output examples can be cast as a multilabel classificationproblem, where each instance (set of input-output examples) is associated with a set of relevantlabels (functions appearing in the code that generated the examples). We denote the number of labels(functions) by C, and note that throughout this work C = 34.

When the task is to predict a subset of labels y ∈ {0, 1}C , different loss functions can be employedto measure the prediction error of a classifier h(x) or ranking function f(x). Dembczynski et al.(2010) discuss the following three loss functions:

• Hamming loss counts the number of labels that are predicted incorrectly by a classifier h:

LH(y,h(x)) =

C∑c=1

1{yc 6=hc(x)}

• Rank loss counts the number of label pairs violating the condition that relevant labels areranked higher than irrelevant ones by a scoring function f :

Lr(y, f(x)) =

C∑(i,j):yi=1,yj=0

1{fi<fj}

• Subset Zero-One loss indicates whether all labels have been correctly predicted by h:

Ls(y,h(x)) = 1{y 6=h(x)}

16

Published as a conference paper at ICLR 2017

Dembczynski et al. (2010) proved that Bayes optimal decisions under the Hamming and Rank lossfunctions, i.e., decisions minimizing the expected loss under these loss functions, can be computedfrom marginal probabilities pc(yc|x). This suggests that:

• Multilabel classification under these two loss functions may not benefit from consideringdependencies between the labels.• ”Instead of minimizing the Rank loss directly, one can simply use any approach for single

label prediction that properly estimates the marginal probabilities.” (Dembczynski et al.,2012)

Training the neural network with the negative cross entropy loss function as the training objective isprecisely a method for properly estimating the marginal probabilities of labels (functions appearingin source code). It is thus a sensible step in preparation for making predictions under a Rank loss.

It remains to discuss the relationship between the Rank loss and the actual quantity we care about,which is the total runtime of a Sort and add search procedure. Recall the simplifying assumption thatthe runtime of searching for a program of length T with C functions made available to the search isproportional to CT , and consider a Sort and add search for a program of length T , where the sizeof the active set is increased by 1 whenever the search fails. Starting with an active set of size 1, thetotal time until a solution is found can be upper bounded by

1T + 2T + · · ·+ CTA ≤ CT+1

A ≤ CCTA

where CA is the size of the active set when the search finally succeeds (i.e., when the active set finallycontains all necessary functions for a solution to exist). Hence the total runtime of a Sort and addsearch can be upper bounded by a quantity that is proportional to CT

A .

Now fix a valid program solution P that requires CP functions, and let yP ∈ {0, 1}C be the indicatorvector of functions used by P . Let D := CA − CP be the number of redundant operations addedinto the active set until all operations from P have been added.Example 1. Suppose the labels, as sorted by decreasing predicted marginal probabilities f(x), areas follows:

1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Then the solution P contains CP = 6 functions, but the active set needs to grow to size CA = 11to include all of them, adding D = 5 redundant functions along the way. Note that the rank loss ofthe predictions f(x) is Lr(yP , f(x)) = 2 + 5 = 7, as it double counts the two redundant functionswhich are scored higher than two relevant labels.

Noting that in general Lr(yP , f(x)) ≥ D, the previous upper bound on the runtime of Sort and addcan be further upper bounded as follows:

CTA = (CP +D)T ≤ const + const×DT ≤ const + const× Lr(yP , f(x))

T

Hence we see that for a constant value of T , this upper bound can be minimized by optimizing theRank loss of the predictions f(x). Note also that Lr(yP , f(x)) = 0 would imply D = 0, in whichcase CA = CP .

F DOMAIN SPECIFIC LANGUAGE OF DEEPCODER

Here we provide a description of the semantics of our DSL from Sect. 4.1, both in English and as aPython implementation. Throughout, NULL is a special value that can be set e.g. to an integer outsidethe working integer range.

First-order functions:

• HEAD :: [int] -> intlambda xs: xs[0] if len(xs)>0 else NullGiven an array, returns its first element (or NULL if the array is empty).

• LAST :: [int] -> intlambda xs: xs[-1] if len(xs)>0 else NullGiven an array, returns its last element (or NULL if the array is empty).

17

Published as a conference paper at ICLR 2017

• TAKE :: int -> [int] -> intlambda n, xs: xs[:n]Given an integer n and array xs, returns the array truncated after the n-th element. (If thelength of xs was no larger than n in the first place, it is returned without modification.)

• DROP :: int -> [int] -> intlambda n, xs: xs[n:]Given an integer n and array xs, returns the array with the first n elements dropped. (If thelength of xs was no larger than n in the first place, an empty array is returned.)

• ACCESS :: int -> [int] -> intlambda n, xs: xs[n] if n>=0 and len(xs)>n else NullGiven an integer n and array xs, returns the (n+1)-st element of xs. (If the length of xswas less than or equal to n, the value NULL is returned instead.)

• MINIMUM :: [int] -> intlambda xs: min(xs) if len(xs)>0 else NullGiven an array, returns its minimum (or NULL if the array is empty).

• MAXIMUM :: [int] -> intlambda xs: max(xs) if len(xs)>0 else NullGiven an array, returns its maximum (or NULL if the array is empty).

• REVERSE :: [int] -> [int]lambda xs: list(reversed(xs))Given an array, returns its elements in reversed order.

• SORT :: [int] -> [int]lambda xs: sorted(xs)Given an array, return its elements in non-decreasing order.

• SUM :: [int] -> intlambda xs: sum(xs)Given an array, returns the sum of its elements. (The sum of an empty array is 0.)

Higher-order functions:

• MAP :: (int -> int) -> [int] -> [int]lambda f, xs: [f(x) for x in xs]Given a lambda function f mapping from integers to integers, and an array xs, returns thearray resulting from applying f to each element of xs.

• FILTER :: (int -> bool) -> [int] -> [int]lambda f, xs: [x for x in xs if f(x)]Given a predicate f mapping from integers to truth values, and an array xs, returns theelements of xs satisfying the predicate in their original order.

• COUNT :: (int -> bool) -> [int] -> intlambda f, xs: len([x for x in xs if f(x)])Given a predicate f mapping from integers to truth values, and an array xs, returns thenumber of elements in xs satisfying the predicate.

• ZIPWITH :: (int -> int -> int) -> [int] -> [int] -> [int]lambda f, xs, ys: [f(x, y) for (x, y) in zip(xs, ys)]Given a lambda function f mapping integer pairs to integers, and two arrays xs and ys,returns the array resulting from applying f to corresponding elements of xs and ys. Thelength of the returned array is the minimum of the lengths of xs and ys.

• SCANL1 :: (int -> int -> int) -> [int] -> [int]Given a lambda function f mapping integer pairs to integers, and an array xs, returns anarray ys of the same length as xs and with its content defined by the recurrence ys[0] =xs[0], ys[n] = f(ys[n-1], xs[n]) for n ≥ 1.

The INT→INT lambdas (+1), (-1), (*2), (/2), (*(-1)), (**2), (*3), (/3), (*4), (/4)provided by our DSL map integers to integers in a self-explanatory manner. The INT→BOOL lambdas(>0), (<0), (%2==0), (%2==1) respectively test positivity, negativity, evenness and oddness of

18

Published as a conference paper at ICLR 2017

the input integer value. Finally, the INT→INT→INT lambdas (+), (-), (*), MIN, MAX apply afunction to a pair of integers and produce a single integer.

As an example, consider the function SCANL1 MAX, consisting of the higher-order function SCANL1and the INT→INT→INT lambda MAX. Given an integer array a of length L, this function computesthe running maximum of the array a. Specifically, it returns an array b of the same length L whosei-th element is the maximum of the first i elements in a.

HEA

D

LAST

AC

CESS

MIN

IMU

M

MA

XIM

UM

TA

KE

DR

OP

FILT

ER

(>0

)

(<0

)

(%2

==

1)

(%2

==

0)

CO

UN

T

MA

P

MIN

MA

X

+ - * ZIP

WIT

H

SC

AN

L1

SO

RT

REV

ER

SE

(*-1

)

(**2

)

(+1

)

(-1

)

(*2

)

(*3

)

(*4

)

(/2

)

(/3

)

(/4

)

SU

M

HEAD

LAST

ACCESS

MINIMUM

MAXIMUM

TAKE

DROP

FILTER

(>0)

(<0)

(%2==1)

(%2==0)

COUNT

MAP

MIN

MAX

+

-

*

ZIPWITH

SCANL1

SORT

REVERSE

(*-1)

(**2)

(+1)

(-1)

(*2)

(*3)

(*4)

(/2)

(/3)

(/4)

SUM

.17(15)

.05(12)

.29(14)

.09(15)

.04(9)

.06(14)

.12(12)

.06(14)

.08(15)

.07(13)

.06(15)

.06(15)

.16(9)

.11(15)

.03(13)

.06(14)

.00(13)

.09(12)

.05(7)

.05(15)

.03(15)

.03(14)

.01(14)

.01(13)

.02(15)

.03(14)

.02(15)

.00(14)

.06(15)

.01(14)

.02(14)

.08(15)

.00(15)

.34(6)

.15(6)

.01(6)

.05(6)

.07(4)

.04(4)

.09(6)

.04(6)

.01(6)

.02(6)

.11(6)

.04(6)

.01(2)

.04(5)

.11(6)

.05(5)

.11(6)

.01(5)

.03(3)

.06(6)

.01(6)

.03(6)

.03(5)

.01(6)

.03(6)

.01(5)

.02(5)

.00(5)

.02(6)

.13(6)

.00(6)

.01(6)

.02(6)

.10(15)

.19(18)

.15(16)

.05(18)

.10(16)

.18(16)

.16(14)

.03(14)

.09(17)

.13(16)

.12(16)

.03(13)

.14(13)

.18(17)

.06(16)

.05(17)

.00(16)

.02(17)

.10(14)

.07(15)

.04(15)

.07(17)

.02(16)

.01(17)

.04(17)

.03(17)

.02(18)

.00(18)

.03(18)

.02(18)

.02(17)

.03(18)

.02(18)

.16(8)

.22(9)

.04(7)

.12(9)

.12(7)

.08(8)

.24(8)

.03(8)

.16(9)

.13(9)

.13(9)

.13(9)

.06(7)

.34(9)

.02(8)

.25(8)

.10(5)

.00(8)

.01(3)

.05(8)

.03(7)

.06(9)

.02(8)

.00(9)

.02(9)

.03(9)

.04(9)

.00(9)

.00(9)

.01(9)

.04(8)

.02(9)

.02(9)

.25(10)

.23(10)

.22(10)

.05(10)

.08(7)

.02(6)

.10(9)

.09(9)

.04(10)

.08(10)

.06(10)

.10(10)

.05(7)

.08(8)

.04(6)

.09(9)

.08(8)

.00(10)

.03(5)

.02(7)

.04(10)

.09(10)

.01(10)

.00(10)

.02(10)

.02(10)

.02(10)

.00(10)

.00(9)

.07(10)

.01(9)

.00(9)

.00(10)

.05(36)

.06(40)

.04(40)

.06(40)

.02(39)

.05(39)

.12(36)

.06(40)

.05(42)

.09(40)

.02(36)

.01(38)

.05(21)

.06(40)

.05(37)

.03(34)

.00(37)

.02(32)

.11(20)

.02(35)

.03(41)

.04(40)

.00(41)

.03(40)

.01(37)

.04(39)

.02(39)

.00(42)

.02(41)

.03(39)

.01(39)

.00(41)

.00(42)

.04(44)

.03(43)

.03(43)

.03(44)

.03(41)

.11(42)

.09(36)

.06(42)

.08(44)

.09(38)

.05(38)

.01(35)

.14(29)

.06(39)

.04(40)

.09(39)

.09(39)

.00(38)

.10(21)

.03(40)

.05(44)

.08(43)

.01(44)

.01(45)

.05(44)

.06(40)

.03(43)

.02(41)

.01(43)

.02(44)

.03(42)

.01(44)

.00(45)

.01(115)

.01(118)

.01(114)

.02(117)

.00(117)

.03(112)

.06(109)

.06(90)

.05(87)

.07(88)

.08(81)

.05(110)

.11(86)

.08(93)

.04(91)

.11(89)

.07(92)

.01(89)

.03(22)

.03(102)

.02(113)

.04(110)

.05(115)

.01(117)

.02(114)

.03(116)

.01(114)

.00(111)

.01(112)

.05(116)

.03(115)

.02(114)

.00(116)

.03(33)

.02(34)

.01(30)

.02(33)

.01(33)

.02(32)

.07(31)

.08(6)

.13(33)

.15(32)

.13(31)

.06(23)

.16(25)

.11(26)

.11(29)

.16(30)

.05(25)

.00(29)

.05(11)

.06(30)

.02(32)

.04(32)

.02(33)

.01(34)

.03(34)

.02(32)

.01(33)

.00(32)

.00(31)

.05(32)

.06(34)

.03(34)

.02(34)

.00(33)

.01(33)

.00(32)

.00(33)

.00(33)

.01(33)

.01(32)

.00(2)

.13(32)

.07(32)

.06(33)

.01(31)

.14(27)

.11(24)

.03(27)

.13(16)

.10(24)

.01(26)

.04(3)

.02(24)

.04(33)

.01(31)

.06(32)

.01(32)

.03(32)

.04(33)

.00(32)

.00(32)

.00(32)

.04(33)

.05(33)

.04(32)

.01(32)

.01(36)

.01(38)

.01(36)

.04(38)

.00(38)

.06(36)

.06(31)

.05(8)

.09(36)

.11(37)

.19(37)

.06(26)

.12(29)

.12(32)

.05(30)

.07(27)

.16(34)

.00(29)

.02(10)

.06(32)

.03(35)

.06(34)

.07(37)

.00(38)

.03(38)

.03(38)

.00(38)

.00(36)

.00(37)

.03(37)

.04(35)

.06(37)

.00(38)

.01(45)

.01(45)

.00(43)

.00(45)

.00(45)

.02(39)

.06(38)

.03(8)

.09(42)

.09(45)

.17(44)

.04(33)

.12(33)

.05(40)

.04(32)

.09(40)

.02(33)

.00(33)

.04(11)

.02(42)

.00(44)

.05(43)

.05(45)

.00(45)

.02(42)

.04(44)

.02(43)

.00(43)

.02(43)

.06(45)

.01(44)

.01(43)

.00(44)

.04(32)

.02(32)

.01(27)

.01(32)

.01(32)

.03(28)

.04(22)

.18(24)

.14(21)

.24(30)

.20(20)

.16(20)

.19(28)

.15(29)

.11(27)

.10(24)

.14(25)

.00(28)

.09(14)

.07(26)

.03(31)

.05(30)

.02(32)

.00(32)

.05(32)

.03(31)

.01(32)

.00(32)

.00(31)

.03(31)

.04(31)

.06(32)

.02(32)

.01(246)

.01(248)

.01(247)

.01(250)

.00(249)

.01(231)

.02(236)

.02(220)

.02(243)

.01(246)

.02(243)

.02(240)

.01(248)

.08(186)

.04(203)

.11(178)

.06(188)

.05(193)

.04(40)

.04(206)

.02(246)

.04(235)

.03(225)

.03(231)

.05(213)

.11(224)

.03(228)

.01(214)

.02(217)

.06(220)

.03(218)

.05(225)

.00(250)

.02(123)

.00(122)

.01(122)

.00(123)

.00(121)

.01(121)

.02(117)

.01(98)

.01(115)

.01(114)

.03(117)

.02(118)

.02(120)

.10(57)

.07(107)

.08(90)

.04(92)

.02(102)

.00(6)

.04(76)

.03(117)

.06(113)

.03(111)

.02(118)

.04(114)

.08(119)

.03(118)

.01(115)

.01(112)

.06(117)

.03(116)

.03(116)

.00(122)

.01(128)

.01(130)

.01(128)

.01(129)

.00(126)

.02(125)

.01(125)

.02(103)

.03(125)

.02(124)

.02(122)

.02(117)

.02(125)

.15(81)

.22(114)

.07(83)

.05(99)

.02(110)

.00(5)

.03(78)

.03(120)

.05(124)

.03(125)

.03(124)

.03(126)

.04(130)

.03(125)

.01(124)

.02(122)

.04(120)

.04(124)

.03(122)

.00(127)

.01(175)

.01(175)

.01(175)

.00(175)

.01(175)

.02(168)

.01(170)

.01(147)

.02(172)

.02(159)

.02(165)

.02(171)

.00(168)

.15(102)

.06(143)

.03(129)

.19(138)

.02(136)

.00(4)

.04(120)

.02(169)

.04(169)

.00(169)

.02(171)

.04(157)

.08(171)

.09(171)

.02(169)

.04(166)

.03(166)

.03(169)

.03(170)

.00(174)

.01(152)

.01(154)

.02(152)

.01(150)

.00(152)

.02(149)

.00(148)

.02(128)

.03(145)

.03(145)

.03(150)

.02(142)

.01(147)

.15(90)

.10(123)

.07(123)

.24(116)

.02(119)

.01(7)

.03(93)

.02(147)

.03(146)

.03(149)

.04(149)

.03(143)

.07(147)

.03(149)

.02(144)

.04(146)

.03(147)

.04(144)

.06(149)

.00(154)

.00(139)

.01(141)

.03(141)

.02(141)

.01(142)

.03(132)

.01(135)

.02(113)

.03(137)

.02(135)

.03(133)

.01(130)

.00(138)

.24(83)

.08(121)

.03(122)

.08(102)

.08(107)

.04(116)

.02(135)

.01(131)

.06(138)

.09(139)

.03(134)

.06(133)

.05(138)

.05(131)

.06(136)

.04(132)

.01(138)

.01(137)

.00(138)

.01(423)

.01(428)

.01(427)

.01(425)

.01(426)

.02(409)

.01(407)

.02(335)

.03(408)

.02(401)

.03(403)

.02(397)

.02(413)

.14(219)

.09(314)

.04(306)

.11(259)

.08(284)

.02(289)

.04(327)

.02(404)

.04(400)

.03(410)

.03(414)

.04(397)

.07(411)

.04(413)

.02(399)

.03(404)

.04(402)

.03(406)

.04(406)

.00(424)

.01(125)

.01(125)

.03(122)

.01(124)

.00(122)

.03(118)

.01(120)

.01(109)

.02(121)

.01(116)

.02(119)

.02(122)

.00(119)

.14(79)

.09(78)

.05(73)

.11(69)

.09(64)

.02(99)

.03(21)

.04(121)

.03(118)

.03(120)

.03(119)

.02(119)

.05(123)

.02(122)

.03(123)

.06(118)

.01(118)

.03(119)

.04(121)

.00(123)

.02(33)

.05(33)

.00(30)

.01(31)

.01(33)

.04(32)

.07(32)

.05(28)

.01(31)

.04(33)

.03(30)

.01(32)

.06(32)

.17(27)

.18(27)

.02(23)

.09(26)

.06(26)

.00(26)

.02(6)

.09(29)

.09(31)

.06(32)

.04(33)

.03(31)

.07(33)

.03(33)

.02(33)

.01(33)

.02(32)

.03(33)

.01(31)

.00(31)

.00(39)

.01(40)

.00(39)

.00(40)

.00(40)

.01(38)

.09(38)

.02(32)

.02(38)

.01(38)

.02(36)

.04(38)

.06(38)

.21(23)

.16(30)

.05(34)

.01(33)

.02(32)

.03(29)

.06(9)

.08(33)

.08(38)

.09(39)

.01(38)

.06(37)

.11(39)

.01(39)

.00(37)

.00(39)

.04(40)

.03(37)

.03(37)

.00(40)

.03(26)

.04(26)

.01(25)

.01(26)

.00(27)

.02(26)

.03(26)

.05(24)

.03(26)

.03(26)

.04(26)

.03(27)

.02(27)

.16(15)

.10(22)

.06(20)

.09(22)

.02(23)

.05(6)

.06(22)

.02(26)

.13(26)

.02(27)

.05(24)

.10(27)

.02(26)

.00(25)

.03(26)

.05(26)

.02(27)

.03(27)

.00(27)

.00(19)

.01(21)

.00(20)

.03(21)

.01(21)

.01(19)

.07(21)

.01(20)

.04(21)

.00(20)

.00(21)

.01(21)

.00(21)

.12(16)

.02(15)

.06(16)

.04(16)

.39(18)

.09(4)

.05(15)

.00(21)

.02(19)

.01(21)

.04(20)

.05(18)

.01(20)

.05(19)

.07(21)

.04(21)

.01(19)

.02(21)

.00(21)

.00(39)

.01(39)

.00(38)

.00(39)

.00(39)

.01(34)

.04(38)

.01(35)

.01(39)

.00(38)

.04(39)

.01(36)

.01(39)

.08(30)

.03(35)

.09(20)

.02(28)

.03(31)

.06(5)

.03(33)

.01(37)

.07(36)

.03(36)

.02(38)

.43(39)

.04(39)

.01(39)

.00(39)

.06(38)

.02(39)

.03(38)

.00(39)

.01(27)

.01(27)

.01(27)

.01(28)

.02(28)

.02(25)

.00(23)

.02(26)

.04(26)

.03(28)

.02(28)

.01(27)

.01(27)

.04(24)

.07(28)

.12(23)

.01(21)

.05(19)

.08(8)

.03(26)

.02(28)

.05(27)

.01(28)

.07(25)

.23(28)

.04(28)

.00(23)

.02(27)

.08(28)

.03(24)

.04(27)

.00(28)

.01(24)

.00(23)

.00(24)

.00(24)

.01(24)

.00(21)

.00(22)

.01(20)

.01(23)

.00(23)

.00(24)

.03(22)

.00(24)

.02(19)

.01(19)

.39(19)

.03(19)

.00(20)

.10(6)

.05(21)

.07(24)

.02(23)

.01(23)

.00(23)

.01(24)

.03(24)

.01(23)

.06(21)

.03(21)

.06(24)

.02(21)

.00(24)

.01(37)

.00(37)

.00(38)

.00(38)

.00(38)

.00(38)

.00(34)

.00(31)

.00(36)

.00(37)

.00(36)

.00(36)

.00(38)

.02(30)

.02(32)

.14(31)

.02(28)

.08(27)

.02(6)

.06(36)

.01(38)

.06(35)

.02(36)

.04(36)

.02(38)

.02(33)

.04(37)

.05(36)

.06(37)

.03(36)

.03(36)

.00(38)

.01(35)

.00(35)

.03(35)

.00(35)

.00(34)

.02(34)

.00(33)

.00(29)

.01(32)

.00(34)

.00(34)

.04(33)

.00(34)

.02(24)

.00(27)

.11(25)

.16(27)

.05(29)

.06(8)

.06(28)

.04(35)

.05(34)

.02(34)

.02(35)

.03(35)

.04(34)

.06(32)

.06(33)

.05(35)

.00(32)

.00(35)

.00(34)

.00(31)

.00(32)

.00(32)

.00(32)

.00(32)

.00(29)

.00(31)

.04(30)

.01(30)

.01(32)

.00(31)

.01(32)

.01(31)

.16(26)

.03(22)

.08(22)

.07(25)

.00(22)

.03(3)

.07(25)

.03(31)

.02(32)

.04(31)

.03(32)

.08(31)

.13(32)

.05(29)

.00(31)

.01(32)

.10(30)

.13(30)

.00(32)

.02(33)

.02(34)

.03(33)

.01(33)

.00(33)

.00(31)

.01(31)

.07(31)

.03(34)

.02(34)

.01(31)

.04(33)

.01(33)

.07(27)

.04(28)

.11(27)

.04(24)

.06(30)

.06(9)

.02(28)

.01(34)

.01(31)

.06(34)

.02(32)

.10(34)

.08(30)

.02(34)

.00(32)

.01(31)

.12(32)

.15(34)

.00(33)

.02(27)

.02(27)

.00(27)

.00(27)

.00(26)

.01(26)

.00(26)

.01(23)

.02(27)

.00(26)

.00(26)

.03(25)

.00(27)

.07(20)

.03(19)

.09(21)

.10(22)

.00(22)

.03(2)

.02(23)

.03(25)

.02(24)

.01(27)

.01(27)

.02(26)

.02(26)

.01(24)

.00(25)

.01(27)

.13(25)

.17(27)

.00(27)

.01(8)

.07(8)

.13(8)

.06(8)

.01(8)

.10(8)

.00(8)

.15(6)

.07(8)

.04(7)

.06(8)

.03(7)

.00(8)

.31(6)

.09(7)

.02(5)

.11(6)

.18(8)

.02(4)

.06(1)

.18(6)

.01(6)

.00(8)

.07(8)

.09(8)

.12(8)

.09(8)

.06(8)

.14(8)

.00(7)

.02(8)

.05(7)

.11(8)

Figure 9: Conditional confusion matrix for the neural network and test set of P = 500 programs oflength T = 3 that were used to obtain the results presented in Table 1. Each cell contains the averagefalse positive probability (in larger font) and the number of test programs from which this averagewas computed (smaller font, in brackets). The color intensity of each cell’s shading coresponds tothe magnitude of the average false positive probability.

G ANALYSIS OF TRAINED NEURAL NETWORKS

We analyzed the performance of trained neural networks by investigating which program instructionstend to get confused by the networks. To this end, we looked at a generalization of confusion matricesto the multilabel classification setting: for each attribute in a ground truth program (rows) measurehow likely each other attribute (columns) is predicted as a false positive. More formally, in thismatrix the (i, j)-entry is the average predicted probability of attribute j among test programs that do

19

Published as a conference paper at ICLR 2017

possess attribute i and do not possess attribute j. Intuitively, the i-th row of this matrix shows howthe presence of attribute i confuses the network into incorrectly predicting each other attribute j.

Figure 9 shows this conditional confusion matrix for the neural network and P = 500 program testset configuration used to obtain Table 1. We re-ordered the confusion matrix to try to expose blockstructure in the false positive probabilities, revealing groups of instructions that tend to be difficult todistinguish. Figure 10 show the conditional confusion matrix for the neural network used to obtainthe table in Fig. 3a. While the results are somewhat noisy, we observe a few general tendencies:

• There is increased confusion amongst instructions that select out a single element from anarray: HEAD, LAST, ACCESS, MINIMUM, MAXIMUM.• Some common attributes get predicted more often regardless of the ground truth program:

FILTER, (>0), (<0), (%2==1), (%2==0), MIN, MAX, (+), (-), ZIPWITH.• There are some groups of lambdas that are more difficult for the network to distinguish

within: (+) vs (-); (+1) vs (-1); (/2) vs (/3) vs (/4).• When a program uses (**2), the network often thinks it’s using (*), presumably because

both can lead to large values in the output.

HEA

D

LAST

AC

CESS

MIN

IMU

M

MA

XIM

UM

TA

KE

DR

OP

FILT

ER

(>0

)

(<0

)

(%2

==

1)

(%2

==

0)

CO

UN

T

MA

P

MIN

MA

X

+ - * ZIP

WIT

H

SC

AN

L1

SO

RT

REV

ER

SE

(*-1

)

(**2

)

(+1

)

(-1

)

(*2

)

(*3

)

(*4

)

(/2

)

(/3

)

(/4

)

SU

M

HEAD

LAST

ACCESS

MINIMUM

MAXIMUM

TAKE

DROP

FILTER

(>0)

(<0)

(%2==1)

(%2==0)

COUNT

MAP

MIN

MAX

+

-

*

ZIPWITH

SCANL1

SORT

REVERSE

(*-1)

(**2)

(+1)

(-1)

(*2)

(*3)

(*4)

(/2)

(/3)

(/4)

SUM

.24(39)

.15(33)

.12(41)

.16(39)

.12(24)

.09(19)

.12(26)

.06(33)

.04(34)

.08(38)

.09(37)

.07(36)

.06(12)

.10(30)

.09(34)

.06(33)

.07(33)

.06(37)

.18(21)

.07(25)

.04(37)

.06(39)

.04(37)

.01(38)

.08(38)

.02(36)

.04(41)

.05(41)

.00(39)

.02(32)

.06(34)

.07(35)

.01(41)

.14(44)

.29(38)

.12(44)

.17(40)

.09(24)

.10(28)

.12(27)

.07(38)

.11(38)

.12(44)

.13(37)

.16(39)

.09(16)

.12(36)

.14(39)

.05(38)

.09(40)

.00(38)

.19(25)

.07(31)

.04(42)

.05(41)

.02(42)

.02(46)

.06(39)

.02(44)

.03(42)

.05(44)

.02(42)

.02(44)

.04(40)

.04(41)

.02(45)

.14(106)

.26(106)

.08(91)

.16(96)

.14(83)

.17(94)

.17(77)

.13(84)

.11(88)

.14(96)

.14(89)

.10(55)

.10(42)

.13(88)

.12(90)

.06(92)

.08(97)

.04(110)

.18(66)

.08(74)

.05(102)

.06(97)

.04(108)

.01(104)

.05(99)

.05(107)

.02(106)

.04(108)

.01(111)

.03(102)

.06(101)

.05(106)

.03(113)

.19(60)

.24(58)

.12(37)

.15(59)

.09(34)

.13(40)

.16(39)

.06(50)

.11(49)

.09(55)

.13(52)

.10(48)

.09(19)

.13(50)

.10(50)

.06(50)

.06(51)

.07(52)

.20(36)

.10(43)

.04(54)

.05(55)

.05(56)

.04(53)

.03(54)

.05(53)

.02(58)

.10(53)

.01(57)

.03(49)

.04(49)

.06(57)

.03(59)

.16(48)

.26(44)

.18(32)

.14(49)

.18(32)

.10(27)

.10(34)

.09(43)

.09(46)

.09(46)

.08(44)

.14(45)

.10(18)

.09(34)

.12(34)

.05(43)

.08(44)

.03(41)

.10(17)

.10(35)

.05(44)

.08(48)

.03(47)

.02(45)

.03(41)

.05(50)

.02(50)

.04(44)

.02(50)

.02(43)

.04(47)

.05(45)

.01(50)

.09(128)

.11(123)

.10(114)

.06(119)

.07(127)

.17(132)

.22(94)

.09(111)

.11(108)

.08(116)

.12(116)

.04(70)

.06(43)

.14(123)

.09(107)

.09(118)

.07(120)

.05(124)

.19(73)

.06(95)

.04(126)

.07(132)

.04(131)

.02(134)

.03(126)

.05(139)

.03(133)

.04(135)

.01(138)

.03(129)

.04(125)

.04(127)

.00(143)

.05(131)

.11(135)

.09(133)

.06(133)

.05(130)

.16(140)

.22(98)

.11(109)

.14(117)

.14(125)

.13(124)

.03(69)

.07(46)

.12(118)

.13(120)

.08(112)

.11(129)

.05(129)

.15(68)

.09(99)

.05(137)

.12(143)

.05(143)

.02(139)

.04(136)

.03(140)

.02(145)

.04(138)

.01(141)

.02(134)

.06(138)

.05(136)

.01(150)

.05(200)

.09(196)

.07(178)

.05(194)

.05(199)

.09(164)

.09(160)

.08(130)

.11(144)

.10(142)

.11(147)

.07(150)

.08(73)

.12(156)

.11(155)

.08(150)

.12(166)

.04(170)

.11(69)

.07(144)

.04(203)

.08(194)

.04(197)

.03(203)

.06(195)

.04(192)

.03(204)

.03(191)

.01(205)

.04(194)

.06(192)

.04(185)

.01(213)

.04(124)

.11(124)

.06(102)

.04(122)

.05(125)

.08(98)

.08(88)

.15(47)

.21(125)

.16(111)

.15(112)

.05(51)

.06(38)

.16(102)

.14(90)

.09(93)

.11(105)

.04(113)

.15(50)

.09(82)

.05(126)

.10(120)

.05(117)

.02(125)

.05(117)

.04(116)

.02(122)

.03(117)

.01(123)

.04(120)

.06(118)

.05(118)

.01(131)

.05(101)

.07(100)

.08(82)

.07(97)

.06(104)

.09(71)

.10(72)

.14(37)

.19(101)

.10(84)

.16(93)

.03(40)

.08(40)

.16(80)

.12(81)

.09(85)

.10(88)

.03(91)

.15(46)

.06(70)

.05(95)

.08(93)

.06(103)

.02(105)

.05(93)

.06(101)

.03(99)

.03(97)

.01(104)

.04(96)

.04(98)

.05(98)

.01(106)

.03(91)

.06(92)

.06(76)

.04(89)

.04(90)

.08(65)

.08(66)

.11(21)

.14(73)

.12(70)

.20(90)

.04(37)

.06(34)

.15(73)

.13(69)

.10(56)

.14(69)

.05(71)

.10(28)

.09(63)

.05(87)

.09(85)

.05(88)

.03(82)

.04(92)

.03(88)

.03(92)

.02(88)

.00(89)

.04(81)

.04(87)

.04(81)

.02(92)

.06(93)

.10(88)

.05(72)

.03(89)

.07(91)

.06(68)

.10(68)

.13(29)

.12(77)

.16(82)

.23(93)

.07(45)

.09(33)

.10(72)

.13(71)

.09(74)

.09(75)

.03(79)

.12(35)

.05(59)

.05(87)

.09(85)

.04(91)

.02(93)

.06(91)

.04(83)

.03(92)

.03(87)

.01(92)

.04(90)

.07(83)

.04(85)

.01(97)

.04(190)

.09(188)

.07(136)

.04(183)

.06(190)

.05(120)

.08(111)

.29(130)

.14(114)

.16(127)

.17(138)

.16(143)

.07(64)

.16(154)

.15(143)

.10(143)

.11(160)

.04(167)

.16(83)

.08(118)

.05(171)

.10(170)

.06(185)

.02(182)

.04(176)

.05(181)

.02(180)

.02(179)

.01(181)

.04(175)

.05(174)

.05(177)

.02(192)

.06(345)

.08(344)

.05(302)

.04(333)

.06(342)

.06(272)

.08(267)

.15(232)

.07(280)

.10(306)

.10(314)

.11(310)

.05(243)

.13(275)

.11(266)

.10(278)

.11(295)

.07(302)

.12(127)

.07(251)

.04(340)

.08(328)

.05(327)

.03(327)

.05(310)

.05(329)

.03(336)

.04(325)

.02(337)

.05(305)

.07(303)

.05(299)

.01(368)

.05(131)

.08(132)

.06(116)

.06(132)

.04(126)

.11(120)

.06(107)

.10(83)

.07(112)

.08(114)

.10(121)

.11(117)

.08(101)

.06(43)

.15(112)

.10(103)

.11(112)

.05(114)

.05(19)

.04(58)

.05(131)

.08(122)

.05(133)

.02(131)

.05(126)

.06(132)

.03(130)

.02(131)

.01(132)

.04(123)

.07(129)

.05(117)

.01(141)

.04(147)

.07(147)

.04(130)

.02(144)

.04(138)

.06(116)

.07(121)

.15(94)

.08(112)

.10(127)

.09(129)

.10(128)

.06(102)

.07(46)

.18(124)

.09(112)

.10(127)

.05(125)

.06(28)

.04(63)

.05(141)

.08(138)

.04(139)

.03(145)

.07(138)

.05(143)

.03(142)

.02(138)

.01(138)

.04(133)

.05(132)

.05(135)

.01(154)

.04(136)

.08(136)

.03(122)

.04(134)

.03(137)

.06(117)

.05(103)

.14(79)

.09(105)

.12(121)

.09(106)

.10(121)

.04(92)

.08(48)

.11(105)

.11(102)

.30(121)

.07(124)

.04(21)

.08(69)

.04(130)

.08(132)

.02(135)

.03(135)

.03(131)

.05(134)

.05(135)

.04(131)

.02(127)

.05(129)

.06(126)

.05(126)

.02(142)

.04(104)

.10(106)

.01(95)

.03(103)

.04(106)

.03(87)

.07(88)

.11(63)

.07(85)

.08(92)

.08(87)

.08(90)

.05(77)

.06(33)

.13(82)

.13(85)

.28(89)

.07(95)

.04(17)

.06(54)

.04(98)

.07(95)

.06(103)

.03(96)

.05(95)

.04(103)

.04(109)

.04(110)

.01(109)

.04(95)

.07(96)

.05(94)

.02(111)

.03(92)

.05(88)

.04(92)

.03(88)

.04(87)

.05(75)

.03(72)

.12(51)

.08(77)

.05(79)

.11(73)

.07(78)

.03(68)

.07(24)

.09(68)

.10(67)

.10(76)

.11(79)

.07(65)

.03(90)

.08(87)

.05(86)

.11(88)

.03(87)

.04(88)

.03(90)

.03(87)

.03(90)

.04(80)

.07(81)

.04(81)

.02(96)

.05(318)

.08(317)

.04(290)

.04(314)

.04(305)

.06(266)

.06(253)

.13(192)

.08(256)

.09(276)

.09(272)

.09(276)

.05(226)

.06(91)

.11(215)

.10(212)

.11(215)

.13(243)

.06(242)

.07(219)

.03(303)

.09(297)

.04(310)

.04(305)

.05(301)

.05(310)

.04(308)

.03(304)

.01(310)

.04(287)

.06(294)

.04(283)

.01(334)

.04(175)

.09(176)

.05(151)

.05(174)

.05(176)

.09(141)

.07(137)

.15(120)

.09(141)

.12(153)

.10(160)

.11(153)

.07(114)

.08(68)

.10(107)

.10(100)

.10(116)

.13(133)

.05(160)

.13(72)

.06(175)

.08(174)

.05(173)

.02(180)

.04(168)

.04(180)

.02(180)

.03(184)

.01(175)

.03(169)

.05(168)

.05(168)

.02(191)

.05(52)

.09(52)

.06(44)

.02(50)

.03(50)

.03(37)

.09(40)

.22(44)

.12(50)

.08(43)

.16(49)

.10(46)

.04(32)

.11(22)

.14(45)

.12(43)

.10(42)

.09(42)

.07(50)

.10(21)

.10(40)

.13(51)

.07(53)

.02(51)

.04(51)

.05(54)

.03(52)

.02(52)

.00(51)

.04(51)

.05(51)

.04(50)

.00(56)

.05(62)

.07(59)

.02(47)

.02(59)

.06(62)

.05(51)

.11(54)

.15(43)

.09(52)

.06(49)

.10(55)

.11(52)

.05(39)

.08(18)

.15(44)

.14(48)

.12(52)

.08(47)

.07(55)

.15(23)

.10(47)

.10(59)

.06(58)

.02(57)

.05(57)

.10(61)

.04(61)

.05(59)

.02(60)

.07(57)

.07(51)

.05(55)

.00(64)

.05(43)

.06(43)

.03(41)

.03(43)

.05(44)

.03(33)

.07(37)

.12(29)

.06(32)

.15(42)

.06(41)

.06(41)

.04(37)

.18(38)

.09(32)

.11(38)

.11(38)

.04(37)

.13(19)

.05(29)

.06(44)

.12(41)

.03(45)

.06(45)

.03(39)

.01(39)

.03(42)

.01(43)

.05(42)

.08(45)

.03(42)

.03(47)

.07(44)

.09(47)

.03(37)

.04(40)

.04(42)

.04(36)

.03(33)

.18(35)

.09(40)

.10(44)

.06(35)

.09(43)

.03(34)

.13(36)

.12(38)

.14(38)

.07(31)

.43(39)

.18(14)

.11(36)

.02(42)

.06(40)

.07(45)

.02(40)

.05(43)

.02(47)

.13(45)

.03(46)

.03(38)

.07(43)

.05(40)

.01(46)

.05(61)

.09(57)

.06(49)

.04(58)

.06(55)

.05(45)

.09(47)

.13(44)

.05(49)

.07(49)

.16(62)

.10(58)

.07(45)

.12(48)

.15(48)

.08(51)

.06(47)

.08(55)

.14(27)

.10(41)

.06(59)

.08(57)

.07(62)

.04(57)

.10(64)

.02(62)

.03(61)

.02(58)

.04(59)

.06(58)

.08(53)

.01(62)

.07(40)

.08(43)

.04(38)

.03(38)

.08(45)

.10(39)

.08(32)

.10(22)

.04(29)

.15(38)

.12(39)

.03(31)

.06(31)

.13(35)

.11(34)

.08(35)

.11(36)

.06(37)

.10(17)

.06(34)

.06(43)

.07(42)

.04(37)

.03(41)

.16(45)

.01(40)

.03(40)

.02(44)

.09(43)

.07(42)

.05(38)

.00(45)

.03(38)

.06(34)

.04(30)

.02(36)

.05(38)

.05(26)

.10(30)

.16(27)

.04(28)

.11(29)

.05(36)

.06(33)

.01(23)

.11(26)

.07(26)

.25(29)

.19(35)

.03(32)

.08(8)

.04(27)

.03(34)

.13(35)

.02(30)

.03(38)

.07(36)

.03(33)

.02(33)

.03(36)

.07(35)

.07(35)

.01(33)

.01(37)

.04(49)

.08(47)

.04(43)

.01(42)

.04(43)

.05(39)

.05(34)

.10(25)

.04(34)

.09(38)

.12(43)

.07(39)

.03(33)

.09(38)

.10(33)

.04(36)

.11(47)

.04(40)

.12(15)

.06(42)

.04(45)

.05(44)

.04(44)

.02(47)

.04(46)

.04(44)

.05(44)

.03(48)

.05(44)

.05(45)

.02(38)

.01(49)

.04(35)

.05(33)

.05(34)

.01(34)

.03(37)

.01(30)

.05(25)

.20(27)

.05(28)

.14(33)

.10(32)

.17(32)

.03(23)

.10(27)

.05(21)

.07(20)

.15(34)

.13(31)

.06(9)

.06(21)

.03(32)

.10(33)

.05(33)

.05(36)

.04(31)

.04(36)

.07(35)

.07(36)

.04(35)

.05(33)

.03(34)

.02(37)

.04(60)

.10(67)

.05(57)

.03(58)

.06(62)

.06(53)

.07(50)

.14(48)

.09(57)

.09(57)

.09(56)

.11(62)

.08(49)

.10(50)

.13(48)

.13(54)

.11(52)

.05(53)

.08(18)

.06(47)

.04(64)

.08(62)

.05(64)

.03(60)

.04(64)

.07(67)

.03(66)

.03(64)

.02(67)

.11(64)

.08(63)

.01(69)

.05(64)

.08(65)

.03(58)

.03(60)

.05(68)

.05(51)

.08(56)

.17(48)

.09(57)

.09(61)

.07(64)

.10(57)

.08(50)

.16(58)

.13(49)

.07(53)

.13(55)

.04(56)

.12(27)

.06(48)

.04(66)

.06(58)

.07(69)

.03(67)

.05(65)

.05(68)

.03(68)

.04(67)

.02(67)

.06(66)

.09(63)

.01(70)

.05(69)

.07(70)

.04(67)

.04(72)

.03(70)

.04(57)

.08(58)

.12(45)

.06(61)

.11(65)

.07(62)

.09(63)

.03(57)

.12(50)

.09(56)

.10(57)

.14(57)

.05(60)

.11(20)

.06(52)

.03(69)

.07(66)

.07(70)

.03(68)

.07(64)

.07(68)

.02(70)

.01(64)

.00(72)

.09(69)

.16(67)

.00(74)

.07(6)

.09(5)

.27(5)

.20(5)

.14(6)

.07(4)

.05(3)

.09(4)

.06(5)

.15(4)

.16(4)

.14(6)

.14(3)

.18(5)

.22(6)

.30(4)

.28(5)

.18(6)

.11(2)

.59(6)

.03(6)

.02(6)

.04(6)

.02(5)

.01(4)

.06(6)

.01(5)

.13(6)

.00(6)

.02(6)

.05(5)

.02(5)

Figure 10: Conditional confusion matrix for the neural network and test set of P = 500 programs oflength T = 5. The presentation is the same as in Figure 9.

20