
Weakly-Supervised Deep Learning of Heat Transport via Physics Informed Loss

Rishi Sharma
Electrical Engineering, Stanford University
[email protected]

Amir Barati Farimani
Bioengineering, Stanford University

Joe Gomes
Bioengineering, Stanford University

Peter Eastman
Bioengineering, Stanford University

Vijay Pande
Bioengineering, Stanford University
[email protected]

Abstract

In typical machine learning tasks and applications, it is necessary to obtain or create large labeled datasets in order to achieve high performance. Unfortunately, large labeled datasets are not always available and can be expensive to source, creating a bottleneck towards more widely applicable machine learning. The paradigm of weak supervision offers an alternative that allows for integration of domain-specific knowledge by enforcing constraints that a correct solution to the learning problem will obey over the output space. In this work, we explore the application of this paradigm to 2-D physical systems governed by non-linear differential equations. We demonstrate that knowledge of the partial differential equations governing a system can be encoded into the loss function of a neural network via an appropriately chosen convolutional kernel. We demonstrate this by showing that the steady-state solution to the 2-D heat equation can be learned directly from initial conditions by a convolutional neural network, in the absence of labeled training data. We also extend recent work in the progressive growing of fully convolutional networks to achieve high accuracy (< 1.5% error) at multiple scales of the heat-flow problem, including at the very large scale (1024×1024). Finally, we demonstrate that this method can be used to speed up exact calculation of the solution to the differential equations via finite difference.

1 Introduction

Most of the dramatic successes of machine learning in the 21st century have utilized very large datasets in order to achieve their performance¹. In supervised tasks, including in image recognition [RDS+15], speech recognition [GrMH13] and machine translation [SVL14], large datasets had to be assembled before neural network architectures particularly suited to these applications could emerge and achieve human-level performance on these tasks. In reinforcement learning [SB98], successful agents, including Go champion AlphaZero [SSS+17] and Atari-playing DQNs [MKS+15], operate in easily-simulated toy environments that enable the collection of large quantities of data in the form of observations and interactions with the environment.

The paradigm of weakly-supervised learning [RBVR17] seeks to reduce the data requirements by encoding our prior knowledge into machine learning systems. In this work, we explore the ability to encode our prior knowledge about the physical world into an appropriate loss function to guide deep learning.

We are interested in inference within physical environments whose rules can be defined by differential equations. We focus on the case of 2-dimensional heat transport [Fou07], whose dynamics are defined by a second-order differential equation. We seek to encode the dynamics of the system into a loss function, such that the learning algorithm can learn to produce correct solutions to the future state of the system without having to observe any labeled data. We develop a convolutional kernel which encodes the constraint that must be satisfied by any steady-state solution to the heat flow problem, and we use this kernel to determine the loss function. By seeking to minimize this loss, the network learns to satisfy the differential equations of heat transport, effectively learning the underlying physics of the system despite never explicitly being shown the outcomes for any given initial condition.

¹ https://www.edge.org/response-detail/26587 argues that the creation of high-quality datasets is a better predictor of progress in AI than algorithmic advancement.



While the physical system examined throughout this paper is 2-D heat transport, the methods developed are extremely general and can be applied to any system defined by partial differential equations and theoretically capable of being solved by the finite difference method (even if it is not practical to do so).

This points us toward the possibility of encoding the equations we have discovered, which govern many physical environments of interest to us, into neural networks. It would be quite convenient if we could encode this information directly in our learning agents, such that they could benefit from the knowledge we've already gained about how the world works.

2 Related Work

This work exists at the intersection of two different lines of research pursued by disparate research communities. The first line of work, weakly supervised machine learning, is pursued primarily within the machine learning community and seeks to reduce the data requirements of machine learning applications. It aims to do so by incorporating some form of prior knowledge, either to augment existing data, or to create context-aware learning algorithms that are able to achieve high performance with less data. The second line of work involves the use of physics informed machine learning for modeling physical systems, and is pursued primarily within the mathematical physics and engineering communities.

[RBVR17] defines weak supervision as a unified approach to incorporating various types of weak signal into the machine learning pipeline. These forms of weak signal include crowdsourced [WfWB+09], noisy [NDRT13], or sparse labels (as in active and semi-supervised learning) [ZG09, Set12]. Additionally, weak signal can be specified in the form of constraints and invariances over the output space (often provided by domain experts), or in terms of weak or biased classifiers (as in transfer learning or boosting). Although these techniques and approaches are disparate, they are unified by their aim to alleviate the need for vast quantities of data to solve machine learning problems.

The present work is most closely connected to work that aims to incorporate domain-specific prior knowledge in the form of constraints over the output space. The nearest cousin to this work is [SE16], in which physical constraints are specified over the output trajectories that must be satisfied by solutions for problems in motion-tracking, enabling it to be done in a label-free way. We extend this work to the broader domain of prediction in physical systems governed by non-linear PDEs. In natural language processing, [LJK13, AZ11] seek to semantically parse statements or questions (i.e. convert them into their logical forms) with weak supervision signals (e.g. in the form of responses to queries rather than the meaning of the query itself). Recent works [CGCR10, GPLL17] in this area apply constraints on the output space to provide weak signal to the semantic parser.

The second line of research that the present work extends leverages machine learning techniques to either discover the underlying dynamics of a system governed by unknown PDEs, or to build faster and more accurate differential equation solvers. [RK17, Rai18] use traditional machine learning techniques and deep learning techniques respectively for both predicting the future of time-dependent dynamical systems and for discovering their underlying equations. [FGP17] uses conditional generative models to find the equilibrium solution to a number of transport problems faster than traditional iterative methods. [HJE17, SS17] leverage deep learning techniques to approximately solve partial differential equations in high-dimensional spaces, where traditional iterative techniques break down. All of these techniques use large quantities of data in a supervised way, unlike the present work.

3 Background

3.1 Heat-Transport

In 2-D heat transport, we consider a flat square plate made of some thermally conductive material that is insulated along its edges. Heat is applied to the plate in some way, and our goal is to model the way thermal energy moves through the plate. The initial condition is given by T(x, y, 0), and we wish to determine T(x, y, t), the temperature field on the plate at time t. In our model we assume that non-zero elements of T(x, y, 0) represent an applied heat, i.e. heat applied to the point (x, y) on the grid for the duration of the transport experiment. Under ideal assumptions, it can be shown that T satisfies the two dimensional heat equation [Fou07]

∂T/∂t = c²∇²T = c²(∂²T/∂x² + ∂²T/∂y²)

where c > 0 is a constant for the thermal conductivity of the plate. We can also study solutions that do not vary with time, known as the steady-state solutions to the system

∂T/∂t = 0

In this case we get the Laplace Equation:

∇²T = ∂²T/∂x² + ∂²T/∂y² = 0    (1)


Solutions to the Laplace equation are known as harmonic functions, and the particular solution to the steady-state heat transport problem is the harmonic function which also satisfies the initial condition of the system. When heat is applied only to the boundary of the plate, as it is in the cases we study in this paper, equation (1) is known as the Dirichlet boundary problem [Dir50].

3.2 Finite Difference

Finite difference is an iterative method used to compute exact solutions to partial differential equations via a discretization of the equations and an update rule that is defined by the equations. To solve the 2-D steady-state heat equation using finite difference, we need to discretize ∇²T = 0. Considering an evenly spaced 2-D grid, the discretized form of ∇²T = 0 for node (i, j) would be:

Ti,j = (Ti+1,j + Ti−1,j + Ti,j+1 + Ti,j−1) / 4    (2)

The nodal relation expressed in the above equation is solved iteratively by applying the rule above at each node (point in the grid) until convergence.
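To make the update rule concrete, the following is a minimal NumPy sketch (our illustration, not code from the paper) of this iteration: every interior node is repeatedly replaced by the average of its four neighbors, per equation (2), while the driven boundary values stay fixed. The function name and tolerance are assumptions.

```python
import numpy as np

def solve_steady_state(T0, tol=1e-5, max_iters=100_000):
    """Jacobi-style finite-difference iteration of equation (2).

    T0: n x n array; its border holds the applied (fixed) temperatures.
    """
    T = T0.astype(float).copy()
    for _ in range(max_iters):
        T_new = T.copy()
        # Each interior node becomes the average of its four neighbors.
        T_new[1:-1, 1:-1] = 0.25 * (T[2:, 1:-1] + T[:-2, 1:-1]
                                    + T[1:-1, 2:] + T[1:-1, :-2])
        if np.max(np.abs(T_new - T)) < tol:  # converged
            return T_new
        T = T_new
    return T
```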

4 Model

4.1 Approach

The aim of this work is to train a fully convolutional neural network to directly infer the solution to the Laplace equation (1) when given the initial condition as input (i.e. train the neural network to be a solver for the Dirichlet boundary problem [Dir50]). We accomplish this without ever seeing solutions to the boundary problem by encoding the differential equations into a physics-informed loss function, described in section 4.3, which motivates the network to find the solution without the use of supervision in the form of data. The architecture of the network is described in section 4.2.

Each instantiation of the Dirichlet boundary problem is given to the neural network in the form of an n × n image matrix representing a thermally conductive plate, with the value in each entry representing the temperature applied to each point on the plate. In this work, we only apply heat to the boundary of the plate, so the only non-zero entries in the input are at the boundaries. Zero values in the input represent points on the plate with no temperature applied, meaning they can change over the course of time, as they are influenced by the temperature at neighboring points on the thermally conductive plate.

The desired output is also an n × n image matrix representing the temperature values at each point on the plate after the heat flow process has converged to an equilibrium temperature distribution.

The network is trained with randomly initialized boundary conditions. Because the network is never provided solutions to the boundary value problem, we can generate new data points on the fly virtually free of cost by creating new n × n matrices with the boundaries filled in by random values. In this work, we choose four random temperatures uniformly between 0 and 100 and set the temperatures of the top, right, left and bottom of the plate to be constant, and equal to each of the random temperatures correspondingly. An example input and output are shown in Figure 1.
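As an illustration of this input generation, the procedure can be written in a few lines of NumPy (a sketch under our own naming, not the authors' code):

```python
import numpy as np

def random_boundary_condition(n):
    """Zero n x n plate with four random edge temperatures in [0, 100]."""
    top, right, bottom, left = np.random.uniform(0, 100, size=4)
    plate = np.zeros((n, n))
    plate[0, :] = top
    plate[-1, :] = bottom
    plate[:, 0] = left
    plate[:, -1] = right
    return plate
```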

In section 4.4, we describe how the output image is downsampled repeatedly and a multiscale loss function is computed. Downsampling and training over multiple losses in this fashion dramatically speeds up training and improves the quality of the output images.

4.2 Network Architecture

The architecture of the network is a fully convolutional encoder-decoder network adapted from the U-Net architecture [RFB15]. The fully convolutional network is comprised of several encoding convolutional layers that decrease the image size to that of the latent space (in our case, 512 × 1 × 1). The decoding layers consist of transposed convolutions on the output of the previous layer concatenated with the corresponding encoding layer. The concatenations amount to skip connections from each encoding layer to its corresponding output layer. Ultimately, an image of the original input size is recovered, and the aim is for the final image to represent the equilibrium temperatures on the plate. The architecture is shown in Figure 1.

Figure 1: Encoder-decoder U-Net [RFB15] architecture of the network. Each encoding layer is connected to its corresponding decoder layer via a skip connection. The network is fully convolutional, and thus can be scaled to arbitrary size by adding layers. The input is the initial condition, and the output is the equilibrium condition.


The motivation for using a fully convolutional architecture is to flexibly use the same architecture to solve problems at multiple scales. The purpose of the skip connections is to pass the boundary values of the input to the output layers, so the network is not forced to memorize the structure of the input in its bottleneck layers. The architecture mirrors that of [FGP17], in which this network architecture was used to solve a variety of differential equations in a supervised way.
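A minimal PyTorch sketch of such a fully convolutional encoder-decoder with skip connections is given below. This is our own approximation for illustration only: the layer count, channel widths, and activations are assumptions, whereas the paper's networks use 8-10 encoding layers and a 512 × 1 × 1 bottleneck.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Toy U-Net-style network: strided convolutions down, transposed
    convolutions up, with each decoder input concatenated to the
    matching encoder output (skip connection)."""

    def __init__(self, channels=(1, 64, 128, 256, 512)):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Conv2d(cin, cout, 4, stride=2, padding=1)
            for cin, cout in zip(channels[:-1], channels[1:]))
        rev = channels[::-1]
        self.decoders = nn.ModuleList(
            nn.ConvTranspose2d(cin if i == 0 else 2 * cin, cout, 4,
                               stride=2, padding=1)
            for i, (cin, cout) in enumerate(zip(rev[:-1], rev[1:])))
        self.act = nn.ReLU()

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = self.act(enc(x))
            skips.append(x)
        for i, dec in enumerate(self.decoders):
            if i > 0:  # concatenate the matching encoder output
                x = torch.cat([x, skips[-1 - i]], dim=1)
            x = dec(x)
            if i < len(self.decoders) - 1:
                x = self.act(x)
        return x
```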

4.3 Physics Informed Kernel

The equilibrium condition defined in Equation (1) can be encoded in a simple rule: the temperature at each point on the plate (that is not initially driven by a heat source) should be the average of its neighbors. In fact, it is by iteration of this rule that the finite-difference method for solving partial differential equations typically solves this problem, as was shown in equation (2).

Examining this condition, we find that it can easily be encoded into a 3×3 convolutional kernel as follows:

 0  -1   0
-1  +4  -1
 0  -1   0

This kernel is run convolutionally across an image, and the outputs flattened and normed, to calculate the loss of the output:

Σi,j (Conv2d(kernel, output)i,j)²    (3)

By minimizing this loss, the neural network learns to identify harmonic functions that form solutions to the given Dirichlet boundary problem specified by the initial condition, since the boundary of the output is fixed to be equal to the initial condition.
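In PyTorch-like code, this loss can be sketched as follows (our illustration; the variable names are assumptions, and the paper does not publish its implementation):

```python
import torch
import torch.nn.functional as F

# Fixed (non-learnable) Laplacian kernel from above, shaped for conv2d.
laplace_kernel = torch.tensor([[0., -1., 0.],
                               [-1., 4., -1.],
                               [0., -1., 0.]]).view(1, 1, 3, 3)

def physics_loss(output):
    """Equation (3): sum of squared discrete-Laplacian responses.

    output: (batch, 1, n, n) predicted equilibrium temperature field
    whose border is fixed to the input boundary condition.
    """
    residual = F.conv2d(output, laplace_kernel)  # valid conv: interior only
    return (residual ** 2).sum()
```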

Although this kernel can easily be identified by the structure of the heat equations, we show in section 5.3 that this kernel can easily be learned if labels are provided (in the form of correct solutions to the heat flow equations). Although this kernel is easy to derive and justify for the case of heat transport, in principle a similar kernel can be found for any system whose dynamics are defined by partial differential equations. In general, given an update rule for finite difference equations, it is easy to encode this rule into a convolutional kernel.

4.4 Progressive Downsampling for Growing Loss

The network can have difficulty learning to output the heat distribution that minimizes the loss function for very large input sizes, in part because outputting a constant on the entire field has a loss of 0, if we ignore the boundary. As the image size grows larger, the boundary values become a proportionally smaller fraction of the total image, reducing the proportion of loss contributed by outputting incorrect values for the points near the border of the plate. Thus, the network is less able to accurately fill in correct temperatures, especially towards the center of the image, instead opting to output constant temperature fields.

In order to solve this problem, we adopt a modified version of the strategy of progressively growing the output of the network to the final problem size, first introduced in [KALL17]. In our case, instead of progressively adding decoding layers, as in [KALL17], we compute the loss on several downsampled versions of the output image, and weight their contribution to the overall loss according to a weight vector

λ = [λ1, λ2, ..., λn]

where each λi is the weight of the loss on the ith downsampled version of the output. The loss for each individual downsampled output is computed in the same way as described in section 4.3, but the weight vector can slowly be tuned from λ = [0, 0, ..., 1] to λ = [1, ..., 0, 0]. Early in training, this motivates the network to output images which satisfy the overall higher level structure of the desired output. We slowly increase the weights associated with getting the finer grained details of the output correct.

In our training, we choose a downsample factor of 4, until the images are of size 32 × 32, the scale at which training works well even without a progressively downsampled loss function.
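Assuming the physics_loss sketched in section 4.3, the progressively weighted multiscale loss can be written roughly as follows (our sketch; the ordering of the weights, finest scale first, is an assumption):

```python
def multiscale_loss(output, lambdas, factor=4, min_size=32):
    """Weighted sum of equation (3) over progressively downsampled outputs.

    output: (batch, 1, n, n) network output; lambdas: one weight per scale,
    finest scale first. Downsampling keeps every `factor`-th pixel.
    """
    scales = [output]
    while scales[-1].shape[-1] > min_size:
        scales.append(scales[-1][..., ::factor, ::factor])
    return sum(lam * physics_loss(s) for lam, s in zip(lambdas, scales))
```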

The general framework is shown in Figure 2.

Figure 2: The output of the network is downsampled repeatedly by a factor of 4 until the dimension is 32 × 32. The downsample-by-N operation used in this case simply uses every Nth pixel of the previous image. The loss in equation (3) is computed for each downsampled image, multiplied by its corresponding weight λi and added to compute the total loss.


This framework can be viewed as a special case of curriculum learning [BLCW09], where the difficulty of the curriculum scales as the weights in λ become more concentrated on the largest scale. This provides a flexible framework by which output size can be used to craft a curriculum, and we suspect that it can be widely applied to multiscale problems in image generation, as well as other domains.

5 Experiments & Applications

5.1 Solving the Boundary Value Problem

Test results are shown for inputs of size 256×256 in Figure 3. During training of this experiment, the network was never shown the correct answer to the boundary value problem, but only ever given random initializations and trained to minimize the loss in equation (3). In the figure, a number of images are shown containing the specified boundary value problem, the true equilibrium temperatures given that boundary condition (solved to high precision via finite difference), and the output of the neural network after training.

The network has sixteen hidden layers (8 encoding, 8 decoding) and has been trained for 128 epochs using a progressive downsampling strategy to grow the loss function as described in section 4.4. The optimization is done with an Adam optimizer [KB14]. Despite never seeing a correct desired output for the boundary value problem, the average per-pixel output error is only 1.39% with a standard deviation of 1.24%. The test examples are unseen during training, and are generated randomly according to the same procedure as during training.

5.2 Solving at Very Large Scale

As described in section 4.4, the network has difficulty learning the correct solution at larger scales, because outputting constant values along the entire image becomes a viable strategy to achieve a low loss. However, by using the strategy of progressive downsampling to grow the loss function, the network is able to easily find solutions to the boundary value problem, achieving good results in just a few epochs. The basic method, without progressive downsampling, fails entirely to learn.

Results for inputs of size 1024×1024 are shown in Figure 4. The network has twenty hidden layers (10 encoding, 10 decoding) and has been trained for 128 epochs, optimized with Adam [KB14]. The average per-pixel output error is only 1.41% with a standard deviation of 1.64%. The test examples are unseen during training, and are generated randomly according to the same procedure as during training.


Figure 3: In the left column are the input initial conditions to the network. The second column shows the correct output temperature distribution (solved to high precision via finite difference) that satisfies the equilibrium condition specified by equation (1). The third column shows the output of the deep neural network that is trained to minimize (3). These results are for inputs of size 256×256.



Figure 4: In the left column are the input initial conditions to the network. The second column shows the correct output temperature distribution (solved to high precision via finite difference) that satisfies the equilibrium condition specified by equation (1). The third column shows the output of the deep neural network that is trained to minimize (3). These results are for inputs of size 1024×1024.

5.3 Learning the Convolutional Kernel

The kernel defined in section 4.3, copied below, can be easily determined by inspection of the differential equations defining heat transport (i.e. the form of the Laplace equation (1)).

 0  -1   0
-1  +4  -1
 0  -1   0

This may not be the case in general, however, as the local invariants satisfied by a system that obeys a certain set of differential equations may not be readily apparent. Furthermore, for some systems the differential equations governing their evolution may not even be known.

In this experiment, we show that for the case of heat transport, this convolutional kernel can be learned to high precision as long as we have data. This data should consist of the equilibrium conditions of heat transport (i.e. solutions to the Dirichlet problem defined in equation (1) for various initial conditions — the initial conditions themselves are not required).

We generate data on the fly by randomizing the initial condition and running an iterative finite-difference solver to convergence. The data generated is 8×8 images which are converged to high precision. We then run a learnable convolutional kernel across the data and sum the absolute value of the outputs. The convolutional kernel is optimized via Adam, and its parameters seek to minimize the absolute value of the outputs of the convolution. After several thousand iterations on randomly generated data points, this procedure learns the following kernel:

0        0.0545   0.0001
0.0545  -0.2181   0.0545
0        0.0547   0

This is almost exactly a scaled version of the original kernel, which encodes the same rule: that each point on the equilibrium heat distribution should be the average of the temperature of its four neighboring points. This shows that if we have data which describes converged conditions of the phenomenon of interest, we can learn the local rules that define the equilibrium condition directly from this data. In principle, this can be used to discover differential equations governing the dynamics of unknown systems, although more work is needed to show that this can be done with more complex systems than heat transport (e.g. systems with higher-order non-linearities).
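A rough sketch of this kernel-learning loop is shown below. It is our own illustration: sample_converged_fields is a hypothetical helper returning a batch of converged 8×8 equilibrium fields (e.g. produced by the finite-difference solver), and the norm constraint on the kernel is one simple way to rule out the trivial all-zero solution, a detail the paper does not specify.

```python
import torch
import torch.nn.functional as F

kernel = torch.randn(1, 1, 3, 3, requires_grad=True)  # learnable 3x3 kernel
optimizer = torch.optim.Adam([kernel], lr=1e-3)

for step in range(10_000):
    # Hypothetical helper: (batch, 1, 8, 8) converged equilibrium fields.
    fields = sample_converged_fields(batch_size=32)
    k = kernel / kernel.norm()           # keep scale fixed, avoid zero kernel
    loss = F.conv2d(fields, k).abs().sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```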


5.4 Application: Speeding Up Finite Difference Calculation

The finite difference method described in Section 3.2 can be slow to converge to the correct solution [Rem12], and is particularly sensitive to the initial condition given to the algorithm. Thus, we can use a trained neural network to provide an output that is approximately correct (within 1.5% error), and this answer can be used as an initial condition to be refined by the finite difference method. The finite difference method can then compute the exact solution to a desired level of precision. Once trained, the forward pass through a neural network is fast, accounting for a negligible fraction of the time the finite-difference algorithm takes to converge. Our results demonstrate that this “warm-start” approach for solving the heat equations converges much faster than a constant initialization — i.e. one in which all pixels not on the border of the image are set to the average of pixels on the border. In Figure 5, we compare the errors from ground truth at each iteration of finite difference for the two different solvers, one that is given the output of the trained neural net as a warm start and one that is not. We compute “ground truth” by running finite difference to a very high level of precision relative to both initializations of the algorithm.
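A sketch of this warm-start procedure (ours, reusing the solve_steady_state routine sketched in section 3.2; the model interface is an assumption) looks like:

```python
import torch

def warm_start_solve(model, boundary, tol=1e-7):
    """Refine the network's approximate equilibrium with finite difference.

    boundary: n x n NumPy array that is non-zero only on the driven border.
    """
    inp = torch.from_numpy(boundary).float().view(1, 1, *boundary.shape)
    with torch.no_grad():
        approx = model(inp).squeeze().numpy()        # approximate equilibrium
    approx[boundary != 0] = boundary[boundary != 0]  # re-pin driven values
    return solve_steady_state(approx, tol=tol)
```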

A more appropriate comparison for this method would be to hierarchical finite difference solvers, which are more often used in practice, but it is hard to make a direct comparison of the error between a hierarchical solver and our proposed method, as the hierarchical solver does not solve the full problem until near the end of its computation. In terms of wall-clock time, our method is not competitive with the hierarchical solvers; however, it is possible that if a separately trained neural network is used at each layer of the hierarchy, our strategy will be superior to hierarchical finite difference algorithms.

Figure 5: A comparison of two strategies for solving finite difference is shown. The yellow line shows the average per-pixel error of the computed solution at each iteration of the finite difference algorithm when it is given the output of our trained neural network (size 256×256 as in section 5.1) as an initialization. The blue line shows the average per-pixel error of the computed solution at each iteration of the finite difference algorithm when it is initialized with constant values equal to the average temperature of each of the borders of the plate. Average per-pixel error is computed relative to “ground truth”, which is determined by running finite difference to a very high precision.

6 Conclusion

We have shown how the equilibrium solution of the heat transport problem can be learned in a weakly supervised way by use of a physics informed loss function that encodes the local rules defined by the differential equations of heat transport. Further, we have shown that it is possible to use this technique to speed up finite difference calculations relative to the naive approach. We have also demonstrated that the local rule defining heat transport can be learned, in the form of a kernel, directly from data. In principle, the differential equations for heat transport can be replaced with those for any other phenomenon whose dynamics are defined by partial differential equations. This work points toward the possibility of encoding our knowledge of the dynamics of the world into neural networks, allowing them to learn how physical systems work even in the absence of data.

References

[AZ11] Yoav Artzi and Luke Zettlemoyer. Bootstrapping semantic parsers from conversations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 421–432, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

[BLCW09] Yoshua Bengio, Jerome Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, pages 41–48, 2009.

[CGCR10] James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. Driving semantic parsing from the world's response. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, CoNLL '10, pages 18–27, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

[Dir50] P. G. L. Dirichlet. Über einen neuen Ausdruck zur Bestimmung der Dichtigkeit einer unendlich dünnen Kugelschale, wenn der Werth des Potentials derselben in jedem Punkte ihrer Oberfläche gegeben ist. Abh. Königlich. Preuss. Akad. Wiss., 1850.

[FGP17] Amir Barati Farimani, Joseph Gomes, and Vijay S. Pande. Deep learning the physics of transport phenomena. CoRR, abs/1709.02432, 2017.

[Fou07] Jean-Baptiste Joseph Fourier. 1807.

[GPLL17] Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, and Percy Liang. From language to programs: Bridging reinforcement learning and maximum marginal likelihood. CoRR, abs/1704.07926, 2017.

[GrMH13] A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6645–6649, May 2013.

[HJE17] Jiequn Han, Arnulf Jentzen, and Weinan E. Overcoming the curse of dimensionality: Solving high-dimensional partial differential equations using deep learning. CoRR, abs/1707.02568, 2017.

[KALL17] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. CoRR, abs/1710.10196, 2017.

[KB14] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.

[LJK13] Percy Liang, Michael I. Jordan, and Dan Klein. Learning dependency-based compositional semantics. Comput. Linguist., 39(2):389–446, June 2013.

[MKS+15] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518:529–533, Feb 2015.

[NDRT13] Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep K. Ravikumar, and Ambuj Tewari. Learning with noisy labels. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 1196–1204. Curran Associates, Inc., 2013.

[Rai18] Maziar Raissi. Deep hidden physics models: Deep learning of nonlinear partial differential equations. CoRR, abs/1801.06637, 2018.

[RBVR17] Alex Ratner, Stephen Bach, Paroma Varma, and Chris Ré. Weak supervision: The new programming paradigm for machine learning (blog post), 2017.

[RDS+15] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision, 115(3):211–252, December 2015.

[Rem12] Courtney Remani. Numerical methods for solving systems of nonlinear equations, 2012.

[RFB15] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, Cham, 2015. Springer International Publishing.

[RK17] Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 2017.

[SB98] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.

[SE16] Russell Stewart and Stefano Ermon. Label-free supervision of neural networks with physics and domain knowledge. CoRR, abs/1609.05566, 2016.

[Set12] Burr Settles. Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1):1–114, 2012.

[SS17] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. ArXiv e-prints, August 2017.

[SSS+17] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550:354–359, Oct 2017.

[SVL14] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc., 2014.

[WfWB+09] Jacob Whitehill, Ting-fan Wu, Jacob Bergsma, Javier R. Movellan, and Paul L. Ruvolo. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 2035–2043. Curran Associates, Inc., 2009.

[ZG09] Xiaojin Zhu and Andrew B. Goldberg. Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1):1–130, 2009.