Source: helper.ipam.ucla.edu/publications/ml2015/ml2015_12560.pdf

Quantum Chemistry Energy Regression with Scattering Transforms

Matthew Hirn, Stéphane Mallat École Normale Supérieure

Nicolas Poilvert Penn-State


Quantum Chemistry Regression


Linear Regression


Energy Properties

• Invariant to permutations of the index k.
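As a small check of the permutation property, the sketch below uses a toy pairwise Coulomb energy between point charges as a stand-in for the molecular energy. The function name `coulomb_energy`, the state layout (a list of `(position, charge)` pairs) and the sample coordinates are illustrative assumptions, not taken from the slides.

```python
import itertools
import math

def coulomb_energy(atoms):
    """Toy energy: sum over pairs k < l of z_k * z_l / |p_k - p_l|.

    `atoms` is a list of (position, charge) pairs. This is an illustrative
    stand-in for the molecular energy, not the quantity regressed in the talk.
    """
    total = 0.0
    for (p_k, z_k), (p_l, z_l) in itertools.combinations(atoms, 2):
        total += z_k * z_l / math.dist(p_k, p_l)
    return total

# Hypothetical planar configuration: (position, nuclear charge).
atoms = [((0.0, 0.0), 6.0), ((1.2, 0.0), 8.0), ((-0.6, 0.9), 1.0)]

# Evaluate the energy for every ordering of the atom index k.
energies = [coulomb_energy(list(perm)) for perm in itertools.permutations(atoms)]
spread = max(energies) - min(energies)  # zero up to floating-point rounding
```

Any representation used for regression should share this property, otherwise the regression would have to learn it from relabeled copies of the same molecule.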


Overview

• Coulomb kernel representations

• Density functional approach to representation

• From Fourier to wavelet energy regressions

• Wavelet scattering dictionaries: deep networks without learning

• Numerical energy regression results

• Relations with image classification and deep networks


Coulomb Kernel


Density Functional Theory

Organic molecules with Hydrogen, Carbon, Nitrogen, Oxygen, Sulfur, Chlorine

H4C6OS, H9C7NO, H3C6NO2, H9C8N

• Computes the energy of a molecule $x$ from its electronic probability density $\rho_x(u)$ for $u \in \mathbb{R}^3$


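Density Functional Theory

Kohn-Sham model:

$$E(\rho) = T(\rho) + \int \rho(u)\,V(u)\,du + \frac{1}{2}\iint \frac{\rho(u)\,\rho(v)}{|u-v|}\,du\,dv + E_{xc}(\rho)$$

• $E(\rho)$: molecular energy

• $T(\rho)$: kinetic energy

• $\int \rho(u)\,V(u)\,du$: electron-nuclei attraction

• $\frac{1}{2}\iint \frac{\rho(u)\,\rho(v)}{|u-v|}\,du\,dv$: electron-electron Coulomb repulsion

• $E_{xc}(\rho)$: exchange correlation energy

At equilibrium: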


Coulomb Interactions in Fourier

• Coulomb potential energy:

Diagonalized in Fourier:
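For reference, the standard Fourier form of the Coulomb energy in $\mathbb{R}^3$ can be written out as below. The symbol $U(\rho)$ and the convention $\hat\rho(\omega) = \int \rho(u)\,e^{-i\omega\cdot u}\,du$ are notational assumptions; the slide's own equations did not survive extraction, so this is the textbook identity rather than a transcription.

```latex
\begin{aligned}
U(\rho) &= \frac{1}{2}\iint \frac{\rho(u)\,\rho(v)}{|u-v|}\,du\,dv
         = \frac{1}{2}\int \bigl(\rho * |\cdot|^{-1}\bigr)(u)\,\rho(u)\,du \\
        &= \frac{1}{2\,(2\pi)^3}\int \frac{4\pi}{|\omega|^2}\,|\hat\rho(\omega)|^2\,d\omega .
\end{aligned}
```

The second line follows from the Parseval identity and from the fact that the Fourier transform of $|u|^{-1}$ in three dimensions is $4\pi/|\omega|^2$. The energy is therefore a weighted sum of the squared Fourier moduli $|\hat\rho(\omega)|^2$, which is the sense in which the Coulomb interaction is diagonal in the Fourier basis.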


Coulomb in Fourier Dictionary

Large Scale Instabilities


Coulomb Multiscale Factorizations

• Multiscale regrouping of interactions:

Fast multipoles (Rokhlin, Greengard)

For an error $\epsilon$, interactions can be reduced to $O(\log \epsilon)$ groups


Scale separation with Wavelets

rotated and dilated:

[Figure: real parts and imaginary parts of the wavelets]
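A minimal sketch of how a complex wavelet can be rotated and dilated in two dimensions, using the usual convention $\psi_{j,\theta}(u) = 2^{-2j}\,\psi(2^{-j} r_{-\theta} u)$. The Gabor-type mother wavelet, its parameters `sigma` and `xi`, and the function names are assumptions chosen for illustration (the zero-mean correction of a true Morlet wavelet is omitted); the slides do not specify the filter.

```python
import cmath
import math

def mother_wavelet(u1, u2, sigma=0.8, xi=3.0 * math.pi / 4.0):
    """Gabor-type complex wavelet: Gaussian envelope times a plane wave along u1."""
    envelope = math.exp(-(u1 * u1 + u2 * u2) / (2.0 * sigma * sigma))
    return envelope * cmath.exp(1j * xi * u1)

def wavelet(u1, u2, j, theta):
    """psi_{j,theta}(u) = 2^{-2j} * psi(2^{-j} * r_{-theta} u) in two dimensions."""
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the point by -theta, then shrink it by 2^{-j}.
    v1 = (c * u1 + s * u2) / 2.0 ** j
    v2 = (-s * u1 + c * u2) / 2.0 ** j
    return 2.0 ** (-2 * j) * mother_wavelet(v1, v2)

# Real and imaginary parts at one sample point, for four rotations at scale j = 1.
samples = [wavelet(0.5, 0.25, 1, k * math.pi / 4.0) for k in range(4)]
real_parts = [z.real for z in samples]
imag_parts = [z.imag for z in samples]
```

Increasing `j` widens the support and lowers the oscillation frequency, which is how the dictionary separates interactions by scale.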


Wavelet Interference for Densities


Sparse Wavelet Regression

Theorem: For any $\epsilon > 0$ there exist wavelets with


Dictionaries for Quantum Energies


Atomization Density

Electronic density $\rho_x(u)$

Approximate density $\tilde{\rho}_x(u)$


Sparse Linear Regressions


Fourier and Wavelets Regressions

Regression:

Quantum Energy Regression using Scattering Transforms

the RMSE is due to the fact that a scattering regression has smaller error outliers.

[Plot: regression error versus Model Complexity $\log_2(M)$ for the Fourier, Wavelet, Scattering and Coulomb regressions]

Figure 2. Decay of the log RMSE error $\frac{1}{2}\log_2\left[\mathbb{E}\left(|f(x) - \tilde{f}_M(x)|^2\right)\right]$ over the larger database of 4357 molecules, as a function of $\log_2(M)$ in the Fourier (green), Wavelet (blue) and Scattering (red) regressions. The dotted line gives the Coulomb regression error for reference.

Table 1 shows that the errors of the Fourier and wavelet regressions are of the same order, although the Fourier dictionary has 1537 elements and the wavelet dictionary has only 61. Figure 2 gives the decay of these errors as a function of M. This expected error is computed on testing molecules. The circles on the plot give the estimated value of M which yields a minimum regression error by cross-validation over the training set (reported in Table 1). Although the Fourier and wavelet regressions reach nearly the same minimum error, the decay is much faster for wavelets. When going from the smaller to the larger database, the minimum errors of the Fourier and wavelet regressions remain nearly the same. This shows that the bias error due to the inability of these dictionaries to precisely regress f(x) is dominating the variance error corresponding to errors on the regression coefficients. The Coulomb and Scattering representations, on the other hand, achieve much smaller bias errors on the larger database.

The number of terms of the scattering regression is M = 591 on the larger database, although the dictionary size is 11071. A very small proportion of scattering invariants is therefore selected to perform this regression. The chosen scattering coefficients used for the regression are coefficients corresponding to scales which fall between the minimum and maximum pairwise distances between atoms in the molecular database. These selected coefficients are thus adapted to the molecular geometries.
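Selecting M terms out of a much larger dictionary can be sketched with a greedy orthogonal least squares loop. This is one common selection scheme, shown under the assumption that a greedy method of this kind is appropriate; the excerpt does not state the exact procedure used, and the function names and the tiny four-sample dictionary below are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def greedy_select(columns, target, num_terms):
    """Greedy orthogonal least squares.

    columns: dictionary columns, each a list of values (one per molecule).
    target: list of energies. Returns (selected indices, final residual).
    """
    residual = list(target)
    work = [list(col) for col in columns]  # progressively orthogonalized copies
    selected = []
    for _ in range(num_terms):
        # Pick the column whose direction removes the most residual energy.
        best, best_gain = None, 0.0
        for k, col in enumerate(work):
            norm2 = dot(col, col)
            if k in selected or norm2 < 1e-12:
                continue
            gain = dot(residual, col) ** 2 / norm2
            if gain > best_gain:
                best, best_gain = k, gain
        if best is None:
            break
        selected.append(best)
        q = work[best]
        q_norm2 = dot(q, q)
        # Project the chosen direction out of the residual ...
        coeff = dot(residual, q) / q_norm2
        residual = [r - coeff * v for r, v in zip(residual, q)]
        # ... and out of every column that is still a candidate.
        for k, col in enumerate(work):
            if k not in selected:
                c = dot(col, q) / q_norm2
                work[k] = [x - c * v for x, v in zip(col, q)]
    return selected, residual

# Hypothetical dictionary with 4 columns over 4 samples.
columns = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 0.0, -1.0, 0.0],
    [0.5, -1.0, 2.0, 0.1],
]
# Target built from columns 1 and 2 only, so two terms should suffice.
target = [2.0 * a - 3.0 * b for a, b in zip(columns[1], columns[2])]
selected, residual = greedy_select(columns, target, 2)
```

In practice the number of terms would be chosen by cross-validation, as the text describes for the circles in Figure 2.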

7. Conclusion

This paper introduced a novel intermediate molecular representation through the use of a model electron density. The regression is performed on a scattering transform applied to a model density built from a linear superposition of atomic densities. This transform is well adapted to quantum energy regressions because it is invariant to the permutation of atom indices and to isometric transformations, it is stable to deformations, and it separates multiscale interactions. It is computed with a cascade of wavelet convolutions and modulus non-linearities, as a deep convolutional network. State-of-the-art regression accuracy is obtained over two databases of two-dimensional organic molecules, with a relatively small number of scattering vectors. Understanding the relation between the choice of scattering coefficients and the physical and chemical properties of these molecules is an important issue.

Numerical applications have been carried out on planar molecules, which allows one to restrict the electronic density to the molecular plane, and thus compute a two-dimensional scattering transform. A scattering transform is similarly defined in three dimensions, with the same invariance and stability properties. It involves computing a wavelet transform on the two-dimensional sphere $S^2$ in $\mathbb{R}^3$ (Starck et al., 2006) as opposed to the circle $S^1$. It entails no mathematical difficulty, but requires appropriate software implementations which are being carried out.

Energy regressions can also provide estimations of forces through differentiations with respect to atomic positions. Scattering functions are differentiable and their differential can be computed analytically. However, the precision of such estimations remains to be established.
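The relation between an energy and the forces, $F_k = -\partial E / \partial p_k$, can be illustrated on a toy model. The sketch below uses a pairwise Coulomb energy of point charges rather than a scattering regression, and compares the analytic derivative with a central finite difference; the function names, coordinates and step size `h` are assumptions made for the example.

```python
import math

def energy(positions, charges):
    """Toy pairwise Coulomb energy of point charges in the plane."""
    total = 0.0
    for k in range(len(positions)):
        for l in range(k + 1, len(positions)):
            total += charges[k] * charges[l] / math.dist(positions[k], positions[l])
    return total

def analytic_force(positions, charges, k):
    """F_k = -dE/dp_k = sum over l != k of z_k z_l (p_k - p_l) / |p_k - p_l|^3."""
    fx = fy = 0.0
    for l in range(len(positions)):
        if l == k:
            continue
        dx = positions[k][0] - positions[l][0]
        dy = positions[k][1] - positions[l][1]
        r3 = math.hypot(dx, dy) ** 3
        fx += charges[k] * charges[l] * dx / r3
        fy += charges[k] * charges[l] * dy / r3
    return fx, fy

def numeric_force(positions, charges, k, h=1e-5):
    """Central finite difference of the energy with respect to p_k."""
    force = []
    for axis in range(2):
        plus = [list(p) for p in positions]
        minus = [list(p) for p in positions]
        plus[k][axis] += h
        minus[k][axis] -= h
        force.append(-(energy(plus, charges) - energy(minus, charges)) / (2.0 * h))
    return tuple(force)

# Hypothetical planar configuration.
positions = [(0.0, 0.0), (1.2, 0.0), (-0.6, 0.9)]
charges = [6.0, 8.0, 1.0]
f_exact = analytic_force(positions, charges, 0)
f_approx = numeric_force(positions, charges, 0)
```

The same finite-difference check could be applied to any differentiable regressed energy to validate an analytic gradient.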

Acknowledgments

This work was supported by the ERC grant InvariantClass 320959.


Energy Regression Results


Wavelet Dictionary

$\rho(u)$

$|\rho * \psi_{j_1,\theta_1}(u)|$

[Figure axes: rotations $\theta_1$, scales $j_1$]


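Scattering Dictionary

$|\rho * \psi_{j_1,\theta_1}(u)|$

[Figure axes: rotations $\theta_1$, scales $j_1$]

Recover translation variability: $|\rho * \psi_{j_1,\theta_1}| * \psi_{j_2,\theta_2}(u)$

Recover rotation variability: $|\rho * \psi_{j_1,\cdot}(u)| \circledast \psi_{l_2}(\theta_1)$

Combine to recover roto-translation variability: $\left||\rho * \psi_{j_1,\cdot}| * \psi_{j_2,\theta_2}(u) \circledast \psi_{l_2}(\theta_1)\right|$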
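The cascade of wavelet convolutions and moduli can be sketched on a small periodic grid. This is a simplified translation scattering only: it computes $\sum_u |\rho * \psi_{j_1,\theta_1}|(u)$ and $\sum_u ||\rho * \psi_{j_1,\theta_1}| * \psi_{j_2,\theta_2}|(u)$ and leaves out the convolution along the rotation variable. The Gabor-type filters, the grid size `N`, the toy density and all function names are assumptions for illustration.

```python
import cmath
import math

N = 8  # grid side; convolutions are circular on an N x N torus

def gabor_filter(j, theta, sigma=0.8, xi=3.0 * math.pi / 4.0):
    """Sample psi_{j,theta} on the grid (coordinates wrapped to [-N/2, N/2))."""
    c, s = math.cos(theta), math.sin(theta)
    filt = [[0j] * N for _ in range(N)]
    for a in range(N):
        for b in range(N):
            u1 = (a + N // 2) % N - N // 2
            u2 = (b + N // 2) % N - N // 2
            v1 = (c * u1 + s * u2) / 2.0 ** j
            v2 = (-s * u1 + c * u2) / 2.0 ** j
            env = math.exp(-(v1 * v1 + v2 * v2) / (2.0 * sigma * sigma))
            filt[a][b] = 2.0 ** (-2 * j) * env * cmath.exp(1j * xi * v1)
    return filt

def modulus_conv(signal, filt):
    """|signal * filt| with a direct circular convolution."""
    out = [[0.0] * N for _ in range(N)]
    for a in range(N):
        for b in range(N):
            acc = 0j
            for p in range(N):
                for q in range(N):
                    acc += signal[p][q] * filt[(a - p) % N][(b - q) % N]
            out[a][b] = abs(acc)
    return out

def total(image):
    return sum(sum(row) for row in image)

def scattering(rho, scales=(0, 1), angles=(0.0, math.pi / 2.0)):
    """Return {path: coefficient} for first and second order paths."""
    filters = {(j, t): gabor_filter(j, t) for j in scales for t in angles}
    coeffs = {}
    for path1, psi1 in filters.items():
        layer1 = modulus_conv(rho, psi1)
        coeffs[(path1,)] = total(layer1)
        for path2, psi2 in filters.items():
            if path2[0] > path1[0]:  # keep only j2 > j1
                coeffs[(path1, path2)] = total(modulus_conv(layer1, psi2))
    return coeffs

def toy_density(centers):
    """Sum of narrow periodic Gaussians, one per (hypothetical) atom."""
    rho = [[0.0] * N for _ in range(N)]
    for a in range(N):
        for b in range(N):
            for c1, c2 in centers:
                d1 = min((a - c1) % N, (c1 - a) % N)
                d2 = min((b - c2) % N, (c2 - b) % N)
                rho[a][b] += math.exp(-(d1 * d1 + d2 * d2) / 2.0)
    return rho

rho = toy_density([(2, 2), (4, 3)])
shifted = toy_density([(5, 7), (7, 0)])  # same atoms translated by (3, 5) on the torus
s_rho = scattering(rho)
s_shifted = scattering(shifted)
max_gap = max(abs(s_rho[p] - s_shifted[p]) for p in s_rho)
```

Summing over `u` is what removes the dependence on position: the two densities differ only by a circular translation, so their coefficients agree up to rounding.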


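Scattering Second Order

$|\rho * \psi_{j_1,\theta_1}(u)|$, $j_1$ fixed

$\left||\rho * \psi_{j_1,\cdot}| * \psi_{j_2,\theta_2}(u) \circledast \psi_{l_2}(\theta_1)\right|$, $j_1$, $l_2$ fixed

[Figure axes: rotations $\theta_2$, scales $j_2$]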


Scattering Dictionary



Scattering Regression

Regression:


Quantum Chemistry Energy Regression Results


From 2D to 3D Scattering

Reconstruction from Scattering (Joan Bruna)

Original images of $N^2$ pixels:

Order $m = 2$

Reconstruction from $\{\|x\|_1,\ \|x \star \psi_{\lambda_1}\|_1,\ \||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}\|_1\}$: $O(\log_2^2 N)$ coeff.

Ergodic Texture Reconstructions (Joan Bruna)

Original Textures

$m = 2$, $2^J = N$: reconstruction from $O(\log_2^2 N)$ scattering coeff.

Digit Classification: MNIST (Joan Bruna)

$x \to S_J x \to$ Linear Classifier $\to y = f(x)$

Classification Errors

LeCun et al.

Complex Image Classification (Edouard Oyallon)

CalTech 101 database: Boat, Water lily, Metronome, Beaver, Joshua tree, Anchor

$x \to S_J x \to$ Linear Classif. $\to y$

$S_J x$: computes invariants to rigid movements

Classification Accuracy

Database       Deep-Net   Scat.-2
CalTech-101    85%        80%
CIFAR-10       90%        80%

Conclusion

• Quantum energy regression involves generic invariants to rigid movements, stability to deformations, multiscale interactions

• These properties require scale separations, hence wavelets.

• Multilayer wavelet scattering creates a large number of invariants

• Equivalent to deep networks with predefined wavelet filters

• Knowing physics provides the invariants: can avoid learning representations
