
Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, Qingdao, 11-14 July 2010

Approximate Interpolation by a Class of Neural Networks in Lebesgue Metric

Chunmei Ding, Yubo Yuan, Feilong Cao

Institute of Metrology and Computational Science, China Jiliang University,

Hangzhou 310018, Zhejiang Province, P R China.

E-mail: [email protected]

Abstract

In this paper, a class of approximate interpolation neural networks is constructed to approximate Lebesgue integrable functions. It is shown that the networks can approximate any $p$-th Lebesgue integrable function arbitrarily well in the Lebesgue metric as long as the number of hidden nodes is sufficiently large. The relation among the approximation speed, the number of hidden nodes, the interpolation sample and the smoothness of the target function is also revealed by means of the Steklov mean function and the modulus of smoothness of $f$. The obtained results are helpful in studying the problem of approximation complexity of interpolation neural networks in the Lebesgue metric.

Keywords: Neural networks; Interpolation; Approximation; Estimate of error; Lebesgue metric

1 Introduction

Throughout this paper, $\mathbb{N}$ and $\mathbb{R}$ denote the set of natural numbers and the set of real numbers, respectively, and, for any positive integer $d$, $\mathbb{R}^d$ denotes the $d$-dimensional Euclidean space. Let $S = \{x_0, x_1, \dots, x_n\} \subset \mathbb{R}^d$ denote a set of distinct vectors, and $\{f_0, f_1, \dots, f_n\} \subset \mathbb{R}$ a set of real numbers. Then

$$\{(x_0, f_0), (x_1, f_1), \dots, (x_n, f_n)\} \qquad (1)$$

is called a set of interpolation samples, and $\{x_i\}_{i=0}^{n}$ is called a node system of interpolation.

If there exists a function $f : \mathbb{R}^d \to \mathbb{R}$ such that

$$f(x_i) = f_i, \quad i = 0, 1, \dots, n,$$

then we say that the function $f$ is an exact interpolation of the sample set (1). If there exists a function $g : \mathbb{R}^d \to \mathbb{R}$ such that

$$|g(x_i) - f_i| < \epsilon, \quad i = 0, 1, \dots, n,$$

for a positive real number $\epsilon$, then we call $g$ an $\epsilon$-approximate interpolation of the sample set (1). According to common usage, a sigmoidal function $\sigma$ is defined in $\mathbb{R}$ with the following properties:

$$\lim_{t \to +\infty} \sigma(t) = 1, \quad \text{and} \quad \lim_{t \to -\infty} \sigma(t) = 0. \qquad (2)$$
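As a small illustration of the two definitions above, the following Python sketch checks whether a candidate function is an $\epsilon$-approximate interpolation of a sample set, and evaluates the logistic function as one example of a sigmoidal function satisfying (2). The helper names, the scalar nodes ($d = 1$) and the candidate function are illustrative assumptions, not part of the paper.

```python
import math

def sigmoid(t):
    """Logistic function, one example of a sigmoidal function satisfying (2)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def max_interpolation_error(g, nodes, values):
    """Largest deviation |g(x_i) - f_i| over the sample set."""
    return max(abs(g(x) - f) for x, f in zip(nodes, values))

def is_eps_approximate_interpolant(g, nodes, values, eps):
    """True if |g(x_i) - f_i| < eps for every sample (x_i, f_i)."""
    return max_interpolation_error(g, nodes, values) < eps

# Hypothetical sample set taken from t**2, with scalar nodes (d = 1).
nodes = [0.0, 1.0, 2.0]
values = [0.0, 1.0, 4.0]

# Hypothetical candidate that misses every sample by about 0.01.
g = lambda x: x * x + 0.01

ok_loose = is_eps_approximate_interpolant(g, nodes, values, eps=0.05)
ok_tight = is_eps_approximate_interpolant(g, nodes, values, eps=0.005)
```

With these assumed values the candidate passes for $\epsilon = 0.05$ and fails for $\epsilon = 0.005$; an exact interpolation corresponds to a maximal deviation of zero.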

In recent years, the interpolation problem for neural networks has seen great progress. It is known that single hidden layer feedforward neural networks (FNNs) with at most $n + 1$ neurons can learn $n + 1$ distinct samples $(x_i, f_i)$ $(i = 0, 1, 2, \dots, n)$ with zero error (exact interpolation); see [19], [21], [31] and [33]. Ito and Saito [21] proved that if the activation function is a continuous and nondecreasing sigmoidal function, then exact interpolation can be achieved with inner weights $w_j \in S^{d-1}$, where $S^{d-1}$ is the unit sphere in $\mathbb{R}^d$. In [31], Pinkus proved the same result, but $\phi$ only needs to be continuous in $\mathbb{R}$ and not a polynomial. Shrivastava and Dasgupta [32] gave a proof for the sigmoidal activation function $\phi(x) := \frac{1}{1 + e^{-x}}$. However, exact interpolation networks are more difficult to solve. So one turns to the study of approximate interpolation neural networks, which were first used in [33] as a tool to study exact interpolation networks. It was proved in [33] that if approximate interpolation of arbitrary precision exists in a linear space of functions, then an exact interpolation can be obtained in that space. Furthermore, the fact "If $\phi$ is sigmoidal, continuous and there exists a point $c$ such that $\phi'(c) \neq 0$, then an interpolation problem with $2n + 1$ samples can be approximated with arbitrary precision by a net with $n + 1$ neurons" was given. Recently, Llanas and Sainz [23] studied the existence and construction of $\epsilon$-approximate interpolation networks. They first assumed that the activation function $\phi$ is a nondecreasing sigmoidal function satisfying condition (2) and gave a new and quantitative proof of the fact that $n + 1$ hidden neurons can



learn $n + 1$ distinct samples with zero error. Then, they introduced approximate interpolation networks, which do not require training and can approximately interpolate an arbitrary set of distinct samples. We also note that Llanas and Lantaron [24] considered the problem of Hermite interpolation by FNNs. However, it is natural to raise the following two questions: Can we replace the nondecreasing sigmoidal function used in [23] by other activation functions? Can we estimate the approximation errors of the constructed networks?

2 A Class of Interpolation Neural Networks

Let $x_0, x_1, \dots, x_n$ be the data in $\mathbb{R}^d$, $\rho(a, b) = \|a - b\|_2$ the Euclidean distance between the points $a$ and $b$ in $\mathbb{R}^d$, and $A := A(n) > 0$ a parameter depending on $n$. Then a class of functions $g_j : \mathbb{R}^d \to \mathbb{R}$, which can be used as activation functions of FNNs, is defined by

$$g_j(x) := g_j(x, A) := \frac{e^{-A\rho(x, x_j)}}{\sum_{i=0}^{n} e^{-A\rho(x, x_i)}}, \quad j = 0, 1, \dots, n. \qquad (3)$$

A linear combination of the $g_j(x, A)$ is then defined by

$$N_n(x) := \sum_{j=0}^{n} c_j g_j(x, A). \qquad (4)$$


Clearly, $N_n(x)$ can be understood as an FNN with four layers. The first layer is the input layer, with input $x$ $(x \in \mathbb{R}^d)$. The second layer is a processing layer that computes the values $\rho(x, x_i)$, $i = 0, 1, \dots, n$, between the input $x$ and the prototypical input points $x_i$; these values form the input of the third layer. The third layer contains $n + 1$ neurons, and $g_j(x, A)$ is the activation function of the $j$-th neuron. The fourth layer is the output layer, whose output is $N_n(x)$.
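The four-layer reading of (3) and (4) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the example nodes, the coefficients and the values of $A$ are hypothetical, and subtracting the smallest distance before exponentiating is a numerical convenience that leaves the ratios in (3) unchanged.

```python
import math

def euclidean(a, b):
    """rho(a, b) = ||a - b||_2 for points given as tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def activations(x, nodes, A):
    """Return [g_0(x, A), ..., g_n(x, A)] as defined in (3)."""
    dists = [euclidean(x, xi) for xi in nodes]          # second (processing) layer
    d_min = min(dists)
    # Shifting all distances by d_min cancels in the quotient and avoids underflow.
    weights = [math.exp(-A * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]                 # third layer

def network(x, nodes, coeffs, A):
    """N_n(x) = sum_j c_j g_j(x, A), as in (4): the output layer."""
    return sum(c * g for c, g in zip(coeffs, activations(x, nodes, A)))

# Hypothetical data in R^2.
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
coeffs = [2.0, -1.0, 0.5]

g = activations((0.3, 0.2), nodes, A=5.0)
value_at_node = network(nodes[1], nodes, coeffs, A=50.0)
```

The activations are nonnegative and sum to one at every input, and for large $A$ the output at a node $x_j$ is close to the coefficient $c_j$, which is why the network can serve as an approximate interpolant.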

It is well known that the sigmoidal function $\phi(x) = \frac{1}{1 + e^{-x}}$, which is commonly used as the activation function in the hidden layer of neural networks, is a logistic model. This model is an important one and has been widely used in biology, demography and so on (see [2], [17]). Naturally, the functions

$$\phi_j(x) := \frac{e^{f_j(x)}}{\sum_{i=0}^{n} e^{f_i(x)}}, \quad j = 0, 1, 2, \dots, n,$$

can be regarded as a multi-class generalization of the logistic model (see Section 10.6 in [14]), which has also been used as a regression model for multi-class classification problems. On the other hand, it follows from their structure that the $g_j(x)$ contain the information of the interpolation samples. The second layer of the network composed of the $g_j(x)$ can be regarded as the processing layer and the input of the third layer, which is more convenient for the study of network interpolation.
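The link between the logistic model and its multi-class generalization can be checked numerically. The sketch below, with hypothetical function names and scores, shows that a two-class version of the quotient above with scores $t$ and $0$ reproduces the logistic function; the $g_j$ in (3) are the same quotient with scores $-A\rho(x, x_j)$.

```python
import math

def softmax(scores):
    """Multi-class generalization: exp(s_j) / sum_i exp(s_i)."""
    m = max(scores)                       # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic(t):
    """The logistic model 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

t = 1.7                                   # arbitrary test score
two_class = softmax([t, 0.0])
three_class = softmax([0.3, -1.2, 2.0])
```

Here `two_class[0]` agrees with `logistic(t)` up to rounding, and the entries of `three_class` sum to one.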

In [3], the authors constructed exact and approximate interpolation networks by using the activation functions $g_j(x)$ given by (3), and gave some estimates of the convergence rate in the uniform metric.

3 Complexity of Approximation of Neural Networks


FNNs are used in many areas because of their universal approximation property, i.e., any continuous or integrable function defined on a compact set can be approximated by an FNN with one hidden layer. This property can essentially be understood from the fact that FNNs are nonlinear parametric expressions for numerical functions. In connection with such paradigms there arise mainly three problems: the density problem, the complexity problem and the solution algorithm problem. The density problem was satisfactorily solved in the late 1980s and the early 1990s. We refer to [10], [15], [20], [22], [6], [9], [7] and [8], where an elegant necessary and sufficient condition for the density theorem is obtained, namely that the activation function $\phi$ is not a polynomial. This result provides the theoretical basis for selecting activation functions from a larger class.

The complexity of FNN approximation mainly describes the relationship among the topology of the hidden layer (such as the number of neurons and the values of the weights), the approximation ability and the approximation rate. On the complexity of FNNs, we refer to [13], [30], [16], [?], [12], [18], [27], [4].

In these studies, some upper estimates of the approximation rate were established under certain circumstances. Such upper estimates can, on the one hand, imply convergence of FNNs to the target functions and, on the other hand, provide a quantitative estimate of how accurately the FNNs approximate the functions. This paper addresses the $L^p$ complexity of approximation by interpolation networks of the form (4).

4 $L^p$ Approximation by Neural Networks

More recently, some upper bounds on the approximation error of FNNs in the uniform metric have been studied by Mhaskar and Micchelli [29], Maiorov and Meir [26], Makovoz [28], Llanas and Sainz [23], Cao et al. [3], [4], and so on. It has been shown that FNNs with one hidden layer of sigmoidal nodes can achieve an error of order $O(n^{-\alpha})$ (where $0 < \alpha \le 1$ and $n$ is the number of neurons in the hidden layer) for approximating various classes of sufficiently smooth functions in the uniform metric. However, an interesting question is: can we obtain similar results for more general classes of functions besides continuous



ones? And how? One of our main purposes in this paper is to address the issue of approximation complexity when $L^p$ integrable functions are approximated in the $L^p$ metric by the interpolation neural networks defined by (4).

Compared with the uniform metric, the $L^p$ $(1 \le p < +\infty)$ metric also has its specific advantages and solid use in science and engineering. For example, given a target function $g : \mathbb{R}^d \to \mathbb{R}$ and a set of sample points $x^{(1)}, x^{(2)}, \dots, x^{(L)}$, which are viewed as input signals of the approximating neural network $N$, define the empirical error $E_L$ as

$$E_L(N) := \frac{1}{L} \sum_{i=1}^{L} \left| N(x^{(i)}) - g(x^{(i)}) \right|^p, \quad 1 \le p < +\infty.$$

Then, it is known that whenever the input signals are generated from a stationary source with distribution $\mu$, the ergodic theorem (see [1]) holds, and by this theorem it can be shown that the average $E_L(N)$ converges to the actual error

$$E := \int_{\mathbb{R}^d} |N(x) - g(x)|^p \, d\mu.$$

This underlies the feasibility of training neural networks with a finite number of samples. Consistent with such theoretical treatment, studying approximation complexity in the $L^p$ metric becomes very promising and significant.
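The convergence of the empirical error to the actual error can be illustrated with a small Python sketch. The target, the stand-in approximant, and the use of independent uniform inputs on $[0, 1]$ (a simple special case of a stationary source) are assumptions made only for this example.

```python
import random

def empirical_error(N, g, samples, p):
    """E_L(N) = (1/L) * sum_i |N(x_i) - g(x_i)|**p."""
    return sum(abs(N(x) - g(x)) ** p for x in samples) / len(samples)

# Hypothetical target and a crude stand-in for an approximating network.
g = lambda x: x * x
N = lambda x: x
p = 2

rng = random.Random(0)                      # fixed seed for reproducibility
samples = [rng.random() for _ in range(200_000)]

E_L = empirical_error(N, g, samples, p)

# Actual error for the uniform distribution on [0, 1]:
# integral of (x - x**2)**2 = 1/3 - 1/2 + 1/5 = 1/30.
E_actual = 1.0 / 30.0
```

For this choice the empirical error is close to the actual error $1/30$, and the gap shrinks as the number of samples $L$ grows.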

It is well known that, given a desired approximation error $\epsilon$, for any $f \in L^p[a, b]$, $1 \le p < +\infty$, there exists an $N_n(x)$ such that

$$\|f - N_n\|_p < \epsilon,$$

where $[a, b] \subset \mathbb{R}$, by the universal approximation property. This fact illustrates the existence of an $N_n(x)$ approximating any $L^p$ integrable function. In order to further reveal its approximation complexity, a novel constructive method, which is mainly based on the Steklov mean function and the modulus of smoothness of $f$, will be introduced in the paper. With this approach, not only will the size of an $N_n(x)$ of the form (4) (i.e., the number of neurons $n$ in the hidden layer) be estimated for given $\epsilon$ and $f$, but the weights $c_i$ will also be obtained by calculating integrals of $f$ over different subintervals of $[a, b]$ (instead of resulting from some discrete values of $f$). Hence, the specific topology of $N_n(x)$ will be determined according to the results established below.

The paper is organized as follows. In the next section, we introduce some notations and the main results of this paper. In section 3, we show the main results of this paper. We conclude the paper with some useful conclusions and remarks in the last section.

5 The Notations and Main Results

Denote by $L^p[a, b]$ $(1 \le p < +\infty)$ the space of $p$-th Lebesgue integrable functions defined on $[a, b]$, whose norm is usually defined as

$$\|f\|_p := \|f(\cdot)\|_p := \left( \int_a^b |f(t)|^p \, dt \right)^{1/p} < +\infty.$$

For $f \in L^p[a, b]$ $(1 \le p < +\infty)$, we will use the modulus of smoothness $\omega(f, \delta)_p$ of $f$, defined by

$$\omega(f, \delta)_p := \sup_{0 < h \le \delta} \left( \int_a^{b-h} |f(t + h) - f(t)|^p \, dt \right)^{1/p}, \quad 1 \le p < +\infty,$$

which is usually a measurement tool of approximation error. This modulus is also used to measure the smoothness of a function and approximation accuracy in approximation theory and Fourier analysis (see [25], [11]).
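To make the modulus of smoothness concrete, the following Python sketch approximates $\omega(f, \delta)_p$ by a midpoint rule in $t$ and a maximum over a finite grid of step sizes $h$. The function name, the grid sizes and the test function $f(t) = t$ on $[0, 1]$ are assumptions for illustration; the grid maximum only approximates the supremum from below.

```python
def lp_modulus(f, a, b, delta, p, n_h=50, n_t=2000):
    """Approximate sup over 0 < h <= delta of (int_a^{b-h} |f(t+h) - f(t)|^p dt)^(1/p)."""
    best = 0.0
    for k in range(1, n_h + 1):
        h = delta * k / n_h
        if h >= b - a:
            break                              # the integration interval would be empty
        width = (b - h - a) / n_t
        # Midpoint rule on [a, b - h].
        integral = sum(
            abs(f(a + (m + 0.5) * width + h) - f(a + (m + 0.5) * width)) ** p
            for m in range(n_t)
        ) * width
        best = max(best, integral ** (1.0 / p))
    return best

identity = lambda t: t
w_small = lp_modulus(identity, 0.0, 1.0, delta=0.05, p=2)
w_large = lp_modulus(identity, 0.0, 1.0, delta=0.10, p=2)
```

For $f(t) = t$ the integrand equals $h^p$, so $\omega(f, \delta)_2 = \delta\sqrt{1 - \delta}$ for small $\delta$, which the sketch reproduces. The modulus is nondecreasing in $\delta$ and tends to zero with $\delta$.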

It is clear that $\lim_{\delta \to 0} \omega(f, \delta)_p = 0$ and $\omega(f, \lambda\delta)_p \le (\lambda + 1)\,\omega(f, \delta)_p$ for any real number $\lambda \ge 0$. The function $f$ is called $(M, \alpha)$-Lipschitz continuous $(0 < \alpha \le 1)$, written $f \in \mathrm{Lip}(M, \alpha)$, if and only if there exists a constant $M > 0$ such that

$$\omega(f, \delta)_p \le M\delta^{\alpha}.$$

For $0 < h \le b - a$ and $f \in L^p[a, b]$, $1 \le p < \infty$, we define the Steklov mean function as

$$f_h(x) := \begin{cases} \dfrac{1}{h} \displaystyle\int_x^{x+h} f(t) \, dt, & a \le x < b - h; \\[2ex] \dfrac{1}{h} \displaystyle\int_{b-h}^{b} f(t) \, dt, & b - h \le x \le b. \end{cases} \qquad (5)$$

The error between the Steklov mean function $f_h$ and the function $f$ in the $L^p$ metric satisfies (see [3])

$$\|f_h - f\|_p \le 3\,\omega(f, h)_p, \qquad (6)$$

which will be proved in the next section.

We summarize the main results we have obtained on the approximation complexity for $L^p$ integrable functions approximated by interpolation neural networks of the form (4) as follows.

Theorem 1 Let $f \in L^p[a, b]$, $1 \le p < +\infty$, $x_i = a + i\,\frac{b-a}{n}$, $f_i = f_h(x_i)$, and $h = \frac{b-a}{n}$ for $i = 0, 1, \dots, n$. Then for the interpolation sample set $\{(x_i, f_i)\}_{i=0}^{n}$, there exist a constant $A^* > 0$ and an approximate interpolation network defined by

$$N_a(x) := \sum_{j=0}^{n} f_h(x_j)\, g_j(x),$$

such that

$$\|f(\cdot) - N_a(\cdot)\|_p \le 27\,\omega\!\left(f, \frac{b-a}{n}\right)_p \quad \text{for } A > A^*.$$
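The construction in Theorem 1 can be tried out numerically. The sketch below builds $N_a$ from the Steklov mean (5) sampled at uniform nodes and compares the $L^p$ error with $27\,\omega(f, h)_p$. It is only an illustration under assumptions: the target $f(t) = \sin(2\pi t)$ on $[0, 1]$, the midpoint-rule quadrature, the grid used for the modulus, and in particular the choice $A = 2n/h$ are ours, since the theorem only requires $A > A^*$.

```python
import math

def integrate(f, lo, hi, m=400):
    """Midpoint rule for the integral of f over [lo, hi]."""
    w = (hi - lo) / m
    return sum(f(lo + (k + 0.5) * w) for k in range(m)) * w

def steklov_mean(f, b, h):
    """Return the function f_h from (5)."""
    def f_h(x):
        start = x if x < b - h else b - h     # second branch of (5) near b
        return integrate(f, start, start + h) / h
    return f_h

def build_network(f, a, b, n, A):
    """N_a(x) = sum_j f_h(x_j) g_j(x, A) on the nodes x_j = a + j h."""
    h = (b - a) / n
    nodes = [a + j * h for j in range(n + 1)]
    f_h = steklov_mean(f, b, h)
    coeffs = [f_h(xj) for xj in nodes]
    def N_a(x):
        dists = [abs(x - xj) for xj in nodes]
        d_min = min(dists)                    # shift cancels in the quotient (3)
        w = [math.exp(-A * (d - d_min)) for d in dists]
        return sum(c * wi for c, wi in zip(coeffs, w)) / sum(w)
    return N_a

def lp_distance(u, v, a, b, p, m=2000):
    return integrate(lambda x: abs(u(x) - v(x)) ** p, a, b, m) ** (1.0 / p)

def lp_modulus(f, a, b, delta, p, n_h=20, m=1000):
    """Grid approximation of the modulus of smoothness."""
    return max(
        integrate(lambda t, h=h: abs(f(t + h) - f(t)) ** p, a, b - h, m) ** (1.0 / p)
        for h in (delta * k / n_h for k in range(1, n_h + 1))
    )

a, b, p = 0.0, 1.0, 2
f = lambda t: math.sin(2.0 * math.pi * t)     # hypothetical target function
errors, bounds = [], []
for n in (5, 10, 20):
    h = (b - a) / n
    N_a = build_network(f, a, b, n, A=2.0 * n / h)   # assumed choice of A
    errors.append(lp_distance(f, N_a, a, b, p))
    bounds.append(27.0 * lp_modulus(f, a, b, h, p))
```

In this experiment each entry of `errors` stays below the corresponding entry of `bounds`, and the error decreases as $n$ grows, in line with the statement of the theorem. The weights are obtained from integrals of $f$ over subintervals rather than from point values of $f$.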




6 The Proof of Main Results

In this section we give the proof of Theorem 1 and Theorem 2.

Now we begin to prove Theorem 1. Divide $[a, b]$ into $a = x_0 < x_1 < \dots < x_n = b$, where $x_i := a + ih$, $h := \frac{b-a}{n}$. Let $f_i := f_h(x_i)$, $i = 0, 1, 2, \dots, n$. Then we can construct an approximate interpolation network, for a positive constant $A := A(n)$ depending on $n$,

$$N_a(x) := \sum_{i=0}^{n} f_i\, g_i(x, A(n)),$$

which satisfies Theorem 1. Noting that

$$\|N_a - f\|_p \le \|N_a - f_h\|_p + \|f_h - f\|_p,$$

and using (6), we only need to prove

$$\|N_a - f_h\|_p \le 24\,\omega(f, h)_p. \qquad (8)$$

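So, we use notation (5) and obtain

$$\begin{aligned} \|N_a - f_h\|_p^p &= \int_a^b |N_a(x) - f_h(x)|^p \, dx \\ &= \sum_{j=2}^{n-1} \int_{x_{j-1}}^{x_j} |N_a(x) - f_h(x)|^p \, dx + \int_a^{x_1} |N_a(x) - f_h(x)|^p \, dx + \int_{x_{n-1}}^{b} |N_a(x) - f_h(x)|^p \, dx \\ &=: \Delta_1 + \Delta_2 + \Delta_3. \end{aligned}$$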

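For $j \in \mathbb{N}$ with $1 < j < n$, since $\sum_{i=0}^{n} g_i(x) = 1$, one has

$$\begin{aligned} |N_a(x) - f_h(x)|^p &= \left| \sum_{i=0}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p \\ &= \Biggl| \sum_{i=0}^{j-2} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) + \bigl(f_h(x_{j-1}) - f_h(x)\bigr) g_{j-1}(x) \\ &\qquad + \bigl(f_h(x_j) - f_h(x)\bigr) g_j(x) + \sum_{i=j+1}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \Biggr|^p \\ &\le 4^{p-1} \Biggl( \left| \sum_{i=0}^{j-2} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p + \bigl| \bigl(f_h(x_{j-1}) - f_h(x)\bigr) g_{j-1}(x) \bigr|^p \\ &\qquad + \bigl| \bigl(f_h(x_j) - f_h(x)\bigr) g_j(x) \bigr|^p + \left| \sum_{i=j+1}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p \Biggr), \end{aligned}$$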

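which implies that

$$\begin{aligned} \Delta_1 \le 4^{p-1} \sum_{j=2}^{n-1} \int_{x_{j-1}}^{x_j} \Biggl( &\left| \sum_{i=0}^{j-2} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p + \bigl| \bigl(f_h(x_{j-1}) - f_h(x)\bigr) g_{j-1}(x) \bigr|^p \\ &+ \bigl| \bigl(f_h(x_j) - f_h(x)\bigr) g_j(x) \bigr|^p + \left| \sum_{i=j+1}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p \Biggr) dx. \end{aligned}$$

For $\Delta_2$, we obtain that

$$\begin{aligned} \Delta_2 &= \int_a^{x_1} \left| \sum_{i=0}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p dx \\ &= \int_a^{x_1} \left| \bigl(f_h(a) - f_h(x)\bigr) g_0(x) + \bigl(f_h(x_1) - f_h(x)\bigr) g_1(x) + \sum_{i=2}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p dx \\ &\le 3^{p-1} \int_a^{x_1} \Biggl( \bigl| \bigl(f_h(a) - f_h(x)\bigr) g_0(x) \bigr|^p + \bigl| \bigl(f_h(x_1) - f_h(x)\bigr) g_1(x) \bigr|^p + \left| \sum_{i=2}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p \Biggr) dx. \end{aligned}$$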

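Also, we have

$$\Delta_3 \le 3^{p-1} \int_{x_{n-1}}^{b} \Biggl( \bigl| \bigl(f_h(b) - f_h(x)\bigr) g_n(x) \bigr|^p + \bigl| \bigl(f_h(x_{n-1}) - f_h(x)\bigr) g_{n-1}(x) \bigr|^p + \left| \sum_{i=0}^{n-2} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p \Biggr) dx.$$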
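The factors $4^{p-1}$ and $3^{p-1}$ in the estimates above come from the convexity inequality $\left|\sum_{k=1}^{m} a_k\right|^p \le m^{p-1} \sum_{k=1}^{m} |a_k|^p$ for $p \ge 1$, applied with $m = 4$ and $m = 3$. The following Python sketch checks this inequality on random inputs; the function name and the sampling ranges are illustrative assumptions, and a random check is of course not a proof.

```python
import random

def convexity_gap(terms, p):
    """Return m**(p-1) * sum|a_k|**p - |sum a_k|**p, expected to be >= 0 for p >= 1."""
    m = len(terms)
    lhs = abs(sum(terms)) ** p
    rhs = m ** (p - 1) * sum(abs(a) ** p for a in terms)
    return rhs - lhs

rng = random.Random(1)                        # fixed seed for reproducibility
gaps = []
for _ in range(1000):
    m = rng.choice([3, 4])                    # the two cases used in the proof
    p = rng.uniform(1.0, 6.0)
    terms = [rng.uniform(-5.0, 5.0) for _ in range(m)]
    gaps.append(convexity_gap(terms, p))

# Equality holds when all terms coincide, e.g. four ones with p = 3: 4**2 * 4 = 4**3.
equality_case = convexity_gap([1.0, 1.0, 1.0, 1.0], 3.0)
```

All sampled gaps are nonnegative up to rounding, and the gap vanishes when all terms are equal, so the constants $m^{p-1}$ cannot be improved in general.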

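Therefore, combining $\Delta_1$, $\Delta_2$ and $\Delta_3$, and using $3^{p-1} \le 4^{p-1}$, gives

$$\begin{aligned} \|N_a - f_h\|_p^p \le{} & 4^{p-1} \sum_{j=2}^{n} \int_{x_{j-1}}^{x_j} \left| \sum_{i=0}^{j-2} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p dx \\ &+ 4^{p-1} \sum_{j=1}^{n-1} \int_{x_{j-1}}^{x_j} \left| \sum_{i=j+1}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p dx \\ &+ 4^{p-1} \sum_{j=1}^{n} \int_{x_{j-1}}^{x_j} \bigl| \bigl(f_h(x_j) - f_h(x)\bigr) g_j(x) \bigr|^p dx \\ &+ 4^{p-1} \sum_{j=1}^{n} \int_{x_{j-1}}^{x_j} \bigl| \bigl(f_h(x_{j-1}) - f_h(x)\bigr) g_{j-1}(x) \bigr|^p dx \\ =:{} & I_1 + I_2 + I_3 + I_4. \end{aligned}$$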



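Now, we estimate $I_1$, $I_2$, $I_3$ and $I_4$, respectively. We first estimate $I_1$. Using a method similar to that of [5] gives

$$\sum_{j=2}^{n} \int_{x_{j-1}}^{x_j} \left| \sum_{i=0}^{j-2} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p dx \le \sum_{j=2}^{n} \int_{x_{j-1}}^{x_j} \left| \sum_{i=0}^{j-2} \frac{1}{h} \int_a^{b-h} |f(t + h) - f(t)| \, dt \times \frac{e^{-A|x - x_i|}}{e^{-A|x - x_{j-1}|}} \right|^p dx.$$

Set $A^* = \frac{n}{h}$. Noting that for $0 \le i \le j - 2$ and $x_{j-1} < x \le x_j$ one has $|x - x_i| - |x - x_{j-1}| \ge h$, which implies that

$$\frac{e^{-A|x - x_i|}}{e^{-A|x - x_{j-1}|}} \le e^{-Ah}$$

for $A > A^*$. Thus,

$$4^{1-p} I_1 < n^{p+1} e^{-pn} \int_a^{b-h} |f(t + h) - f(t)|^p \, dt \le n^{p+1} e^{-pn} \bigl(\omega(f, h)_p\bigr)^p.$$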

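Similarly,

$$\begin{aligned} 4^{1-p} I_2 &= \sum_{j=1}^{n-1} \int_{x_{j-1}}^{x_j} \left| \sum_{i=j+1}^{n} \bigl(f_h(x_i) - f_h(x)\bigr) g_i(x) \right|^p dx \\ &\le \sum_{j=1}^{n-1} \int_{x_{j-1}}^{x_j} \left( \sum_{i=j+1}^{n} \frac{1}{h} \int_a^{b-h} |f(t + h) - f(t)| \, dt \times \frac{e^{-A|x - x_i|}}{e^{-A|x - x_j|}} \right)^p dx. \end{aligned}$$

Since $j + 1 \le i \le n$ and $x_{j-1} \le x \le x_j$, we have $|x - x_i| - |x - x_j| \ge h$, which implies that

$$\frac{e^{-A|x - x_i|}}{e^{-A|x - x_j|}} \le e^{-Ah}.$$

Thus, setting $A^* = \frac{n}{h}$, we obtain for $A > A^*$

$$4^{1-p} I_2 \le e^{-np} n^{p+1} \bigl(\omega(f, h)_p\bigr)^p.$$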


By the same method, we have

$$4^{1-p} I_4 \le \bigl(\omega(f, h)_p\bigr)^p.$$

Therefore,

$$\|N_a - f_h\|_p \le \bigl(8n^2 e^{-n} + 8\bigr)\,\omega(f, h)_p \le 24\,\omega(f, h)_p.$$

This completes the proof of inequality (8), and the proof of Theorem 1 is thus finished.
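The last step uses the fact that $8n^2 e^{-n} + 8 \le 24$ for every positive integer $n$. A short Python check of this constant, with a hypothetical function name and an arbitrary search range, reads as follows.

```python
import math

def proof_constant(n):
    """The factor 8 n^2 e^{-n} + 8 that multiplies the modulus at the end of the proof."""
    return 8.0 * n * n * math.exp(-n) + 8.0

# n^2 e^{-n} is decreasing for n >= 2, so a finite range suffices for illustration.
constants = {n: proof_constant(n) for n in range(1, 201)}
worst_n = max(constants, key=constants.get)
worst_value = constants[worst_n]
```

The largest value occurs at $n = 2$ and is about $12.33$, so the bound by $24$ holds with room to spare, and the factor tends to $8$ as $n$ grows.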

7 Conclusions and Remarks

In this paper, we have analyzed the issue of approximation complexity for interpolation FNNs in the $L^p$ metric. A class of interpolation neural networks has been constructed to approximate $L^p$ integrable functions.

On the one hand, our main result implies the density of the constructed interpolation networks in the space of $L^p$ integrable functions defined on $[a, b]$. The constructed networks $N_a(x)$ can approximate $f(x)$ arbitrarily well in the $L^p$ metric as long as the number of hidden nodes is sufficiently large.

On the other hand, the results obtained in this paper clarify the relationship among the $L^p$ approximation speed, the number of hidden neurons, the interpolation samples and the smoothness of the target function to be approximated. We have concluded that the $L^p$ approximation speed of the constructed FNNs depends not only on the number of hidden neurons, but also on the smoothness of the target function.

We are currently considering how to extend our constructive method to the case of multiple inputs, which seems not to be easy. This is a direction deserving further study.


