Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, Qingdao, 11-14 July 2010
Approximate Interpolation by a Class of Neural Networks in Lebesgue Metric
Chunmei Ding, Yubo Yuan, Feilong Cao*
Institute of Metrology and Computational Science, China Jiliang University,
Hangzhou 310018, Zhejiang Province, P. R. China.
E-MAIL: [email protected]
Abstract: In this paper, a class of approximate interpolation neural networks is constructed to approximate Lebesgue integrable functions. It is shown that the networks can approximate any p-th Lebesgue integrable function arbitrarily well in the Lebesgue metric as long as the number of hidden nodes is sufficiently large. The relation among the approximation speed, the number of hidden nodes, the interpolation sample and the smoothness of the target function is also revealed by means of the Steklov mean function and the modulus of smoothness of f. The obtained results are helpful in studying the approximation complexity of interpolation neural networks in the Lebesgue metric.
Keywords: Neural networks; Interpolation; Approximation;
Estimate of error; Lebesgue metric
1 Introduction
Throughout this paper, N and R denote the set of natural numbers and the set of real numbers, respectively, and, for any positive integer d, R^d denotes the d-dimensional
Euclidean space. Let S = {x_0, x_1, ..., x_n} ⊂ R^d denote a set of distinct vectors, and {f_0, f_1, ..., f_n} ⊂ R a set of real numbers. Then

{(x_i, f_i) : i = 0, 1, ..., n}  (1)

is called an interpolation sample set, and {x_i}_{i=0}^n is called the node system of the interpolation.
If there exists a function f : R^d → R such that

f(x_i) = f_i,  i = 0, 1, ..., n,

then we say that the function f is an exact interpolation of sample set (1). If there exists a function g : R^d → R such
*Corresponding author.
978-1-4244-6527-9/10/$26.00 ©2010 IEEE
that
|g(x_i) − f_i| < ε,  i = 0, 1, ..., n,

for a positive real number ε, then we call g an ε-approximate interpolation of sample set (1). According to common usage, a sigmoidal function σ is defined on R with the following properties:

lim_{t→+∞} σ(t) = 1  and  lim_{t→−∞} σ(t) = 0.  (2)
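For instance, the logistic function σ(t) = 1/(1 + e^{−t}) satisfies (2); the following quick numerical check is an illustration only, not part of the paper:

```python
import math

def sigma(t: float) -> float:
    """Logistic sigmoid, a standard example of a sigmoidal function."""
    return 1.0 / (1.0 + math.exp(-t))

# sigma(t) -> 1 as t -> +infinity and sigma(t) -> 0 as t -> -infinity
print(sigma(40.0))   # very close to 1
print(sigma(-40.0))  # very close to 0
```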
In recent years, the interpolation problem of neural networks has made great progress. It is known that single hidden layer feedforward neural networks (FNNs) with at most n + 1 neurons can learn n + 1 distinct samples (x_i, f_i) (i = 0, 1, 2, ..., n) with zero error (exact interpolation); see [19], [21], [31] and [33]. Ito and Saito [21] proved that if the activation function is a continuous and nondecreasing sigmoidal function, then the exact interpolation can be realized with inner weights w_j ∈ S^{d−1}, where S^{d−1} is the unit sphere in R^d. In [31] Pinkus proved the same result, but φ only needs to be continuous on R and not a polynomial. Shrivastava and Dasgupta [32] gave a proof for the sigmoidal activation function φ(x) := 1/(1 + e^{−x}). However, solving the exact interpolation networks is more difficult, so one turns to the study of approximate interpolation neural networks, which were first used in [33] as a tool to study exact interpolation networks. It was proved in [33] that if arbitrary-precision approximate interpolation exists in a linear space of functions, then an exact interpolation can be obtained in that space. Furthermore, the following fact was given: if φ is sigmoidal, continuous and there exists a point c such that φ′(c) ≠ 0, then an interpolation problem with 2n + 1 samples can be approximated with arbitrary precision by a net with n + 1 neurons. Recently, Llanas and Sainz [23] studied the existence and construction of ε-approximate interpolation networks. They first assumed that the activation function φ is a nondecreasing sigmoidal function satisfying condition (2) and gave a new and quantitative proof of the fact that n + 1 hidden neurons can
learn n + 1 distinct samples with zero error. Then they introduced approximate interpolation networks, which do not require training and can approximately interpolate an arbitrary set of distinct samples. We also notice that Llanas and Lantarón [24] considered the problem of Hermite interpolation by FNNs. However, it is natural to raise the following two questions: Can we replace the sigmoidal nondecreasing function used in [23] by other activation functions? Can we estimate the approximation errors of the constructed networks?
2 A Class of Interpolation Neural Networks
Let x_0, x_1, ..., x_n be the data in R^d, ρ(a, b) = ||a − b||_2 the Euclidean distance between the points a and b in R^d, and let the parameter A := A(n) > 0 depend on n. Then a class of functions g_j : R^d → R, which can be used as activation functions of FNNs, is defined by

g_j(x) := g_j(x, A) := e^{−Aρ(x,x_j)} / Σ_{i=0}^n e^{−Aρ(x,x_i)},  j = 0, 1, ..., n.  (3)

A linear combination of the g_j(x, A) is then defined by

N_n(x) := Σ_{j=0}^n c_j g_j(x, A).  (4)
Clearly, N_n(x) can be understood as a FNN with four layers: the first layer is the input layer, whose input is x (x ∈ R^d); the second layer is a processing layer that computes the values ρ(x, x_i), i = 0, 1, ..., n, between the input x and the prototypical input points x_i, and serves as the input of the third layer, which contains n + 1 neurons, with g_j(x, A) the activation function of the j-th neuron; the fourth layer is the output layer, whose output is N_n(x).
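To make the construction concrete, here is a minimal one-dimensional NumPy sketch (an illustration, not part of the paper's analysis) of the activations g_j in (3) and the network N_n in (4); the node placement and the coefficients c_j = sin(πx_j) are arbitrary choices:

```python
import numpy as np

def g(x, nodes, A):
    """Activation values g_j(x, A) of (3): a softmax over negative
    scaled Euclidean distances to the interpolation nodes x_0..x_n."""
    x = np.atleast_1d(x)
    d = np.linalg.norm(nodes - x, axis=1)   # rho(x, x_i)
    w = np.exp(-A * (d - d.min()))          # shift exponents for numerical stability
    return w / w.sum()                      # g_j(x), j = 0..n (sums to 1)

def N(x, nodes, c, A):
    """The network N_n(x) of (4): a weighted sum of the g_j."""
    return float(np.dot(c, g(x, nodes, A)))

# one-dimensional example: nodes on [0, 1], coefficients c_j = sin(pi * x_j)
nodes = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
c = np.sin(np.pi * nodes[:, 0])
print(N(np.array([0.5]), nodes, c, A=200.0))  # near sin(pi/2) = 1
```

For large A the weight g_j concentrates on the node nearest to x, which is what later drives the interpolation error estimates.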
It is well known that the sigmoidal function φ(x) = 1/(1 + e^{−x}), which is usually used as the activation function in the hidden layer of neural networks, is a logistic model. This model is an important one and has been widely used in biology, demography and so on (see [2], [17]). Naturally, the functions

φ_j(x) := e^{f_j(x)} / Σ_{i=0}^n e^{f_i(x)},  j = 0, 1, 2, ..., n,

can be regarded as a multi-class generalization of the logistic model (see Section 10.6 in [14]), which is also used as a regression model for the multi-class case in classification problems. On the other hand, it follows from their structure that the g_j(x) contain the information of the interpolation samples. The second layer of the network composed of the g_j(x) can be regarded as the processing layer and the input of the third layer, which is more convenient for the study of network interpolation.
In [3], the authors constructed exact and approximate interpolation networks by using the activation function g_j(x) given by (3), and gave some estimates of the convergence rate in the uniform metric.
3 Complexity of Approximation of Neural Networks
The reason why FNNs are used in many areas is their universal approximation property, i.e., any continuous or integrable function defined on a compact set can be approximated by a FNN with one hidden layer. This property can be essentially understood from the fact that FNNs are nonlinear parametric expressions for numerical functions. In connection with such paradigms there arise mainly three problems: the density problem, the complexity problem and the solution algorithm problem. The density problem was satisfactorily solved in the late 1980s and early 1990s. We refer to [10], [15], [20], [22], [6], [9], [7] and [8], where a graceful necessary and sufficient condition for the density theorem is obtained: the activation function φ is not a polynomial. This result provides the theoretical basis for selecting activation functions over a larger range.
The complexity of FNN approximation mainly describes the relationship among the topology structure of the hidden layer (such as the number of neurons and the values of the weights), the approximation ability and the approximation rate. For the complexity of FNNs, we refer to [13], [30], [16], [?], [12], [18], [27], [4].
In these studies, some upper estimates of the approximation rate were established under certain circumstances. Such upper estimates can, on the one hand, imply convergence of FNNs to the target functions and, on the other hand, provide a quantitative estimate of how accurately the FNNs approximate the functions. This paper addresses the L^p complexity of approximation of interpolation networks of the form (4).
4 L^p Approximation by Neural Networks
More recently, some upper bounds on the approximation error of FNNs in the uniform metric have been studied by Mhaskar and Micchelli [29], Maiorov and Meir [26], Makovoz [28], Llanas and Sainz [23], Cao et al. [3], [4], and so on. It has been shown that FNNs with one hidden layer of sigmoidal nodes can achieve an error of order O(n^{−α}) (where 0 < α ≤ 1 and n is the number of neurons in the hidden layer) for approximating various classes of sufficiently smooth functions in the uniform metric. However, an interesting question is: can we obtain similar results for more general classes of functions besides continuous
ones? And how? One of our main purposes in this paper is to address the issue of approximation complexity when L^p integrable functions are approximated by the interpolation neural networks defined by (4) in the L^p metric.
Compared with the uniform metric, the L^p (1 ≤ p < +∞) metric also has its specific advantages and solid use in science and engineering. For example, given a target function g : R^d → R and a set of sample points x^(1), x^(2), ..., x^(L), which are viewed as input signals of the approximating neural network N, define the empirical error E_L as

E_L(N) := (1/L) Σ_{i=1}^L |N(x^(i)) − g(x^(i))|^p,  1 ≤ p < +∞.

Then it is known that whenever the input signals are generated from a stationary source with distribution μ, the famous ergodic theorem (see [1]) holds, and by this theorem it can be shown that the average E_L(N) converges to the actual error

E := ∫_{R^d} |N(x) − g(x)|^p dμ.
This underlies the feasibility of training neural networks with a finite number of samples. Consistent with such theoretical treatment, studying approximation complexity in the L^p metric then becomes very promising and significant.
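The convergence of E_L to E can be sketched with a toy Monte Carlo experiment (an illustration only, using an i.i.d. uniform source, a special case of a stationary one; `net` and `target` below are arbitrary stand-ins):

```python
import random

def net(x):      # stand-in approximant N
    return x

def target(x):   # stand-in target g
    return x * x

def empirical_error(L, p=2, seed=0):
    """E_L(N): average p-th power deviation over L i.i.d. uniform samples on [0, 1]."""
    rng = random.Random(seed)
    return sum(abs(net(x) - target(x)) ** p
               for x in (rng.random() for _ in range(L))) / L

# law-of-large-numbers behaviour: E_L approaches the actual error
# E = integral_0^1 |x - x^2|^2 dx = 1/30 as L grows
print(empirical_error(100))
print(empirical_error(100_000))   # close to 1/30, about 0.0333
```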
It is well known that, given a desired approximation error ε, for any f ∈ L^p[a, b], 1 ≤ p < +∞, there exists an N_n(x) such that

||f(·) − N_n(·)||_p < ε,

where [a, b] ⊂ R, by the universal approximation property. This fact establishes the existence of an N_n(x) approximating any L^p integrable function. In order to further reveal its approximation complexity, a novel constructive method, mainly based on the Steklov mean function and the modulus of smoothness of f, will be introduced in this paper. By this approach, not only will the size of the N_n(x) of the form (4) (i.e., the number of neurons n in the hidden layer) be estimated given ε and f, but the weights c_i will also be obtained by calculating integrals of f over different subintervals of [a, b] (instead of resulting from some discrete values of f). Hence, the specified topology of N_n(x) will be determined according to the results to be established.
The paper is organized as follows. In the next section, we introduce some notations and state the main results of this paper. After that, we prove the main results. We conclude the paper with some useful remarks in the last section.
5 The Notations and Main Results
Denote by L^p[a, b] (1 ≤ p < +∞) the space of p-th Lebesgue integrable functions defined on [a, b], whose norm is defined as

||f||_p := ||f(·)||_p := ( ∫_a^b |f(t)|^p dt )^{1/p} < +∞.
For f ∈ L^p[a, b] (1 ≤ p < +∞), we will use the modulus of smoothness w(f, δ)_p of f, defined by

w(f, δ)_p := sup_{0<h≤δ} ( ∫_a^{b−h} |f(t + h) − f(t)|^p dt )^{1/p},  1 ≤ p < +∞,

which is usually a measurement tool for approximation error. This modulus is also used to measure the smoothness of a function and the approximation accuracy in approximation theory and Fourier analysis (see [25], [11]).
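The modulus above can be approximated numerically on a grid; the following sketch is an illustration (not from the paper), with the target f(t) = |t| and the grid sizes chosen arbitrarily:

```python
import numpy as np

def modulus(f, a, b, delta, p=2, grid=2000):
    """Grid approximation of the L^p modulus of smoothness
    w(f, delta)_p = sup_{0 < h <= delta} (int_a^{b-h} |f(t+h)-f(t)|^p dt)^(1/p)."""
    best = 0.0
    for h in np.linspace(delta / 20, delta, 20):   # sample candidate shifts h
        t = np.linspace(a, b - h, grid)
        d = np.abs(f(t + h) - f(t)) ** p
        # trapezoidal rule for the integral over [a, b-h]
        best = max(best, float(np.sum((d[1:] + d[:-1]) / 2 * np.diff(t))) ** (1.0 / p))
    return best

f = np.abs   # f(t) = |t| is Lipschitz, so w(f, delta)_p = O(delta)
for delta in (0.4, 0.2, 0.1):
    print(delta, modulus(f, -1.0, 1.0, delta))   # shrinks roughly linearly in delta
```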
It is clear that lim_{δ→0} w(f, δ)_p = 0 and w(f, λδ)_p ≤ (λ + 1) w(f, δ)_p for any real number λ ≥ 0. The function f is called (M, α)-Lipschitz continuous (0 < α ≤ 1), written f ∈ Lip(M, α), if and only if there exists a constant M > 0 such that

w(f, δ)_p ≤ M δ^α.

For 0 < h ≤ b − a and f ∈ L^p[a, b], 1 ≤ p < ∞, we
define the Steklov mean function as

f_h(x) := (1/h) ∫_x^{x+h} f(t) dt,  a ≤ x ≤ b − h;
f_h(x) := (1/h) ∫_{b−h}^b f(t) dt,  b − h < x ≤ b.  (5)

An error bound between the Steklov mean function f_h and the function f in the L^p metric is (see [3])

||f_h − f||_p ≤ 3 w(f, h)_p,  (6)
which will be proved in the next section. We summarize the main results we have obtained on the approximation complexity for L^p integrable functions approximated by interpolation neural networks of the form (4) as follows.
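Before the main theorem, a small numerical sketch (illustrative only, not from the paper) of the Steklov mean (5), which supplies the network coefficients in what follows; the target f(t) = |t| and window h = 0.2 are arbitrary choices:

```python
import numpy as np

def steklov(f, a, b, h, x, grid=400):
    """Steklov mean f_h(x) of (5): the average of f over [x, x+h],
    frozen at the window [b-h, b] once x passes b-h."""
    x0 = min(x, b - h)                        # clamp, per the second branch of (5)
    t = np.linspace(x0, x0 + h, grid)
    y = f(t)
    # trapezoidal rule for (1/h) * int_{x0}^{x0+h} f(t) dt
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(t)) / h)

f = np.abs                                    # a kinked target
print(steklov(f, -1.0, 1.0, 0.2, 0.5))        # average of t on [0.5, 0.7] = 0.6
print(steklov(f, -1.0, 1.0, 0.2, 0.95))       # window clamps to [0.8, 1.0] -> 0.9
```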
Theorem 1. Let f ∈ L^p[a, b], 1 ≤ p < +∞, x_i = a + i(b − a)/n, f_i = f_h(x_i), and h = (b − a)/n for i = 0, 1, ..., n. Then for the interpolation sample set {(x_i, f_i)}_{i=0}^n, there exists a constant A* > 0 and an approximate interpolation network defined by

N_a(x) := Σ_{j=0}^n f_h(x_j) g_j(x),  (7)

such that

||f(·) − N_a(·)||_p ≤ 27 w(f, (b − a)/n)_p  for A > A*.
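The construction of Theorem 1 can be sketched numerically. The following is an illustration under assumed choices, not the paper's implementation: in particular, A = 2n²/(b − a) is simply taken large and is not claimed to equal the threshold A*, and the target f(t) = |t| is arbitrary.

```python
import numpy as np

def approx_interp_net_error(f, a, b, n, p=2, grid=3000):
    """L^p error of the network of Theorem 1, N_a(x) = sum_j f_h(x_j) g_j(x, A),
    with h = (b - a)/n and an assumed large scale A = 2n^2/(b - a)."""
    h = (b - a) / n
    A = 2.0 * n * n / (b - a)
    nodes = a + h * np.arange(n + 1)

    def fh(x0):                               # Steklov coefficient f_h(x_j), per (5)
        lo = min(x0, b - h)
        t = np.linspace(lo, lo + h, 200)
        y = f(t)
        return np.sum((y[1:] + y[:-1]) / 2 * np.diff(t)) / h

    c = np.array([fh(x) for x in nodes])

    x = np.linspace(a, b, grid)
    d = np.abs(x[:, None] - nodes[None, :])   # rho(x, x_i) in one dimension
    w = np.exp(-A * (d - d.min(axis=1, keepdims=True)))   # stable g_j numerators
    Na = (w * c).sum(axis=1) / w.sum(axis=1)              # N_a(x) at each grid point

    e = np.abs(f(x) - Na) ** p                # trapezoidal L^p error on [a, b]
    return float(np.sum((e[1:] + e[:-1]) / 2 * np.diff(x)) ** (1.0 / p))

f = np.abs
for n in (5, 20, 80):
    print(n, approx_interp_net_error(f, -1.0, 1.0, n))  # error decays as n grows
```

Consistent with the theorem, the observed L^p error shrinks on the order of w(f, (b − a)/n)_p as the number of hidden nodes grows.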
6 The Proof of Main Results
In this section we give the proof of Theorem 1.
Now we begin to prove Theorem 1. Divide [a, b] by a = x_0 < x_1 < ... < x_n = b, where x_i := a + ih, h := (b − a)/n, and let f_i := f_h(x_i), i = 0, 1, 2, ..., n. Then we can construct an approximate interpolation network, for a positive constant A := A(n) depending on n,

N_a(x) := Σ_{i=0}^n f_i g_i(x, A(n)),
which satisfies Theorem 1. Noting that
||N_a − f||_p ≤ ||N_a − f_h||_p + ||f_h − f||_p,

and using (6), we only need to prove

||N_a − f_h||_p ≤ 24 w(f, h)_p.  (8)
So, using notation (5), we obtain

||N_a − f_h||_p^p = ∫_a^b |N_a(x) − f_h(x)|^p dx
= Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} |N_a(x) − f_h(x)|^p dx
+ ∫_a^{x_1} |N_a(x) − f_h(x)|^p dx
+ ∫_{x_{n−1}}^b |N_a(x) − f_h(x)|^p dx
=: Σ_1 + Σ_2 + Σ_3.
For j ∈ N and 1 < j < n, one has

|N_a(x) − f_h(x)|^p = | Σ_{i=0}^n (f_h(x_i) − f_h(x)) g_i(x) |^p
= | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x)
+ (f_h(x_{j−1}) − f_h(x)) g_{j−1}(x) + (f_h(x_j) − f_h(x)) g_j(x)
+ Σ_{i=j+1}^n (f_h(x_i) − f_h(x)) g_i(x) |^p
≤ 4^{p−1} ( | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) |^p
+ | (f_h(x_{j−1}) − f_h(x)) g_{j−1}(x) |^p
+ | (f_h(x_j) − f_h(x)) g_j(x) |^p
+ | Σ_{i=j+1}^n (f_h(x_i) − f_h(x)) g_i(x) |^p ),
which implies that

Σ_1 ≤ 4^{p−1} Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} ( | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) |^p
+ | (f_h(x_{j−1}) − f_h(x)) g_{j−1}(x) |^p + | (f_h(x_j) − f_h(x)) g_j(x) |^p
+ | Σ_{i=j+1}^n (f_h(x_i) − f_h(x)) g_i(x) |^p ) dx.

For Σ_2, we obtain
Σ_2 = ∫_a^{x_1} | Σ_{i=0}^n (f_h(x_i) − f_h(x)) g_i(x) |^p dx
= ∫_a^{x_1} | (f_h(a) − f_h(x)) g_0(x) + (f_h(x_1) − f_h(x)) g_1(x)
+ Σ_{i=2}^n (f_h(x_i) − f_h(x)) g_i(x) |^p dx
≤ 3^{p−1} ∫_a^{x_1} ( | (f_h(a) − f_h(x)) g_0(x) |^p + | (f_h(x_1) − f_h(x)) g_1(x) |^p
+ | Σ_{i=2}^n (f_h(x_i) − f_h(x)) g_i(x) |^p ) dx.
Also, we have

Σ_3 ≤ 3^{p−1} ∫_{x_{n−1}}^b ( | (f_h(b) − f_h(x)) g_n(x) |^p
+ | (f_h(x_{n−1}) − f_h(x)) g_{n−1}(x) |^p
+ | Σ_{i=0}^{n−2} (f_h(x_i) − f_h(x)) g_i(x) |^p ) dx.
Therefore, combining Σ_1, Σ_2 and Σ_3 gives

||N_a − f_h||_p^p
≤ 4^{p−1} Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) |^p dx
+ 4^{p−1} Σ_{j=1}^n ∫_{x_{j−1}}^{x_j} | (f_h(x_j) − f_h(x)) g_j(x) |^p dx
+ 4^{p−1} Σ_{j=1}^{n−1} ∫_{x_{j−1}}^{x_j} | Σ_{i=j+1}^n (f_h(x_i) − f_h(x)) g_i(x) |^p dx + I_4
=: I_1 + I_2 + I_3 + I_4,

where I_4 collects the remaining boundary terms arising from Σ_2 and Σ_3.
Now we estimate I_1, I_2, I_3 and I_4, respectively. We first estimate I_1. Using a similar method as in [5] gives

4^{1−p} I_1 = Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) |^p dx
≤ Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} ( Σ_{i=0}^{j−2} (1/h) ∫_a^{b−h} |f(t + h) − f(t)| dt · e^{−A|x−x_i|} / e^{−A|x−x_{j−1}|} )^p dx.

Set A* := n²/(b − a). Noting that for 0 ≤ i ≤ j − 2 and x_{j−1} < x ≤ x_j one has |x − x_i| − |x − x_{j−1}| ≥ h, we get

e^{−A|x−x_i|} / e^{−A|x−x_{j−1}|} ≤ e^{−Ah} < e^{−n}

for A > A*. Thus,

4^{1−p} I_1 ≤ Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} ( n e^{−n} · (1/h) ∫_a^{b−h} |f(t + h) − f(t)| dt )^p dx
≤ n^{p+1} e^{−pn} ∫_a^{b−h} |f(t + h) − f(t)|^p dt ≤ n^{p+1} e^{−pn} (w(f, h)_p)^p.

Similarly,

4^{1−p} I_3 = Σ_{j=1}^{n−1} ∫_{x_{j−1}}^{x_j} | Σ_{i=j+1}^n (f_h(x_i) − f_h(x)) g_i(x) |^p dx
≤ Σ_{j=1}^{n−1} ∫_{x_{j−1}}^{x_j} ( Σ_{i=j+1}^n (1/h) ∫_a^{b−h} |f(t + h) − f(t)| dt · e^{−A|x−x_i|} / e^{−A|x−x_j|} )^p dx.

Since j + 1 ≤ i ≤ n and x_{j−1} ≤ x ≤ x_j, we have |x − x_i| − |x − x_j| ≥ h, which implies that

e^{−A|x−x_i|} / e^{−A|x−x_j|} ≤ e^{−Ah}.

Thus, with A* = n²/(b − a), we obtain for A > A*

4^{1−p} I_3 ≤ n^{p+1} e^{−pn} (w(f, h)_p)^p.
Also,

4^{1−p} I_2 ≤ (w(f, h)_p)^p.

By the same method, we have

4^{1−p} I_4 ≤ (w(f, h)_p)^p.
Therefore,

||N_a − f_h||_p ≤ (8n² e^{−n} + 8) w(f, h)_p ≤ 24 w(f, h)_p.

This completes the proof of inequality (8), and thus the proof of Theorem 1 is finished.
7 Conclusions and Remarks
In this paper, we have analyzed the issue of the approximation complexity of interpolation FNNs in the L^p metric. A class of interpolation neural networks has been constructed to approximate L^p integrable functions.

On the one hand, our main result implies the approximation density for L^p integrable functions defined on [a, b] by the constructed interpolation networks: the constructed networks N_a(x) can approximate f(x) arbitrarily well in the L^p metric as long as the number of hidden nodes is sufficiently large.

On the other hand, the results obtained in this paper clarify the relationship among the L^p approximation speed, the number of hidden neurons, the interpolation samples and the smoothness of the target functions to be approximated. We have concluded that the L^p approximation speed of the constructed FNNs depends not only on the number of hidden neurons, but also on the smoothness of the target function.

In future work we will consider how to extend our constructive method to the case of multiple inputs, which seems not to be easy. This is a new direction deserving further study.
Acknowledgment
This research was supported by the National Natural Science Foundation of China (No. 60873206) and the Natural Science Foundation of Zhejiang Province of China (No. Y7080235).
References
[1] J. Bergh, J. Löfström, Interpolation Spaces, Springer-Verlag, Berlin, 1976.
[2] F. Brauer, C. Castillo-Chavez, Mathematical Models in Population Biology and Epidemiology, Springer-Verlag, New York, pp. 8-9, 2001.
[3] F. L. Cao, Y. Q. Zhang, Z. R. He, Interpolation and rates of convergence for a class of neural networks, Applied Math. Modelling, 33 (2009), 1441-1456.
[4] F. L. Cao, T. F. Xie, Z. B. Xu, The estimate for approximation error of neural networks: A constructive approach, Neurocomputing, 71 (2008), 626-630.
[5] F. L. Cao, R. Zhang, The errors of approximation for feedforward neural networks in the L_p metric, Math. Comput. Modelling, 49 (2009), 1563-1572.
[6] C. K. Chui, X. Li, Approximation by ridge functions and neural networks with one hidden layer, J. Approx. Theory, 70 (1992), 131-141.
[7] T. P. Chen, H. Chen, Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks, IEEE Trans. Neural Networks, 6 (1995), 904-910.
[8] T. P. Chen, H. Chen, Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems, IEEE Trans. Neural Networks, 6 (1995), 911-917.
[9] T. P. Chen, Approximation problems in system identification with neural networks, Science in China (Series A), 24(1) (1994), 1-7.
[10] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. of Control, Signals, and Systems, 2 (1989), 303-314.
[11] Z. Ditzian, V. Totik, Moduli of Smoothness, Springer-Verlag, Berlin/New York, 1987.
[12] S. Ferrari, R. F. Stengel, Smooth function approximation using neural networks, IEEE Trans. Neural Networks, 16 (2005), 24-38.
[13] A. Guillén, H. Pomares, I. Rojas, et al., Studying possibility in a clustering algorithm for RBFNN design for function approximation, Neural Computing & Applications, 17(1) (2008), 75-89.
[14] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, pp. 50-240, 2001.
[15] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks, 4 (1991), 251-257.
[16] M. Z. Hou, X. L. Han, Y. X. Gan, Constructive approximation to real function by wavelet neural networks, Neural Computing & Applications, 2008 (in press).
[17] N. Hritonenko, Y. Yatsenko, Mathematical Modeling in Economics, Ecology and the Environment, Reprint, Science Press, Beijing, pp. 92-93, 2006.
[18] G. B. Huang, Q. Y. Zhu, C. K. Siew, Extreme learning machine: Theory and applications, Neurocomputing, 70 (2006), 489-501.
[19] Y. Ito, Independence of unscaled basis functions and finite mappings by neural networks, Math. Sci., 26 (2001), 117-126.
[20] Y. Ito, Approximation of functions on a compact set by finite sums of a sigmoid function without scaling, Neural Networks, 4 (1991), 817-826.
[21] Y. Ito, K. Saito, Superposition of linearly independent functions and finite mappings by neural networks, Math. Sci., 21 (1996), 27-33.
[22] M. Leshno, V. Y. Lin, A. Pinkus, S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks, 6 (1993), 861-867.
[23] B. Llanas, F. J. Sainz, Constructive approximate interpolation by neural networks, J. Comput. Applied Math., 188 (2006), 283-308.
[24] B. Llanas, S. Lantarón, Hermite interpolation by neural networks, Applied Mathematics and Computation, 191(2) (2007), 429-439.
[25] G. G. Lorentz, Approximation of Functions, Holt, Rinehart and Winston, New York, 1966.
[26] V. Maiorov, R. S. Meir, Approximation bounds for smooth functions in C(R^d) by neural and mixture networks, IEEE Trans. Neural Networks, 9 (1998), 969-978.
[27] V. Maiorov, Approximation by neural networks and learning theory, Journal of Complexity, 22 (2006), 102-117.
[28] Y. Makovoz, Uniform approximation by neural networks, J. Approx. Theory, 95 (1998), 215-228.
[29] H. N. Mhaskar, C. A. Micchelli, Degree of approximation by neural networks with a single hidden layer, Adv. Applied Math., 16 (1995), 151-183.
[30] M. Burger, A. Neubauer, Analysis of Tikhonov regularization for function approximation by neural networks, Neural Networks, 16 (2003), 79-90.
[31] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica, 8 (1999), 143-195.
[32] Y. Shrivastava, S. Dasgupta, Neural networks for exact matching of functions on a discrete domain, in: Proceedings of the 29th IEEE Conference on Decision and Control, Honolulu, 1990, pp. 1719-1724.
[33] E. D. Sontag, Feedforward nets for interpolation and classification, J. Comput. Syst. Sci., 45 (1992), 20-48.