
Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, Qingdao, 11-14 July 2010

Approximate Interpolation by a Class of Neural Networks in Lebesgue Metric

Chunmei Ding, Yubo Yuan, Feilong Cao*

Institute of Metrology and Computational Science, China Jiliang University,

Hangzhou 310018, Zhejiang Province, P. R. China.

E-MAIL: feilongcao@gmail.com

Abstract: In this paper, a class of approximate interpolation neural networks is constructed to approximate Lebesgue integrable functions. It is shown that the networks can approximate any p-th Lebesgue integrable function arbitrarily well in the Lebesgue metric as long as the number of hidden nodes is sufficiently large. The relation among the approximation speed, the number of hidden nodes, the interpolation sample and the smoothness of the target function is also revealed by means of the Steklov mean function and the modulus of smoothness of f. The obtained results are helpful in studying the approximation complexity of interpolation neural networks in the Lebesgue metric.

Keywords: Neural networks; Interpolation; Approximation; Estimate of error; Lebesgue metric

1 Introduction

Throughout this paper, N and R denote the set of natural numbers and the set of real numbers, respectively, and, for any positive integer d, R^d denotes the d-dimensional Euclidean space. Let S = {x_0, x_1, ..., x_n} ⊂ R^d denote a set of distinct vectors, and {f_0, f_1, ..., f_n} ⊂ R a set of real numbers. Then

{(x_i, f_i)}_{i=0}^{n}    (1)

is called a set of interpolation samples, and {x_i}_{i=0}^{n} is called a node system of interpolation.

If there exists a function f : R^d → R such that

f(x_i) = f_i,  i = 0, 1, ..., n,

then we say that the function f is an exact interpolation of the sample set (1).

*Corresponding author.

978-1-4244-6527-9/10/$26.00 ©2010 IEEE

If there exists a function g : R^d → R such that

|g(x_i) − f_i| < ε,  i = 0, 1, ..., n,

for a positive real number ε, then we call g an ε-approximate interpolation of the sample set (1). According to common usage, a sigmoidal function σ is defined on R with the following properties:

lim_{t→+∞} σ(t) = 1  and  lim_{t→−∞} σ(t) = 0.    (2)
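As a concrete illustration (not taken from the paper), the following minimal Python sketch checks the ε-approximate interpolation condition for a hand-picked candidate g built from the logistic sigmoid, which satisfies (2); the sample set, the slope 1.1 and the tolerance ε = 0.15 are illustrative choices.

import numpy as np

def is_eps_approximate_interpolation(g, xs, fs, eps):
    """Check |g(x_i) - f_i| < eps for all i = 0, ..., n (the definition above)."""
    return all(abs(g(x) - f) < eps for x, f in zip(xs, fs))

# A logistic sigmoid; it satisfies (2): sigma(t) -> 1 as t -> +inf and -> 0 as t -> -inf.
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

# Illustrative sample set (1) and a hand-picked one-neuron candidate g (slope 1.1 is arbitrary).
xs = np.array([-2.0, 0.0, 2.0])
fs = np.array([0.0, 0.5, 1.0])
g = lambda x: sigma(1.1 * x)
print(is_eps_approximate_interpolation(g, xs, fs, eps=0.15))   # True for this choice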

In recent years, the interpolation problem for neural networks has made great progress. We know that single hidden layer feedforward neural networks (FNNs) with at most n + 1 neurons can learn n + 1 distinct samples (x_i, f_i) (i = 0, 1, 2, ..., n) with zero error (exact interpolation); see [19], [21], [31] and [33]. Ito and Saito [21] proved that if the activation function is a continuous and nondecreasing sigmoidal function, then the exact interpolation can be achieved with inner weights w_j ∈ S^{d−1}, where S^{d−1} is the unit sphere in R^d. In [31] Pinkus proved the same result, where φ only needs to be continuous on R and not a polynomial. Shrivastava and Dasgupta [32] gave a proof for the sigmoidal activation function φ(x) := 1/(1 + e^{−x}). However, exact interpolation networks are more difficult to solve. One therefore turns to the study of approximate interpolation neural networks, which were first used in [33] as a tool to study exact interpolation networks. It was proved in [33] that if arbitrary-precision approximate interpolation exists in a linear space of functions, then an exact interpolation can be obtained in that space. Furthermore, the following fact was given: if φ is sigmoidal, continuous and there exists a point c such that φ'(c) ≠ 0, then an interpolation problem with 2n + 1 samples can be approximated with arbitrary precision by a net with n + 1 neurons. Recently, Llanas and Sainz [23] studied the existence and construction of ε-approximate interpolation networks. They first considered the case where the activation function φ is a nondecreasing sigmoidal function satisfying condition (2) and gave a new and quantitative proof of the fact that n + 1 hidden neurons can


learn n + 1 distinct samples with zero error. Then, they introduced approximate interpolation networks, which do not require training and can approximately interpolate an arbitrary set of distinct samples. We also notice that Llanas and Lantarón [24] considered the problem of Hermite interpolation by FNNs. However, it is natural to raise the following two questions: Can we replace the sigmoidal nondecreasing function used in [23] by other activation functions? Can we estimate the errors of approximation for the constructed networks?

2 A Class of Interpolation Neural Networks

Let x_0, x_1, ..., x_n be data in R^d, let ρ(a, b) = ‖a − b‖_2 be the Euclidean distance between points a and b in R^d, and let the parameter λ := λ(n) > 0 depend on n. Then a class of functions g_j : R^d → R, which can be used as activation functions of FNNs, is defined by

g_j(x) := g_j(x, λ) := e^{−λρ(x, x_j)} / Σ_{i=0}^{n} e^{−λρ(x, x_i)},  j = 0, 1, ..., n.    (3)

A linear combination of the g_j(x, λ) is then defined by

N_n(x) := Σ_{j=0}^{n} c_j g_j(x, λ).    (4)

Clearly, N_n(x) can be understood as an FNN with four layers: the first layer is the input layer, whose input is x ∈ R^d; the second layer is a processing layer that computes the values ρ(x, x_i), i = 0, 1, ..., n, between the input x and the prototypical input points x_i, and these values serve as the input of the third layer, which contains n + 1 neurons with g_j(x, λ) as the activation function of the j-th neuron; the fourth layer is the output layer, whose output is N_n(x).
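To make the construction concrete, here is a minimal NumPy sketch that evaluates the activations g_j(x, λ) of (3) and the network N_n(x) of (4) for a handful of prototype points in R^2; the prototype points, the coefficients c_j and the value λ = 50 are illustrative choices, not from the paper.

import numpy as np

def g(x, prototypes, lam):
    """Activation values g_j(x, lam), j = 0..n, as in (3):
    a softmax of -lam * ||x - x_j||_2 over the prototype points."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    rho = np.linalg.norm(prototypes - x, axis=1)   # Euclidean distances rho(x, x_j)
    w = np.exp(-lam * rho)
    return w / w.sum()                             # normalization over i = 0..n

def N(x, prototypes, c, lam):
    """Network output N_n(x) = sum_j c_j g_j(x, lam), as in (4)."""
    return float(np.dot(c, g(x, prototypes, lam)))

# Illustrative data: n + 1 = 5 prototype points in R^2 and arbitrary coefficients c_j.
prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
c = np.array([0.0, 1.0, 1.0, 2.0, 1.0])
print(N([0.9, 0.1], prototypes, c, lam=50.0))      # close to c_1 = 1.0 for large lambda

For large λ, g_j(x, λ) concentrates its weight on the prototype nearest to x, so N_n(x) tends to the coefficient attached to that prototype; this is consistent with the role played by the large parameter λ in the results below.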

It is well known that the sigmoidal function φ(x) = 1/(1 + e^{−x}), which is usually used as the activation function in the hidden layer of neural networks, is a logistic model. This model is an important one and has been widely used in biology, demography and so on (see [2], [17]). Naturally, the functions

φ_j(x) := e^{f_j(x)} / Σ_{i=0}^{n} e^{f_i(x)},  j = 0, 1, 2, ..., n,

can be regarded as a multi-class generalization of the logistic model (see Section 10.6 in [14]), which has also been used as a regression model for multi-class classification problems. On the other hand, it follows from their structure that the g_j(x) contain the information of the interpolation samples. The second layer of the network composed of the g_j(x) can be regarded as the processing layer and the input of the third layer, which is more convenient for the study of network interpolation.

In [3], the authors constructed exact and approximate interpolation networks by using the activation functions g_j(x) given by (3), and gave some estimates of the convergence rate in the uniform metric.

3 Complexity of Approximation of Neural Networks

The reason why FNNs are used in many areas is their universal approximation property, i.e., any continuous or integrable function defined on a compact set can be approximated by an FNN with one hidden layer. This property can essentially be understood from the fact that FNNs are nonlinear parametric expressions for numerical functions. In connection with such paradigms there arise mainly three problems: the density problem, the complexity problem and the solution algorithm problem. The density problem was satisfactorily solved in the late 1980s and early 1990s. We refer to [10], [15], [20], [22], [6], [9], [7] and [8], where a graceful necessary and sufficient condition for the density theorem is obtained, namely that the activation function φ is not a polynomial. This result provides the theoretical basis for selecting the activation function over a wider range.

The complexity of FNN approximation mainly describes the relationship among the topology of the hidden layer (such as the number of neurons and the values of the weights), the approximation ability and the approximation rate. On the complexity of FNNs, we refer to [13], [30], [16], [?], [12], [18], [27], [4].

In these studies, some upper estimates of the approximation rate were established under certain circumstances. Such upper estimates can, on the one hand, imply convergence of the FNNs to the target functions and, on the other hand, provide a quantitative estimate of how accurately the FNNs approximate the functions. This paper addresses the L^p complexity of approximation of interpolation networks of the form (4).

4 L^p Approximation by Neural Networks

More recently, some upper bounds on the approximation error of FNNs in the uniform metric have been studied by Mhaskar and Micchelli [29], Maiorov and Meir [26], Makovoz [28], Llanas and Sainz [23], Cao et al. [3], [4], and so on. It has been shown that FNNs with one hidden layer of sigmoidal nodes can achieve an error of order O(n^{−α}) (where 0 < α ≤ 1 and n is the number of neurons in the hidden layer) for approximating various classes of sufficiently smooth functions in the uniform metric. However, an interesting question is: can we obtain similar results for more general classes of functions besides continuous


ones? And how? One of our main purposes in this paper is to address the issue of approximation complexity when L^p integrable functions are approximated by the interpolation neural networks defined by (4) in the L^p metric.

Compared with the uniform metric, the L^p (1 ≤ p < +∞) metric also has its specific advantages and solid uses in science and engineering. For example, given a target function g : R^d → R and a set of sample points x^{(1)}, x^{(2)}, ..., x^{(L)}, which are viewed as input signals of the approximating neural network N, define the empirical error E_L as

E_L(N) := (1/L) Σ_{i=1}^{L} |N(x^{(i)}) − g(x^{(i)})|^p,  1 ≤ p < +∞.

Then it is known that whenever the input signals are generated from a stationary source with distribution μ, the famous ergodic theorem (see [1]) holds, and by this theorem it can be shown that the average E_L(N) converges to the actual error

E := ∫_{R^d} |N(x) − g(x)|^p dμ.

This underlies the feasibility of training neural networks with a finite number of samples. Consistent with such theoretical treatment, studying approximation complexity in the L^p metric is therefore very promising and significant.
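To illustrate the convergence of the empirical error to the actual error, here is a minimal Python sketch; the target g, the stand-in network N and the uniform sampling distribution μ on [0, 1] are placeholder choices made for the example only.

import numpy as np

rng = np.random.default_rng(0)

def empirical_error(N, g, samples, p=2):
    """E_L(N) = (1/L) * sum_i |N(x_i) - g(x_i)|^p over the given input signals."""
    vals = np.array([abs(N(x) - g(x)) ** p for x in samples])
    return vals.mean()

# Placeholder target and "network" on [0, 1]; mu is the uniform distribution here.
g_target = np.sin
N_net = lambda x: x - x**3 / 6.0            # truncated Taylor series as a stand-in network

for L in (10, 100, 10_000):
    xs = rng.uniform(0.0, 1.0, size=L)      # i.i.d. draws from mu
    print(L, empirical_error(N_net, g_target, xs, p=2))
# As L grows, E_L approaches the actual error E = integral of |N(x) - g(x)|^2 dmu(x).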

It is well known that, given a desired approximation error ε, for any f ∈ L^p[a, b], 1 ≤ p < +∞, there exists an N_n(x) such that

‖f − N_n‖_p < ε,

where [a, b] ⊂ R, by the universal approximation property. This fact establishes the existence of an N_n(x) approximating any L^p integrable function. In order to further reveal its approximation complexity, a novel constructive method, based mainly on the Steklov mean function and the modulus of smoothness of f, is introduced in this paper. By this approach, not only is the size of the N_n(x) of the form (4) (i.e., the number of neurons n in the hidden layer) estimated given ε and f, but the weights c_i are also obtained by calculating integrals of f over different subintervals of [a, b] (instead of resulting from some discrete values of f). Hence, the specific topology of N_n(x) is determined according to the results established below.

The paper is organized as follows. In the next section, we introduce some notations and state the main results of this paper. We then prove the main results, and we conclude the paper with some conclusions and remarks in the last section.

5 The Notations and Main Results

Denote by L^p[a, b] (1 ≤ p < +∞) the space of p-th Lebesgue integrable functions defined on [a, b], whose norm is defined as usual by

‖f‖_p := ‖f(·)‖_p := ( ∫_a^b |f(t)|^p dt )^{1/p} < +∞.

For f ∈ L^p[a, b] (1 ≤ p < +∞), we will use the modulus of smoothness ω(f, δ)_p of f, defined by

ω(f, δ)_p := sup_{0<h≤δ} ( ∫_a^{b−h} |f(t + h) − f(t)|^p dt )^{1/p},  1 ≤ p < +∞,

which is a usual measurement tool for approximation error. This modulus is also used to measure the smoothness of a function and the approximation accuracy in approximation theory and Fourier analysis (see [25], [11]).

It is clear that lim_{δ→0} ω(f, δ)_p = 0 and ω(f, λδ)_p ≤ (λ + 1) ω(f, δ)_p for any real number λ ≥ 0. The function f is called (M, α)-Lipschitz continuous (0 < α ≤ 1), written f ∈ Lip(M, α), if and only if there exists a constant M > 0 such that

ω(f, δ)_p ≤ M δ^α.

For 0 < h ≤ b − a and f ∈ L^p[a, b], 1 ≤ p < ∞, we define the Steklov mean function as

f_h(x) := (1/h) ∫_x^{x+h} f(t) dt,  a ≤ x < b − h;
f_h(x) := (1/h) ∫_{b−h}^{b} f(t) dt,  b − h ≤ x ≤ b.    (5)

The error between the Steklov mean function f_h and the function f in the L^p metric satisfies (see [3])

‖f_h − f‖_p ≤ 3 ω(f, h)_p,    (6)

which will be proved in the next section.
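As a numerical illustration, the following Python sketch computes the Steklov mean (5) and the modulus of smoothness ω(f, h)_p for a sample function; the test function f(t) = |t − 1/2|, the quadrature grids and the step h are illustrative assumptions. The two printed values can be compared directly, in the spirit of the bound (6).

import numpy as np

A, B = 0.0, 1.0
f = lambda t: np.abs(t - 0.5)              # sample function in L^p[a, b]
grid = np.linspace(A, B, 2001)

def integrate(y, xs):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(xs)) / 2.0)

def steklov(x, h):
    """Steklov mean f_h(x) as in (5): average of f over [x, x+h], or over [b-h, b] near b."""
    lo = x if x <= B - h else B - h
    t = np.linspace(lo, lo + h, 201)
    return integrate(f(t), t) / h

def lp_norm(vals, xs, p):
    return integrate(np.abs(vals) ** p, xs) ** (1.0 / p)

def modulus(f, h, p):
    """omega(f, h)_p: sup over 0 < delta <= h of (int_a^{b-delta} |f(t+delta)-f(t)|^p dt)^{1/p}."""
    best = 0.0
    for delta in np.linspace(h / 10.0, h, 10):
        ts = grid[grid <= B - delta]
        best = max(best, lp_norm(f(ts + delta) - f(ts), ts, p))
    return best

p, h = 2, 0.05
f_h = np.array([steklov(x, h) for x in grid])
print(lp_norm(f_h - f(grid), grid, p))     # ||f_h - f||_p
print(modulus(f, h, p))                    # omega(f, h)_p, cf. the bound (6)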

We summarize the main results we have obtained on the approximation complexity for L^p integrable functions approximated by interpolation neural networks of the form (4) as follows.

Theorem 1. Let f ∈ L^p[a, b], 1 ≤ p < +∞, x_i = a + i(b − a)/n, f_i = f_h(x_i), and h = (b − a)/n for i = 0, 1, ..., n. Then for the interpolation sample set {(x_i, f_i)}_{i=0}^{n}, there exist a constant λ* > 0 and an approximate interpolation network defined by

N_a(x) := Σ_{j=0}^{n} f_h(x_j) g_j(x),    (7)

such that

‖f(·) − N_a(·)‖_p ≤ 27 ω(f, (b − a)/n)_p    for λ > λ*.
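To make Theorem 1 concrete, here is a self-contained one-dimensional Python sketch that builds the equidistant nodes x_i, the Steklov values f_h(x_i), the network N_a of (7) with a large λ, and compares ‖f − N_a‖_p with the bound 27 ω(f, (b − a)/n)_p; the test function, the particular choice λ = 10n²/(b − a) and the quadrature grids are illustrative assumptions, not prescribed by the paper.

import numpy as np

a, b, n, p = 0.0, 1.0, 40, 2
lam = 10.0 * n * n / (b - a)               # a large lambda (> lambda*), assumed sufficient here
h = (b - a) / n
f = lambda t: np.abs(t - 0.5)              # test function in L^p[a, b]
nodes = a + np.arange(n + 1) * h           # x_i = a + i (b - a)/n

def integrate(y, xs):                      # trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(xs)) / 2.0)

def steklov_at(x):                         # f_h(x), cf. (5)
    lo = min(x, b - h)
    t = np.linspace(lo, lo + h, 201)
    return integrate(f(t), t) / h

fi = np.array([steklov_at(x) for x in nodes])   # interpolation values f_i = f_h(x_i)

def N_a(x):                                # network (7): sum_j f_h(x_j) g_j(x), g_j as in (3)
    w = np.exp(-lam * np.abs(x - nodes))
    return float(np.dot(fi, w / w.sum()))

grid = np.linspace(a, b, 4001)
err = integrate(np.abs(f(grid) - np.array([N_a(x) for x in grid])) ** p, grid) ** (1 / p)

def modulus(delta):                        # omega(f, delta)_p
    best = 0.0
    for d in np.linspace(delta / 10, delta, 10):
        ts = grid[grid <= b - d]
        best = max(best, integrate(np.abs(f(ts + d) - f(ts)) ** p, ts) ** (1 / p))
    return best

print(err, 27 * modulus(h))                # observed error vs. the bound of Theorem 1

For this choice of f, the observed L^2 error is far below the theoretical bound, which is expected since the bound of Theorem 1 holds for every f in the class.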


6 The Proof of Main Results

In this section we give the proof of Theorem 1.

Now we begin to prove Theorem 1. Divide [a, b] by a = x_0 < x_1 < ... < x_n = b, where x_i := a + ih, h := (b − a)/n, and let f_i := f_h(x_i), i = 0, 1, 2, ..., n. Then, for a positive constant λ := λ(n) depending on n, we can construct the approximate interpolation network

N_a(x) := Σ_{i=0}^{n} f_i g_i(x, λ(n)),

which satisfies Theorem 1. Noting that

‖N_a − f‖_p ≤ ‖N_a − f_h‖_p + ‖f_h − f‖_p,

and using (6), we only need to prove

‖N_a − f_h‖_p ≤ 24 ω(f, h)_p.    (8)

So, we use notation (5) and obtain

‖N_a − f_h‖_p^p = ∫_a^b |N_a(x) − f_h(x)|^p dx
= Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} |N_a(x) − f_h(x)|^p dx + ∫_a^{x_1} |N_a(x) − f_h(x)|^p dx + ∫_{x_{n−1}}^{b} |N_a(x) − f_h(x)|^p dx
=: Σ_1 + Σ_2 + Σ_3.

For j ∈ N with 1 < j < n, one has

|N_a(x) − f_h(x)|^p = | Σ_{i=0}^{n} (f_h(x_i) − f_h(x)) g_i(x) |^p
= | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) + (f_h(x_{j−1}) − f_h(x)) g_{j−1}(x) + (f_h(x_j) − f_h(x)) g_j(x) + Σ_{i=j+1}^{n} (f_h(x_i) − f_h(x)) g_i(x) |^p
≤ 4^{p−1} ( | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) |^p + | (f_h(x_{j−1}) − f_h(x)) g_{j−1}(x) |^p + | (f_h(x_j) − f_h(x)) g_j(x) |^p + | Σ_{i=j+1}^{n} (f_h(x_i) − f_h(x)) g_i(x) |^p ),

which implies that

Σ_1 ≤ 4^{p−1} Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} ( | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) |^p + | (f_h(x_{j−1}) − f_h(x)) g_{j−1}(x) |^p + | (f_h(x_j) − f_h(x)) g_j(x) |^p + | Σ_{i=j+1}^{n} (f_h(x_i) − f_h(x)) g_i(x) |^p ) dx.

For Σ_2, we obtain that

Σ_2 = ∫_a^{x_1} | Σ_{i=0}^{n} (f_h(x_i) − f_h(x)) g_i(x) |^p dx
= ∫_a^{x_1} | (f_h(a) − f_h(x)) g_0(x) + (f_h(x_1) − f_h(x)) g_1(x) + Σ_{i=2}^{n} (f_h(x_i) − f_h(x)) g_i(x) |^p dx
≤ 3^{p−1} ∫_a^{x_1} ( | (f_h(a) − f_h(x)) g_0(x) |^p + | (f_h(x_1) − f_h(x)) g_1(x) |^p + | Σ_{i=2}^{n} (f_h(x_i) − f_h(x)) g_i(x) |^p ) dx.

Also, we have

Σ_3 ≤ 3^{p−1} ∫_{x_{n−1}}^{b} ( | (f_h(b) − f_h(x)) g_n(x) |^p + | (f_h(x_{n−1}) − f_h(x)) g_{n−1}(x) |^p + | Σ_{i=0}^{n−2} (f_h(x_i) − f_h(x)) g_i(x) |^p ) dx.

Therefore, combining Σ_1, Σ_2 and Σ_3 gives

‖N_a − f_h‖_p^p ≤ 4^{p−1} Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) |^p dx + 4^{p−1} Σ_{j=1}^{n} ∫_{x_{j−1}}^{x_j} | (f_h(x_j) − f_h(x)) g_j(x) |^p dx + ... =: I_1 + I_2 + I_3 + I_4.


Now we estimate I_1, I_2, I_3 and I_4, respectively. We first estimate I_1. Using a method similar to that of [5] gives

Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} | Σ_{i=0}^{j−2} (f_h(x_i) − f_h(x)) g_i(x) |^p dx
≤ Σ_{j=2}^{n−1} ∫_{x_{j−1}}^{x_j} ( Σ_{i=0}^{j−2} (1/h) ∫_a^{b−h} |f(t + h) − f(t)| dt · e^{−λ|x−x_i|} / e^{−λ|x−x_{j−1}|} )^p dx.

Set λ* := n/h. Noting that for 0 ≤ i ≤ j − 2 and x_{j−1} < x ≤ x_j one has |x − x_i| − |x − x_{j−1}| ≥ h, which implies that

e^{−λ|x−x_i|} / e^{−λ|x−x_{j−1}|} ≤ e^{−λh} < e^{−n}

for λ > λ*. Thus,

4^{1−p} I_1 ≤ n^{p+1} e^{−pn} ∫_a^{b−h} |f(t + h) − f(t)|^p dt ≤ n^{p+1} e^{−pn} (ω(f, h)_p)^p.

Similarly,

4^{1−p} I_3 = Σ_{j=1}^{n−1} ∫_{x_{j−1}}^{x_j} | Σ_{i=j+1}^{n} (f_h(x_i) − f_h(x)) g_i(x) |^p dx
≤ Σ_{j=1}^{n−1} ∫_{x_{j−1}}^{x_j} ( Σ_{i=j+1}^{n} (1/h) ∫_a^{b−h} |f(t + h) − f(t)| dt · e^{−λ|x−x_i|} / e^{−λ|x−x_j|} )^p dx.

Since j + 1 ≤ i ≤ n and x_{j−1} ≤ x ≤ x_j, we have |x − x_i| − |x − x_j| ≥ h, which implies that

e^{−λ|x−x_i|} / e^{−λ|x−x_j|} ≤ e^{−λh}.

Thus, with λ* = n/h, we obtain for λ > λ*

4^{1−p} I_3 ≤ e^{−np} n^{p+1} (ω(f, h)_p)^p.

Also,

4^{1−p} I_2 ≤ (ω(f, h)_p)^p.

By the same method, we have

4^{1−p} I_4 ≤ (ω(f, h)_p)^p.

Therefore,

‖N_a − f_h‖_p ≤ (8n^2 e^{−n} + 8) ω(f, h)_p ≤ 24 ω(f, h)_p.

This completes the proof of inequality (8), and the proof of Theorem 1 is thus finished.

7 Conclusions and Remarks

In this paper, we have analyzed the issue of the approximation complexity for interpolation FNNs in the L^p metric. A class of interpolation neural networks has been constructed to approximate L^p integrable functions.

On one hand, our main result implies the approximation density for L^p integrable functions defined on [a, b] by the constructed interpolation networks: the constructed networks N_a(x) can approximate f(x) arbitrarily well in the L^p metric as long as the number of hidden nodes is sufficiently large.

On the other hand, the results obtained in this paper clarify the relationship among the L^p approximation speed, the number of hidden neurons, the interpolation samples and the smoothness of the target functions to be approximated. We conclude that the L^p approximation speed of the constructed FNNs depends not only on the number of hidden neurons, but also on the smoothness of the target function.

We are also considering how to extend our constructive method to the case of multiple inputs, which seems not to be easy. This is a new direction deserving further study.

Acknowledgment

This research was supported by the National Natural Science Foundation of China (No. 60873206) and the Natural Science Foundation of Zhejiang Province of China (No. Y7080235).

References

[1] J. Bergh, J. Löfström, Interpolation Spaces, Springer-Verlag, Berlin, 1976.

[2] F. Brauer, C. Castillo-Chavez, Mathematical Models in Population Biology and Epidemiology, Springer-Verlag, New York, pp. 8-9, 2001.

[3] F. L. Cao, Y. Q. Zhang, Z. R. He, Interpolation and rates of convergence for a class of neural networks, Applied Math. Modelling, 33 (2009), 1441-1456.

[4] F. L. Cao, T. F. Xie, Z. B. Xu, The estimate for approximation error of neural networks: A constructive approach, Neurocomputing, 71 (2008), 626-630.

[5] F. L. Cao, R. Zhang, The errors of approximation for feedforward neural networks in the Lp metric, Math. Comput. Modelling, 49 (2009), 1563-1572.

[6] C. K. Chui, X. Li, Approximation by ridge functions and neural networks with one hidden layer, J. Approx. Theory, 70 (1992), 131-141.

[7] T. P. Chen, H. Chen, Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks, IEEE Trans. Neural Networks, 6 (1995), 904-910.

[8] T. P. Chen, H. Chen, Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems, IEEE Trans. Neural Networks, 6 (1995), 911-917.

[9] T. P. Chen, Approximation problems in system identification with neural networks, Science in China (Series A), 24(1) (1994), 1-7.

[10] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. of Control, Signals, and Systems, 2 (1989), 303-314.

[11] Z. Ditzian, V. Totik, Moduli of Smoothness, Springer-Verlag, Berlin/New York, 1987.

[12] S. Ferrari, R. F. Stengel, Smooth function approximation using neural networks, IEEE Trans. Neural Networks, 16 (2005), 24-38.

[13] A. Guillén, H. Pomares, I. Rojas, et al., Studying possibility in a clustering algorithm for RBFNN design for function approximation, Neural Computing & Applications, 17(1) (2008), 75-89.

[14] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, pp. 50-240, 2001.

[15] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks, 4 (1991), 251-257.

[16] M. Z. Hou, X. L. Han, Y. X. Gan, Constructive approximation to real function by wavelet neural networks, Neural Computing & Applications, 2008 (in press).

[17] N. Hritonenko, Y. Yatsenko, Mathematical Modeling in Economics, Ecology and the Environment, Reprint, Science Press, Beijing, pp. 92-93, 2006.

[18] G. B. Huang, Q. Y. Zhu, C. K. Siew, Extreme learning machine: Theory and applications, Neurocomputing, 70 (2006), 489-501.

[19] Y. Ito, Independence of unscaled basis functions and finite mappings by neural networks, Math. Sci., 26 (2001), 117-126.

[20] Y. Ito, Approximation of functions on a compact set by finite sums of sigmoid function without scaling, Neural Networks, 4 (1991), 817-826.

[21] Y. Ito, K. Saito, Superposition of linearly independent functions and finite mappings by neural networks, Math. Sci., 21 (1996), 27-33.

[22] M. Leshno, V. Y. Lin, A. Pinkus, S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks, 6 (1993), 861-867.

[23] B. Llanas, F. J. Sainz, Constructive approximate interpolation by neural networks, J. Comput. Applied Math., 188 (2006), 283-308.

[24] B. Llanas, S. Lantarón, Hermite interpolation by neural networks, Applied Mathematics and Computation, 191(2) (2007), 429-439.

[25] G. G. Lorentz, Approximation of Functions, Holt, Rinehart and Winston, New York, 1966.

[26] V. Maiorov, R. S. Meir, Approximation bounds for smooth functions in C(R^d) by neural and mixture networks, IEEE Trans. Neural Networks, 9 (1998), 969-978.

[27] V. Maiorov, Approximation by neural networks and learning theory, Journal of Complexity, 22 (2006), 102-117.

[28] Y. Makovoz, Uniform approximation by neural networks, J. Approx. Theory, 95 (1998), 215-228.

[29] H. N. Mhaskar, C. A. Micchelli, Degree of approximation by neural networks with a single hidden layer, Adv. Applied Math., 16 (1995), 151-183.

[30] B. Martin, N. Andreas, Analysis of Tikhonov regularization for function approximation by neural networks, Neural Networks, 16 (2003), 79-90.

[31] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica, 8 (1999), 143-195.

[32] Y. Shrivastava, S. Dasgupta, Neural networks for exact matching of functions on a discrete domain, in: Proceedings of the 29th IEEE Conference on Decision and Control, Honolulu, 1990, pp. 1719-1724.

[33] E. D. Sontag, Feedforward nets for interpolation and classification, J. Comput. Syst. Sci., 45 (1992), 20-48.
