Fully Complex Multi-Layer Perceptron Network for Nonlinear Signal Processing



Journal of VLSI Signal Processing 32, 29–43, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.


TAEHWAN KIM
Center for Advanced Aviation System Development, The MITRE Corporation, M/S N670, 7515 Colshire Drive, McLean, Virginia 22102, USA; Information Technology Laboratory, Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Maryland 21250, USA

TÜLAY ADALI
Information Technology Laboratory, Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Maryland 21250, USA

Received July 2, 2001; Revised November 14, 2001; Accepted November 19, 2001

Abstract. Designing a neural network (NN) to process complex-valued signals is a challenging task since a complex nonlinear activation function (AF) cannot be both analytic and bounded everywhere in the complex plane C. To avoid this difficulty, ‘splitting’, i.e., using a pair of real sigmoidal functions for the real and imaginary components, has been the traditional approach. However, this ‘ad hoc’ compromise to avoid the unbounded nature of nonlinear complex functions results in a nowhere analytic AF that performs the error back-propagation (BP) using the split derivatives of the real and imaginary components instead of relying on well-defined fully complex derivatives. In this paper, a fully complex multi-layer perceptron (MLP) structure that yields a simplified complex-valued back-propagation (BP) algorithm is presented. The simplified BP verifies that the fully complex BP weight update formula is the complex conjugate form of the real BP formula and that the split complex BP is a special case of the fully complex BP. This generalization is possible by employing elementary transcendental functions (ETFs) that are almost everywhere (a.e.) bounded and analytic in C. The properties of the fully complex MLP are investigated, and the advantage of ETFs over split complex AFs is shown in numerical examples where nonlinear magnitude and phase distortions of non-constant modulus modulated signals are successfully restored.

Keywords: nonlinear adaptive signal processing, fully complex neural network, split complex neural network, elementary transcendental functions, bounded almost everywhere, analytic almost everywhere

1. Introduction

The main reason for the difficulty in finding a nonlinear complex activation function in MLP design is the conflict between the boundedness and the differentiability of complex functions in the whole complex plane C. As stated by Liouville's theorem, a bounded entire function must be a constant in C, where an entire function is defined as analytic, i.e., differentiable at every point z ∈ C [1]. Therefore, we cannot find an analytic complex nonlinear activation function that is bounded everywhere on the entire C.

In general, the fact that many real-valued mathematical objects are best understood when they are considered as embedded in the complex plane (e.g., the Fourier transform) provides the motivation to develop a neural network (NN) structure that uses well-defined fully complex nonlinear activation functions [2]. However, due to Liouville's theorem, the complex NN has been subjected to the common view that it


has to live with a compromised employment of non-analytic but bounded nonlinear AFs for the stability of the system [3, 4]. We employ a subset of ETFs derivable from the function f(z) = e^z, which is an entire function since f(z) = f′(z) = e^z. These ETFs are analytic and bounded almost everywhere, i.e., they are only unbounded on a set of points having zero measure. Since these functions are analytic a.e., they are also conformal and provide well-defined derivatives while still allowing the complex MLP to converge with probability 1, which is sufficient in practical signal processing applications. Also, among these ETFs, circular and hyperbolic ETFs such as tan z and tanh z are bilinear (i.e., Möbius) transforms and lend themselves to fixed-point analysis as discussed in [5]. Note, however, that inverse ETFs


Figure 1. (a) Magnitude of f(z) = z/(c + |z|/r). (b) Phase of f(z) = z/(c + |z|/r).


Figure 2. (a) Magnitude of f(s exp[iβ]) = tanh(s/m) exp[iβ]. (b) Phase of f(s exp[iβ]) = tanh(s/m) exp[iβ].

such as arctanh z are not bilinear transforms, as we discuss in Section 4.

To overcome the unbounded nature of analytic functions in C, earlier fully complex activation functions proposed by Georgiou and Koutsougeras [3] (given in Eq. (1)) and Hirose [6] (given in Eq. (2)) have normalized or scaled the amplitude of complex signals:

f(z) = z/(c + |z|/r) (1)

f(s exp[iβ]) = tanh(s/m) exp[iβ] (2)

where c, r, and m are real positive constants and z = s exp[iβ]. Figures 1(a) and (b) show the magnitude and phase response of Eq. (1), while Figs. 2(a) and (b) show those of Eq. (2).
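For concreteness, both normalizing AFs can be sketched in a few lines of Python with NumPy (our illustration, not code from the paper; the constants c, r, and m are set to arbitrary example values). The check at the end confirms that both functions leave the phase of the input untouched:

import numpy as np

def f_gk(z, c=1.0, r=1.0):
    # Eq. (1): the magnitude is compressed, but the phase of z is preserved
    # because the divisor c + |z|/r is real and positive.
    return z / (c + np.abs(z) / r)

def f_hirose(z, m=1.0):
    # Eq. (2): tanh compression of the magnitude s; the phase beta is preserved.
    s, beta = np.abs(z), np.angle(z)
    return np.tanh(s / m) * np.exp(1j * beta)

z = 2.0 * np.exp(1j * 0.7)
for f in (f_gk, f_hirose):
    print(f.__name__, np.isclose(np.angle(f(z)), np.angle(z)))  # both True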


However, these functions preserve the phase and thus are less efficient in learning the nonlinear phase variations between the input and the targets than complex activation functions with nonlinear phase mapping capability. The subset of ETFs introduced in Section 4 possesses amplitude and phase nonlinear mapping capability and is bounded and analytic a.e. in a bounded domain of C. The numerical examples included in Section 5 show that these ETFs perform better in restoring nonlinear amplitude and phase distortion in a noisy environment. More specifically, the MLP structure using the AF of Eq. (1) performs poorly in restoring nonlinear distortion of non-constant modulus signals due to its inflexible radial mapping characteristics, as shown in the numerical examples included in Section 5. Hirose's function given in Eq. (2) performs relatively better than Eq. (1), but still worse than most of the ETFs we employ.

A more traditional approach for processing complex signals with MLPs is to use a ‘split’ complex AF where a pair of conventional real-valued AFs marginally process the in-phase (I) and quadrature (Q) components [4, 7–13]. While this approach can avoid the unboundedness of fully complex AFs in view of Liouville's theorem, the split complex AF cannot be analytic. The split approach typically employs a pair of real-valued functions f_R(x) = f_I(x) = tanh x, x ∈ R, as a nonlinear complex AF in the hidden layer [7–13], as shown in Eq. (3):

f(z) = f_R(Re(z)) + i f_I(Im(z)) (3)


Figure 3. (a) Magnitude of split tanh x. (b) Phase of split tanh x.

where z = X^T W, X ∈ C^n is the n-dimensional complex input, and W ∈ C^n is the weight vector. This

approach has been shown to be adequate in providing equalization of nonlinear satellite communication channels. But as shown in the numerical examples in Section 5, the fully complex ETFs we introduce outperform the split complex AF in bit error rate performance in our equalization experiments. The well-defined first-order derivatives of complex ETFs also satisfy the Cauchy-Riemann equations, which allows for simplification of the fully complex BP given in [3]. The simplification verifies that the weight update formula of fully complex BP is a complex conjugate form of the real weight update formula. Also, while the complex ETFs provide well-defined derivatives for optimization of the fully complex BP algorithm, the derivatives in split complex BP cannot fully exploit the correlation between the real and imaginary components of the weighted sum of input vectors. Figure 3(a) and (b) show the magnitude and phase of the split tanh x function, respectively.
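As a minimal sketch of the split approach (our code, with made-up input and weight values), Eq. (3) applied to the weighted sum z = X^T W looks as follows; the real and imaginary parts are squashed independently, so no fully complex derivative exists:

import numpy as np

def split_tanh(z):
    # Eq. (3): f(z) = tanh(Re z) + i tanh(Im z); bounded but nowhere analytic.
    return np.tanh(z.real) + 1j * np.tanh(z.imag)

X = np.array([0.3 + 0.1j, -0.2 + 0.4j])   # n-dimensional complex input X
W = np.array([0.5 - 0.2j, 0.1 + 0.3j])    # complex weight vector W
z = X @ W                                  # z = X^T W
print(split_tanh(z))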

In Section 2 of the paper, the desired properties of fully complex AFs stated as necessary in [3] and [4] are reduced and relaxed into a single condition. In Section 3, a more compact version of the fully complex BP algorithm than the one derived in [3] is given, followed by a comparison between the fully and split complex BP algorithms. It is shown that the split complex BP is a special case of the fully complex BP algorithm that we derive. In Section 4, nine ETFs that satisfy the modified desired properties of a fully complex activation function are presented with


a discussion of their characteristics. The performances of the BP algorithm using the nine ETFs along with the earlier AFs shown in Eqs. (1) and (2) are compared with the split complex and complex least mean square (CLMS) equalizers in numerical examples in Section 5. Here, the Volterra series nonlinear satellite Traveling Wave Tube Amplifier (TWTA) model [12] exhibiting amplitude-to-amplitude (AM/AM) and amplitude-to-phase (AM/PM) distortion is used as the nonlinear channel model. Non-constant modulus signals are distorted in mild and severe nonlinear conditions based on two levels of amplifier saturation. For better evaluation of the different MLPs, multiple levels of training stopping criteria are used to compare the closest-to-the-best performances of each training scheme and the learning behavior.

2. Desired Properties of a Fully Complex Activation Function

The unbounded nature of nonlinear analytic functions in C triggered the definition of “desirable” properties for complex AFs. The first attempt was made by Georgiou and Koutsougeras [3], who identified five desirable properties of a fully complex AF f(z):

1. f(z) = u(x, y) + iv(x, y) is nonlinear in x and y
2. f(z) is bounded
3. The partial derivatives u_x, u_y, v_x, and v_y exist and are bounded
4. f(z) is not entire
5. u_x v_y ≠ v_x u_y.

Note that by Liouville's theorem, the second and the fourth conditions are redundant, i.e., a bounded nonlinear function in C cannot be entire. Using this fact, You and Hong [4] reduced the above conditions to the four conditions given below and introduced a split complex AF for the Quadrature Amplitude Modulation (QAM) signal equalization problem they considered:

1. f(z) = u(x, y) + iv(x, y) is nonlinear in x and y
2. For the stability of a system, f(z) should have no singularities and be bounded for all z in a bounded set
3. The partial derivatives u_x, u_y, v_x, and v_y should be continuous and bounded
4. u_x v_y ≠ v_x u_y. If not, then f(z) is not a suitable activation function except in the following cases:

   (i) u_x = v_x = 0, and u_y ≠ 0, v_y ≠ 0,
   (ii) u_y = v_y = 0, and u_x ≠ 0, v_x ≠ 0.

Note that both sets of conditions above emphasize boundedness of an AF and its partial derivatives, even when the function is defined in a local domain of interest. The ETFs that we propose violate the second and third boundedness requirements of both sets of conditions above. But these violations only occur on a set of points of zero measure, i.e., the functions are bounded and analytic a.e. This property results in convergence with probability 1, a distinction with no practical significance in engineering applications. More importantly, these functions outperform other linear complex schemes and the AFs given in Eqs. (1) and (2) in learning nonlinear distortion once the domain is bounded, as boundedness naturally sets the limit on the magnitude of the range (see the discussions on the characteristics of these elementary transcendental functions in Section 4).

The last restriction u_x v_y ≠ v_x u_y was originally imposed to guarantee continuous learning, i.e., to prevent a situation where a non-zero activation function input forces the gradient of the error function (with respect to the complex weight) to become zero. We also note that this condition is unnecessary for fully complex ETFs because they satisfy the Cauchy-Riemann equations shown in Eq. (4).

The Cauchy-Riemann equations are a necessary condition for a complex function to be analytic at a point z ∈ C. They can be written by noting that the partial derivatives of f(z) = u(x, y) + iv(x, y), where z = x + iy, along the real and imaginary axes should be equal:

f′(z) = u_x + iv_x = v_y − iu_y. (4)

Equating the real and imaginary parts above, we obtain the Cauchy-Riemann equations: u_x = v_y, v_x = −u_y. Once the Cauchy-Riemann equations are met, the last restriction u_x v_y ≠ v_x u_y can be replaced by u_x v_y ≠ v_x u_y ⇒ u_x^2 ≠ −u_y^2, which is always true except when u_x = u_y = 0. This result also leads to v_x = v_y = 0, since all four partial derivatives are real. This only happens when a local extremum occurs, where the learning process should naturally stop. Therefore, the last condition is only necessary for non-analytic split complex functions but not for analytic fully complex functions.
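A quick numerical check of Eq. (4), and of the identity f′(z) = f_x = −i f_y used in the next section, can be done by finite differences; the sketch below (ours) uses tanh z at a point away from its singularities:

import numpy as np

f = np.tanh
z, h = 0.4 + 0.3j, 1e-6

fx = (f(z + h) - f(z - h)) / (2 * h)            # u_x + i v_x
fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # u_y + i v_y

ux, vx, uy, vy = fx.real, fx.imag, fy.real, fy.imag
print(np.isclose(ux, vy), np.isclose(vx, -uy))  # Cauchy-Riemann: True True
print(np.isclose(fx, -1j * fy))                 # f'(z) = f_x = -i f_y: True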

Consequently, since there is no interest in using a linear function as an activation function and the partial derivatives are well defined and satisfy the Cauchy-Riemann equations, the desirable properties of a fully complex activation function are reduced and relaxed as follows:


In a bounded domain of the complex plane C, a fully complex nonlinear activation function f(z) needs to be analytic and bounded a.e.

3. Complex Back-Propagation

Assuming that the fully complex AF satisfies the condition above such that it is analytic a.e., the Cauchy-Riemann equations can be used to simplify the fully complex BP algorithm derived in [3], as shown in this section. The simplification uses the following concise expression obtained from Eq. (4):

f′(z) = f_x = −i f_y (5)

where f_x and f_y are the partial derivatives with respect to x and y, respectively.

If the fully complex activation function is f(z) = u(x, y) + iv(x, y), then the sum-squared error at the output layer can be written as

E = (1/2) Σ_n |e_n|^2, where e_n = d_n − o_n (6)

o_n ≡ f(z_n) ≡ u_n + iv_n, z_n ≡ x_n + iy_n ≡ Σ_k W_nk X_nk = Σ_k (W^R_nk + iW^I_nk)(X^R_nk + iX^I_nk) (7)

where d_n is the n-th desired symbol, o_n is the output of the n-th output neuron, and the superscripts R and I indicate the real and imaginary components, respectively. The backpropagation weight adaptation rule requires the computation of the gradient ∂E/∂W_nk. Since the cost function is a real function of a complex variable, the gradient of the error function with respect to the real and imaginary components of W_nk can be written as

∂E/∂W_nk ≡ ∇W_nk E = ∂E/∂W^R_nk + i ∂E/∂W^I_nk, (8)

and using the chain rule,

∂E/∂W^R_nk = (∂E/∂u_n)(∂u_n/∂x_n · ∂x_n/∂W^R_nk + ∂u_n/∂y_n · ∂y_n/∂W^R_nk) + (∂E/∂v_n)(∂v_n/∂x_n · ∂x_n/∂W^R_nk + ∂v_n/∂y_n · ∂y_n/∂W^R_nk) (9)

∂E/∂W^I_nk = (∂E/∂u_n)(∂u_n/∂x_n · ∂x_n/∂W^I_nk + ∂u_n/∂y_n · ∂y_n/∂W^I_nk) + (∂E/∂v_n)(∂v_n/∂x_n · ∂x_n/∂W^I_nk + ∂v_n/∂y_n · ∂y_n/∂W^I_nk). (10)

Defining δ^R_n ≡ −∂E/∂u_n and δ^I_n ≡ −∂E/∂v_n, and using the following partial derivatives identifiable from Eq. (7),

∂x_n/∂W^R_nk = X^R_nk; ∂y_n/∂W^R_nk = X^I_nk; ∂x_n/∂W^I_nk = −X^I_nk; ∂y_n/∂W^I_nk = X^R_nk,

Eqs. (9) and (10) can be simplified as

∂E/∂W^R_nk = −δ^R_n (∂u_n/∂x_n X^R_nk + ∂u_n/∂y_n X^I_nk) − δ^I_n (∂v_n/∂x_n X^R_nk + ∂v_n/∂y_n X^I_nk) (11)

∂E/∂W^I_nk = −δ^R_n (∂u_n/∂x_n (−X^I_nk) + ∂u_n/∂y_n X^R_nk) − δ^I_n (∂v_n/∂x_n (−X^I_nk) + ∂v_n/∂y_n X^R_nk). (12)

Combining Eqs. (11) and (12), the gradient of the error function can be simplified as

∇W_nk E = −X̄_nk {(∂u_n/∂x_n + i ∂u_n/∂y_n) δ^R_n + (∂v_n/∂x_n + i ∂v_n/∂y_n) δ^I_n}. (13)

Note that Eq. (13) was first established in [3]; further applying the Cauchy-Riemann equations as shown in Eq. (14), the compact form of the fully complex BP algorithm is obtained in Eq. (15):

∂u_n/∂x_n + i ∂u_n/∂y_n = ∂u_n/∂x_n − i ∂v_n/∂x_n = ∂f̄_n/∂x_n = f̄′(z_n)

∂v_n/∂x_n + i ∂v_n/∂y_n = −∂u_n/∂y_n + i ∂v_n/∂y_n = −(∂u_n/∂y_n − i ∂v_n/∂y_n) = −∂f̄_n/∂y_n = i f̄′(z_n) (14)

∇W_nk E = −X̄_nk ( f̄′(z_n) δ^R_n + i f̄′(z_n) δ^I_n ) = −X̄_nk f̄′(z_n) δ_n (15)

where δ_n ≡ δ^R_n + iδ^I_n and the overbar denotes complex conjugation.

It is worth noting that, except for the conjugates on the X̄_nk and f̄′(z_n) terms, Eq. (15) is identical to the gradient of


the error function in the real-valued backpropagation algorithm.
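Equation (15) is easy to validate numerically. The following sketch (ours, with f = tanh z and random data) compares −X̄_nk f̄′(z_n) δ_n against the finite-difference gradient ∂E/∂W^R_nk + i ∂E/∂W^I_nk of Eq. (8):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=2) + 1j * rng.normal(size=2)          # inputs X_nk
W = 0.1 * (rng.normal(size=2) + 1j * rng.normal(size=2))  # weights W_nk
d = 0.5 - 0.2j                                            # desired symbol d_n

def E(W):
    # Eq. (6) for a single output neuron: E = (1/2)|d - f(z)|^2.
    return 0.5 * abs(d - np.tanh(X @ W)) ** 2

z = X @ W
delta = d - np.tanh(z)                      # delta_n = e_n at the output layer
fprime = 1.0 / np.cosh(z) ** 2              # f'(z) = sech^2 z for f = tanh
grad = -np.conj(X) * np.conj(fprime) * delta    # Eq. (15)

h, e0 = 1e-6, np.array([1.0, 0.0])
numeric = ((E(W + h * e0) - E(W - h * e0)) / (2 * h)
           + 1j * (E(W + 1j * h * e0) - E(W - 1j * h * e0)) / (2 * h))
print(np.isclose(grad[0], numeric))         # True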

The complex weight update ΔW_nk is proportional to the negative gradient:

ΔW_nk = α X̄_nk f̄′(z_n) δ_n (16)

where α can be either a real, imaginary, or complex learning rate. The performance differences with each type of learning rate and the choice for a given problem constitute an interesting problem that needs to be investigated. When the complex weight belongs to an output neuron, δ_n = e_n = d_n − o_n; when W_mk belongs to the m-th hidden layer, the net input z_m to neuron m has the same form as Eq. (7):

z_m = x_m + iy_m = Σ_k (u_k + iv_k)(W^R_mk + iW^I_mk) (17)

where the index k belongs to every neuron that feeds into neuron m. Using the chain rule,

δ^R_m = −∂E/∂u_m
= −Σ_k (∂E/∂u_k)(∂u_k/∂x_k · ∂x_k/∂u_m + ∂u_k/∂y_k · ∂y_k/∂u_m) − Σ_k (∂E/∂v_k)(∂v_k/∂x_k · ∂x_k/∂u_m + ∂v_k/∂y_k · ∂y_k/∂u_m)
= Σ_k δ^R_k (∂u_k/∂x_k W^R_mk + ∂u_k/∂y_k W^I_mk) + Σ_k δ^I_k (∂v_k/∂x_k W^R_mk + ∂v_k/∂y_k W^I_mk). (18)

Similarly,

δ^I_m = −∂E/∂v_m
= −Σ_k (∂E/∂u_k)(∂u_k/∂x_k · ∂x_k/∂v_m + ∂u_k/∂y_k · ∂y_k/∂v_m) − Σ_k (∂E/∂v_k)(∂v_k/∂x_k · ∂x_k/∂v_m + ∂v_k/∂y_k · ∂y_k/∂v_m)
= Σ_k δ^R_k (∂u_k/∂x_k (−W^I_mk) + ∂u_k/∂y_k W^R_mk) + Σ_k δ^I_k (∂v_k/∂x_k (−W^I_mk) + ∂v_k/∂y_k W^R_mk). (19)

Using the same partial derivatives that helped to establish Eqs. (11) and (12), and combining Eqs. (18) and (19), the following expression is obtained similarly for the weight update function:

δ_m = δ^R_m + iδ^I_m = Σ_k W̄_mk f̄′(z_k) δ_k (20)

Compared to the fully complex activation representation f(z) = u(x, y) + iv(x, y), the split complex activation function is a special case and can be represented as f(z) = u(x) + iv(y). This indicates that u_y = v_x = 0 for the split complex backpropagation algorithm. Removing these zero terms from Eq. (13), we obtain the following (complex) weight updates:

ΔW_nk = α X̄_nk (∂u_n/∂x_n δ^R_n + i ∂v_n/∂y_n δ^I_n). (21)

As before, for an output layer neuron, δ_n = e_n = d_n − o_n. For the hidden layer,

δ_m = Σ_k W̄_mk (∂u_k/∂x_k δ^R_k + i ∂v_k/∂y_k δ^I_k). (22)

Even though complex multiplication and addition are used throughout the weight update process in Eqs. (21) and (22), the marginal nature of the real-to-real and imaginary-to-imaginary interactions, which limits the full use of information from the real and imaginary components of the signal, is clearly observed in these equations. Note that the split complex weight update form shown in Eq. (22) can also be simplified to Eq. (20), which was first described by Leung and Haykin in [7].
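To make the flow of Eqs. (16) and (20) concrete, the sketch below (our illustration, not the authors' code) runs gradient steps for a small fully complex MLP with f = tanh z in both layers and a real learning rate α; the conjugates on the inputs, weights, and derivatives are exactly the ones derived above:

import numpy as np

rng = np.random.default_rng(1)
f = np.tanh
fp = lambda z: 1.0 / np.cosh(z) ** 2       # f'(z) = sech^2 z

init = lambda *s: 0.1 * (rng.normal(size=s) + 1j * rng.normal(size=s))
V, W = init(4, 3), init(1, 4)              # hidden and output layer weights
X = rng.normal(size=3) + 1j * rng.normal(size=3)
d = np.array([0.3 + 0.4j])                 # desired output symbol
alpha = 0.1                                # real-valued learning rate

for _ in range(500):
    s = V @ X; hdn = f(s)                  # hidden net inputs and outputs
    z = W @ hdn; o = f(z)                  # output net input and output
    delta_o = d - o                        # output delta: delta_n = e_n
    # Eq. (20): delta_m = sum_k conj(W_mk) conj(f'(z_k)) delta_k
    delta_h = (np.conj(W) * (np.conj(fp(z)) * delta_o)[:, None]).sum(axis=0)
    # Eq. (16): Delta W_nk = alpha * conj(X_nk) * conj(f'(z_n)) * delta_n
    W += alpha * np.outer(np.conj(fp(z)) * delta_o, np.conj(hdn))
    V += alpha * np.outer(np.conj(fp(s)) * delta_h, np.conj(X))

print(abs(d - f(W @ f(V @ X))))            # error should shrink toward zero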

4. Elementary Transcendental Functions

The following elementary transcendental functions, including tanh z, are identified to provide an adequate nonlinear discriminant as an activation function, and they satisfy the generalized condition we give in Section 2.

• Circular functions

tan z = (e^{iz} − e^{−iz}) / (i(e^{iz} + e^{−iz})), {d/dz tan z = sec^2 z},

sin z = (e^{iz} − e^{−iz}) / 2i, {d/dz sin z = cos z}


• Inverse circular functions

arctan z = ∫_0^z dt/(1 + t^2), {d/dz arctan z = 1/(1 + z^2)},

arcsin z = ∫_0^z dt/(1 − t^2)^{1/2}, {d/dz arcsin z = (1 − z^2)^{−1/2}},

arccos z = ∫_z^1 dt/(1 − t^2)^{1/2}, {d/dz arccos z = −(1 − z^2)^{−1/2}}

• Hyperbolic functions

tanh z = sinh z / cosh z = (e^z − e^{−z}) / (e^z + e^{−z}), {d/dz tanh z = sech^2 z},

sinh z = (e^z − e^{−z}) / 2, {d/dz sinh z = cosh z}

• Inverse hyperbolic functions

arctanh z = ∫_0^z dt/(1 − t^2), {d/dz arctanh z = (1 − z^2)^{−1}},

arcsinh z = ∫_0^z dt/(1 + t^2)^{1/2}, {d/dz arcsinh z = (1 + z^2)^{−1/2}}
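All nine ETFs above map directly onto NumPy's complex-valued functions. The following (f, f′) table (our sketch) verifies each closed-form derivative by a finite-difference check at a sample point inside the bounded domain discussed below:

import numpy as np

# The nine ETF activation functions paired with their derivatives.
# NumPy's circular and hyperbolic functions accept complex arguments.
ETFS = {
    "tan":     (np.tan,     lambda z: 1.0 / np.cos(z) ** 2),     # sec^2 z
    "sin":     (np.sin,     np.cos),
    "arctan":  (np.arctan,  lambda z: 1.0 / (1.0 + z ** 2)),
    "arcsin":  (np.arcsin,  lambda z: (1.0 - z ** 2) ** -0.5),
    "arccos":  (np.arccos,  lambda z: -((1.0 - z ** 2) ** -0.5)),
    "tanh":    (np.tanh,    lambda z: 1.0 / np.cosh(z) ** 2),    # sech^2 z
    "sinh":    (np.sinh,    np.cosh),
    "arctanh": (np.arctanh, lambda z: 1.0 / (1.0 - z ** 2)),
    "arcsinh": (np.arcsinh, lambda z: (1.0 + z ** 2) ** -0.5),
}

z, h = 0.2 + 0.1j, 1e-6
for name, (fn, dfn) in ETFS.items():
    fd = (fn(z + h) - fn(z - h)) / (2 * h)       # finite-difference derivative
    assert np.isclose(dfn(z), fd), name
print("all nine derivatives check out at z =", z)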

We employ these activation functions in an MLP structure that uses a tap delayed input line proportional


Figure 4. (a) Magnitude of sin z. (b) Phase of sin z.

to the expected correlation span in the observations (channel outputs). Figures 4 through 12 show the magnitude and phase of these elementary transcendental functions.

Note the periodicity and singularity of tanh z at every (1/2 + n)πi, n ∈ N, in Fig. 6, while Figs. 5, 7, and 11 show the singularities of tan z, arctan z, and arctanh z at (1/2 + n)π, ±i, and ±1, respectively. Usually, these singular points and discontinuities at non-zero points do not pose a problem in training when the domain of interest is bounded within a circle of radius π/2. If the domain is larger and includes these irregular points, the training process tends to become more sensitive to the size of the learning rate and poses a penalty when a fixed learning rate is used. Also, it is observed that the initial random weights need to be bounded in a small radius, typically below 0.1, to avoid oscillation in the gradient based updates.
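Those two practical observations — bounding the domain of interest and initializing weights within a radius of about 0.1 — translate into a short initialization helper (our sketch, with hypothetical layer shapes):

import numpy as np

rng = np.random.default_rng(2)

def init_complex_weights(shape, radius=0.1):
    # Draw weights uniformly from a disk of the given radius; small initial
    # weights keep early net inputs well away from the singular points of the ETFs
    # (tan z at (1/2 + n)pi, tanh z at (1/2 + n)pi*i, arctanh z at +/-1),
    # which in any case form a set of measure zero.
    r = radius * np.sqrt(rng.uniform(size=shape))   # sqrt gives uniform density over the disk
    phi = rng.uniform(0.0, 2.0 * np.pi, size=shape)
    return r * np.exp(1j * phi)

W = init_complex_weights((5, 10))
print(np.abs(W).max() < 0.1)                        # True: all inside the init radius
print(np.abs(np.tan(0.4 + 0.3j)), np.abs(np.tanh(0.4 + 0.3j)))  # finite values near the origin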

Unlike the tan z and tanh z functions with pointwise singularities, Figs. 8 and 12 show that the arcsin z and arcsinh z functions are not continuous, and therefore not analytic, along a portion of the real or the imaginary axis. However, arcsin z is mostly radially symmetric in magnitude, while Fig. 9 shows that the arccos z function is asymmetric along the real axis and has a discontinuous zero point at z = 1. The split-tanh x function in Fig. 3 is not quite radially symmetric because of the valleys along the real and imaginary axes. Note that these functions have decreasing rates of amplitude growth as we move away from the origin. Because of the decreasing rate of amplitude growth, the application of a scaling coefficient in the activation function may become necessary when the data shows a large dynamic range.



Figure 5. (a) Magnitude of tan z. (b) Phase of tan z.


Figure 6. (a) Magnitude of tanh z. (b) Phase of tanh z.


Figure 7. (a) Magnitude of arctan z. (b) Phase of arctan z.



Figure 8. (a) Magnitude of arcsin z. (b) Phase of arcsin z.


Figure 9. (a) Magnitude of arccos z. (b) Phase of arccos z.


Figure 10. (a) Magnitude of sinh z. (b) Phase of sinh z.



Figure 11. (a) Magnitude of arctanh z. (b) Phase of arctanh z.


Figure 12. (a) Magnitude of arcsinh z. (b) Phase of arcsinh z.

Also note that, unlike the arcsinh z and arccos z functions that are unbounded, the split-tanh x function is bounded by √2 in magnitude from the squared sum of the real and imaginary tanh x components. Instead, when the domain is bounded, arcsinh z and arccos z are naturally bounded. As will be shown in the numerical examples in Section 5, the radial symmetry of the arcsinh z function yields the best symbol-error-rate (SER) test result in restoring mild nonlinear distortion of non-constant modulus signals. For severe nonlinearity, arctan z and arctanh z tend to show better SER performance, where the saddle-point-like magnitude response and smooth nonlinear phase response seem to provide better performance.

5. Numerical Examples

The following discrete-input and discrete-output relationship describes the TWTA AM/AM and AM/PM models using Volterra series [12]:

x_n = Σ_{k=1}^∞ Σ_{n_1} · · · Σ_{n_{2k−1}} a_{n−n_1} a_{n−n_2} · · · a_{n−n_k} a∗_{n−n_{k+1}} · · · a∗_{n−n_{2k−1}} H^{(2k−1)}_{n_1,...,n_{2k−1}} + ν_o (23)

where ν_o is a complex Gaussian random noise representing the down-link noise, a_n represents the information symbols, and H^(1)_n, H^(3)_{n_1,n_2,n_3}, . . . are a set


of complex Volterra series coefficients that describe the effect of the nonlinear channel on the symbols a_n as follows:

Linear coefficients: H^(1)_0 = 1.22 + i0.646, H^(1)_1 = 0.063 − i0.001, H^(1)_2 = −0.024 − i0.014, H^(1)_3 = 0.036 + i0.031

3rd order coefficients: H^(3)_002 = 0.039 − i0.022, H^(3)_330 = 0.018 − i0.018, H^(3)_001 = 0.035 − i0.035, H^(3)_003 = −0.04 − i0.009, H^(3)_110 = −0.01 − i0.017

5th order coefficient: H^(5)_00011 = 0.039 − i0.022.
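Truncated to the kernels listed above, Eq. (23) can be simulated directly. The sketch below (ours) assumes the ordering written in Eq. (23), where the first k delay indices multiply unconjugated symbols and the remaining k − 1 multiply conjugated ones:

import numpy as np

H1 = {0: 1.22 + 0.646j, 1: 0.063 - 0.001j, 2: -0.024 - 0.014j, 3: 0.036 + 0.031j}
H3 = {(0, 0, 2): 0.039 - 0.022j, (3, 3, 0): 0.018 - 0.018j, (0, 0, 1): 0.035 - 0.035j,
      (0, 0, 3): -0.04 - 0.009j, (1, 1, 0): -0.01 - 0.017j}
H5 = {(0, 0, 0, 1, 1): 0.039 - 0.022j}

def twta_channel(a, noise_std=0.0, rng=np.random.default_rng(3)):
    # Eq. (23) truncated to the 1st/3rd/5th order kernels listed above.
    N = len(a)
    ap = np.concatenate([np.zeros(4, complex), np.asarray(a, complex)])  # zero prehistory
    x = np.zeros(N, complex)
    for n in range(N):
        m = n + 4
        for n1, h in H1.items():
            x[n] += h * ap[m - n1]
        for (n1, n2, n3), h in H3.items():
            x[n] += h * ap[m - n1] * ap[m - n2] * np.conj(ap[m - n3])
        for (n1, n2, n3, n4, n5), h in H5.items():
            x[n] += (h * ap[m - n1] * ap[m - n2] * ap[m - n3]
                     * np.conj(ap[m - n4]) * np.conj(ap[m - n5]))
    noise = noise_std * (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
    return x + noise                           # down-link noise nu_o

a = np.array([1 + 1j, 3 - 1j, -1 + 3j, -3 - 3j]) / (3 * np.sqrt(2))  # sample 16-QAM symbols
print(twta_channel(a))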

Figure 13 shows the time-delayed MLP structure used for the equalization of non-constant modulus QAM signals transmitted through the channel model given in Eq. (23), used to test the performance of the fully complex MLPs we introduce. The MLP equalizer used in this section has a 10-5-1 structure, i.e., a 10 tap delayed input layer fully connected to 5 neurons in the hidden layer, which is connected to the single output of the equalizer. The order of the CLMS equalizer tested is 50. The sizes of the equalizers are chosen so that each structure tested has similar complexity with satisfactory representation capability. A real valued learning rate is used to optimize the training in the results we present here. Tests were also carried out with imaginary and complex learning rates, and it is noted that they result in some, but not very significant, differences in training performance. However, this is a problem that needs further study.

Figure 13. Nonlinear channel equalizer structure.

Figure 14. Input constellation plus noise.

Figure 15. Received constellation with mild channel distortion.

Figure 14 shows a 16-QAM constellation with 8 dB signal-to-noise ratio per bit (Eb/No) that reaches the saturation level of unit amplitude at the outermost corner points. Figure 15 shows the AM/AM and AM/PM impact on 16-QAM symbols transmitted through the TWTA, where the amplitude amplification and phase rotation effects are shown. In this case, due to the moderate saturation level, the distortion is almost linear.
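The test source of Fig. 14 can be reproduced along the following lines (our sketch): 16-QAM symbols scaled so that the outermost corner points reach the unit saturation amplitude, plus complex Gaussian noise at a chosen Eb/No:

import numpy as np

def qam16_source(n_sym, ebno_db, rng=np.random.default_rng(4)):
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    sym = rng.choice(levels, n_sym) + 1j * rng.choice(levels, n_sym)
    sym /= 3.0 * np.sqrt(2.0)               # corner points (+/-3 +/- i3) -> unit amplitude
    es = np.mean(np.abs(sym) ** 2)          # average symbol energy
    n0 = es / (4.0 * 10.0 ** (ebno_db / 10.0))   # 4 bits per 16-QAM symbol
    noise = np.sqrt(n0 / 2.0) * (rng.normal(size=n_sym) + 1j * rng.normal(size=n_sym))
    return sym + noise, sym

noisy, clean = qam16_source(1000, ebno_db=8.0)   # the 8 dB Eb/No case of Fig. 14
print(np.abs(clean).max())                       # close to 1.0 at the corner points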

Figures 16 and 17 show the signal-to-noise per bit (SNR/bit or Eb/No) versus SER equalization test performances of the 13 schemes discussed in this paper after each scheme is trained to restore the mild nonlinear TWTA distortion shown in Fig. 15. They include the fully complex activation functions of Georgiou and Koutsougeras (GK) [3] and Hirose [6], the split complex hyperbolic tangent, and the CLMS, along with the nine transcendental functions that were introduced in Section 4. To compare the best SER performances of all 13 schemes, we train and test all schemes at mean square error (MSE) stopping levels of 10^−5, 10^−6, and 10^−7 at 12 dB Eb/No and pick the stopping level that yields the best overall test performance for each


Figure 16. SER performance for the mild nonlinear distortion case (First group).

Figure 17. SER performance for the mild nonlinear distortion case (Second group).

tested scheme. Note that for most schemes, the 10^−7 MSE level provided the best performance; we had determined this as the lowest threshold to be tested. In cases where a scheme did not achieve the lowest MSE level of 10^−7, we stopped the training after 20,000 iterations. Figures 16 and 17 show these best SER performances.

Here, most schemes except that of Georgiou and Koutsougeras converged close to the minimum SER level of a 16-QAM scheme, i.e., the performance with additive white Gaussian noise (AWGN) only. Here, it is assumed that the noise amplitude and phase follow Gaussian and uniform distributions, respectively, and are independent of each other. It is not surprising that CLMS, which fully utilizes the information from the real and imaginary positions, performed well in this almost

Figure 18. Equalization by arcsinh MLP (mild case).

linear distortion case. Two of the nine fully complex activation functions, arctanh z and arcsinh z, however, outperformed CLMS. Note that the split tanh x (STANH) did not perform well, as shown in the equalization result of Fig. 19, where the overall shape of the restored constellation is not as clearly square as the restored constellation of arcsinh z shown in Fig. 18.

The second equalization example is designed to test the performance of fully complex MLPs under a higher level of nonlinear distortion, as illustrated in Fig. 20. The extreme level of AM/PM distortion is clearly visible here, as the level of saturation is much higher for the outer constellation points. Figure 21 shows the outcome of equalization using the arcsinh z activation function MLP, where the equalization could not fully restore the severe AM/PM distortion from the high level of saturation.

Figure 22 shows the best performance of the 13 schemes in the highly nonlinear saturated channel equalization case described in Figs. 20 and 21 above. In this case, the relatively poor performance of CLMS along with the Georgiou and Koutsougeras (GK) scheme [3] and the Hirose scheme [6] is clearly represented. This is explainable by the

Figure 19. Equalization by STANH MLP (mild case).


Figure 20. Received 16 QAM constellation (severe nonlinear case).

Figure 21. Equalization by arcsinh MLP (severe nonlinear case).

poor phase learning capability of these activation functions, which could not restore the severe phase distortion. It is also noteworthy that, unlike in the mild nonlinear case, the split tanh (STANH) function outperformed CLMS as it provides nonlinear correction, but four fully complex activation functions, tanh z, arctan z, arctanh z, and arcsin z, still outperformed the STANH function. The change of top performers between the mild and severe nonlinear distortions, except for arctanh z,

Figure 22. Eb/No vs. SER performance of the severe nonlinear distortion case.

raises an intriguing question about the characteristics of activation functions under different circumstances.

6. Conclusion

Nine elementary transcendental functions are identified as fully complex nonlinear activation functions capable of learning nonlinear amplitude and phase distortion simultaneously. The fully complex MLPs employing these activation functions are shown to outperform the traditional complex MLP and complex LMS schemes in satellite nonlinear channel equalization simulations. The common characteristics of elementary transcendental functions are analyzed and categorized. Most importantly, it is noted that these functions are bounded and analytic almost everywhere in a bounded domain of interest. This fact establishes the practicality of a fully complex MLP for its almost everywhere convergence property, which is sufficient in all practical engineering applications.

While studying the elementary transcendental functions as the nonlinear activation function, a compact form of the fully complex back-propagation algorithm is established. The Cauchy-Riemann equations, satisfied by the almost everywhere analyticity of fully complex activation functions, enable the simplification of the previously published complex back-propagation algorithm. It is shown that the back-propagation complex weight update equation is a complex conjugate version of the well-established real weight update equation. Furthermore, it is shown that the split complex


back-propagation algorithm is a special case of the fully complex algorithm we define. Consequently, the existing set of conditions considered to be ‘desirable’ for complex nonlinear activation functions from the split complex perspective [3, 4] is relaxed and reduced into a single condition.

In the numerical examples, it is demonstrated that the conventional linear approach, i.e., the CLMS algorithm, provides quite satisfactory performance in the case of mild nonlinear channel distortion, but performs poorly under strong nonlinear distortion. The split tanh x activation function outperforms CLMS in the strong nonlinear distortion case, but is outperformed by several fully complex elementary transcendental functions in both mild and strong nonlinear distortion cases. In every case, the singularity and discontinuity of the elementary transcendental functions around the unit circle did not pose difficulty in training the fully complex MLP by backpropagation.

References

1. H. Silverman, Complex Variables, Houghton, Newark, USA, 1975.

2. T. Clarke, “Generalization of Neural Network to the Complex Plane,” in Proc. of IJCNN, vol. 2, 1990, pp. 435–440.

3. G. Georgiou and C. Koutsougeras, “Complex Backpropagation,” IEEE Trans. on Circuits and Systems II, vol. 39, no. 5, 1992, pp. 330–334.

4. C. You and D. Hong, “Nonlinear Blind Equalization Schemes Using Complex-Valued Multilayer Feedforward Neural Networks,” IEEE Trans. on Neural Networks, vol. 9, no. 6, 1998, pp. 1442–1455.

5. D. Mandic and J. Chambers, Recurrent Neural Networks for Prediction, John Wiley and Sons, 2001.

6. A. Hirose, “Continuous Complex-Valued Back-Propagation Learning,” Electronics Letters, vol. 28, no. 20, 1992, pp. 1854–1855.

7. H. Leung and S. Haykin, “The Complex Backpropagation Algorithm,” IEEE Trans. on Signal Proc., vol. 39, no. 9, 1991, pp. 2101–2104.

8. N. Benvenuto, M. Marchesi, F. Piazza, and A. Uncini, “Nonlinear Satellite Radio Links Equalized Using Blind Neural Networks,” in Proc. of ICASSP, vol. 3, 1991, pp. 1521–1524.

9. N. Benvenuto and F. Piazza, “On the Complex Backpropagation Algorithm,” IEEE Trans. on Signal Processing, vol. 40, no. 4, 1992, pp. 967–969.

10. M. Ibnkahla and F. Castanie, “Vector Neural Networks for Digital Satellite Communications,” in Proc. of ICC, vol. 3, 1995, pp. 1865–1869.

11. A. Uncini, L. Vecci, P. Campolucci, and F. Piazza, “Complex-Valued Neural Networks with Adaptive Spline Activation Functions,” IEEE Trans. on Signal Processing, vol. 47, no. 2, 1999.

12. S. Benedetto and E. Biglieri, “Nonlinear Equalization of Digital Satellite Channels,” IEEE Jour. on SAC, vol. SAC-1, 1983, pp. 57–62.

13. G. Kechriotis and E. Manolakos, “Training Fully Recurrent Neural Networks with Complex Weights,” IEEE Trans. on Circuits and Systems—II: Analog and Digital Signal Processing, vol. 41, no. 3, 1994, pp. 235–238.

14. J. Deng, N. Sundararajan, and P. Saratchandran, “Communication Channel Equalization Using Complex-Valued Minimal Radial Basis Functions Neural Network,” in Proc. of IEEE IJCNN 2000, vol. 5, 2000, pp. 372–377.

15. K.Y. Lee and S. Jung, “Extended Complex RBF and its Application to M-QAM in Presence of Co-Channel Interference,” Electronics Letters, vol. 35, no. 1, 1999, pp. 17–19.

16. S. Chen, P.M. Grant, S. McLaughlin, and B. Mulgrew, “Complex-Valued Radial Basis Function Networks,” in Proc. of Third IEEE International Conference on Artificial Neural Networks, 1993, pp. 148–152.

17. T. Kim and T. Adalı, “Fully Complex Backpropagation for Constant Envelop Signal Processing,” in Proc. of IEEE Workshop on Neural Networks for Sig. Proc., Sydney, Dec. 2000, pp. 231–240.

18. T. Kim and T. Adalı, “Complex Backpropagation Neural Network Using Elementary Transcendental Activation Functions,” in Proc. of IEEE ICASSP, vol. II, Salt Lake City, May 2001.

19. T. Kim and T. Adalı, “Nonlinear Satellite Channel Equalization Using Fully Complex Feed-Forward Neural Networks,” in Proc. of IEEE Workshop on Nonlinear Signal and Image Processing, Baltimore, June 2001, pp. 141–150.

Taehwan Kim received his B.S. degree in mathematics from Seoul National University, Korea, in 1984 and his M.S. degree in computer science from the University of South Carolina, Columbia, U.S., in 1986. Between 1990 and 1994, he studied applied mathematics and electrical engineering as a part-time graduate student at the University of Maryland College Park. Currently, he is a Ph.D. candidate in electrical engineering at the University of Maryland Baltimore County. From 1987 to 1997, he worked with Glynn Scientific, Inc., Stanford Telecommunications, Inc., and Hughes Information Systems, all in Maryland, in the areas of target recognition and satellite communications. Since 1997, he has been working with the MITRE Corporation in McLean, Virginia, where his main tasks have been the modernization of civil Global Positioning Satellite (GPS) and its augmentation systems for the U.S. Federal Aviation Administration (FAA). He has coauthored several papers for the Institute of Navigation (ION) conferences in the areas of robust GPS navigation, ionospheric scintillation, and interference assessment and mitigation. Since 2000, he has been presenting the mathematical properties of fully complex neural networks at the IEEE International Workshop on Neural Networks for Signal Processing (NNSP) and the International Conference on Acoustics, Speech, and Signal


Processing (ICASSP). His research interests include adaptive signal processing, wireless communication, and radio navigation.
[email protected]

Tülay Adalı received the B.S. degree from Middle East Technical University, Ankara, Turkey, in 1987 and the M.S. and Ph.D. degrees from North Carolina State University, Raleigh, in 1988 and 1992, respectively, all in electrical engineering. In 1992, she joined the Department of Electrical Engineering at the University of Maryland Baltimore County, Baltimore, where she currently is an associate professor. She has served on the organizing committees of a number of international conferences, including the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) and the IEEE International Workshop on Neural Networks for Signal Processing (NNSP). She has been the general co-chair for the NNSP workshops for the last two years. She is the secretary for the IEEE Neural Networks for Signal Processing Technical Committee and is serving on the IEEE Signal Processing conference board. Her research interests are in the areas of adaptive signal processing, neural computation, estimation theory, and their applications in channel equalization, biomedical image analysis, and optical communications. Dr. Adalı is the recipient of a 1997 National Science Foundation CAREER Award.
[email protected]