Newton like minimum entropy equalization

10
1 Newton-Like Minimum Entropy Equalization Algorithm for APSK Systems Anum Ali, Shafayat Abrar, Azzedine Zerguine and Asoke K. Nandi Abstract—In this paper, we design and analyse a Newton-like blind equalization algorithm for APSK system. Specifically, we exploit the principle of minimum entropy deconvolution and derive a blind equal- ization cost function for APSK signals and optimize it using Newton’s method. We study and evaluate the steady-state excess mean square error performance of the proposed algorithm using the concept of energy conservation. Numerical results depict a significant performance enhance- ment for the proposed scheme over well established blind equalization algorithms. Further, the analytical excess mean square error of the proposed algorithm is verified with computer simulations and is found to be in good conformation. Index Terms—constant modulus algorithm, blind equalizer, recursive least squares algorithm, Newton’s method, tracking performance, ampli- tude phase shift keying I. I NTRODUCTION In digital communications at high enough data rates, almost all physical channels exhibit inter-symbol interference (ISI). One of the solutions to this problem is blind equalization, which is a method to equalize distortive communication channels and mitigate ISI without supervision. Blind equalization algorithms do not require training at either the startup period or restart after system breakdown. This independence of blind schemes with respect to training sequence results in improved system bandwidth efficiency. In blind equalization, the desired signal is unknown to the receiver, except for its probabilistic or statistical properties over some known alphabets. As both the channel and its input are unknown, the objective of blind equalization is to recover the unknown input sequence based solely on its probabilistic and statistical properties [1], [2], [3]. From the available literature, it can be found that any admissible blind objective (or cost) function has two main attributes: 1) it makes use of the statistics which are significantly modified as the signal propagates through the channel [4], and 2) optimization of the cost function modifies the statistics of the signal at the equalizer output, aligning them with the statistics of the signal at the channel input [5]. One of the earliest methods of blind equalization was suggested by Benveniste et al. [5]. Their proposed method assumed the transmitted signal to be non-Gaussian, independent and identically distributed (i.i.d.) sequence of a known statistical distribution. It sought to match the distribution of its output (deconvolved sequence) with the distri- bution of the transmitted signal and the adaptation continued until the said objective was achieved. Another approach to this problem was devised by Donoho [4], who defined a partial ordering, measuring the relative Gaussianity between random variables. He suggested to adjust the equalizer until the distribution of the deconvolved sequence is as non-Gaussian as possible. A somewhat informal version of Donoho’s method appeared in the work of Wiggins [6], [7]. According to Wiggins, assuming the transmitted signal is a non- Gaussian signal with certain distribution pa, the equalizer must be S. Abrar is with the Department of Electrical Engineering, COM- SATS Institute of Information Technology, Islamabad, Pakistan. Email: [email protected]. A. Ali and A. Zerguine are with the Department of Electrical Engineering, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia. A.K. Nandi is with the Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UK. adjusted to make its output signal distribution resemble pa. He termed this approach as minimum entropy deconvolution (MED) criterion. In this work, the MED criterion is the subject of our concern for designing (and optimizing) cost functions to equalize signals blindly in amplitude phase shift keying (APSK) systems. The APSK signals are very important in modern day communication systems due to their robustness against nonlinear channel distortion and advantageous lower peak-to-average power ratio compared to the conventional quadrature amplitude modulation signals (refer to APSK based systems in [8], [9], [10], [11], [12], [13], [14] and references therein). This paper is organized as follows: Section II discusses the baseband communication system model, notion of Gaussianity, and traditional blind equalizers. Section III describes the MED criteria for channel equalization, discusses the admissibility of costs tailored for APSK, and stochastic gradient-based optimization. Section IV formulates the proposed adaptive MED-based blind equalization scheme for APSK systems exploiting Newton-like update. Section V provides a steady-state tracking performance of the proposed algorithm in time varying scenario. Simulation results are discussed in Section VI and conclusions are provided in Section VII. [h!] II. SYSTEM MODEL AND TRADITIONAL BLIND EQUALIZERS The baseband model for a typical complex-valued data communi- cation system, as shown in Fig. 1, consists of an unknown finite- impulse response filter hn, which represents the physical inter- connection between the transmitter and the receiver. A zero-mean, i.i.d., circularly symmetric, complex-valued data sequence {an} is transmitted through the channel, whose output xn is recorded by the receiver. The input/output relationship of the channel can be written as: xn = k a nk h k + νn, where the additive noise νn is assumed to be stationary, Gaussian, and independent of the channel input an. The function of equalizer at the receiver is to estimate the delayed version of original data, anτ , from the received signal xn. Let wn =[wn,0,wn,1, ··· ,wn,N1] T be a vector of equalizer coeffi- cients with N elements and xn =[xn,xn1 ··· ,xnN+1] T be the vector of channel observations ( T denotes transpose operation). The output of the equalizer is then given by yn = w H n1 xn ( H denotes the Hermitian conjugate operator). If tn = hn w n1 represents the overall channel-equalizer impulse response (denotes convolution), then yn = i w n1,i xni = tn,τ anτ + l=τ t n,l a nl + ν n signal + ISI + noise (1) which demonstrates the adverse effect of inter-symbol interference (ISI) and additive noise. The ISI is quantified as [15]: ISI = i |tn,i | 2 max |tn| 2 max {|tn| 2 } (2) In subsequent discussions we drop the subscript n from t for notational convenience. The idea behind a Bussgang blind equalizer is to minimize (or maximize), through the choice of w, a certain cost

Transcript of Newton like minimum entropy equalization

1

Newton-Like Minimum Entropy Equalization Algorithmfor APSK Systems

Anum Ali, Shafayat Abrar, Azzedine Zerguine and Asoke K. Nandi

Abstract—In this paper, we design and analyse a Newton-like blindequalization algorithm for APSK system. Specifically, we exploit theprinciple of minimum entropy deconvolution and derive a blind equal-ization cost function for APSK signals and optimize it usingNewton’smethod. We study and evaluate the steady-state excess mean squareerror performance of the proposed algorithm using the concept of energyconservation. Numerical results depict a significant performance enhance-ment for the proposed scheme over well established blind equalizationalgorithms. Further, the analytical excess mean square error of theproposed algorithm is verified with computer simulations and is foundto be in good conformation.

Index Terms—constant modulus algorithm, blind equalizer, recursiveleast squares algorithm, Newton’s method, tracking performance, ampli-tude phase shift keying

I. I NTRODUCTION

In digital communications at high enough data rates, almostallphysical channels exhibit inter-symbol interference (ISI). One of thesolutions to this problem is blind equalization, which is a method toequalize distortive communication channels and mitigate ISI withoutsupervision. Blind equalization algorithms do not requiretrainingat either the startup period or restart after system breakdown. Thisindependence of blind schemes with respect to training sequenceresults in improved system bandwidth efficiency.

In blind equalization, the desired signal is unknown to the receiver,except for its probabilistic or statistical properties over some knownalphabets. As both the channel and its input are unknown, theobjective of blind equalization is to recover the unknown inputsequence based solely on its probabilistic and statisticalproperties[1], [2], [3]. From the available literature, it can be foundthat anyadmissible blind objective (or cost) function has two main attributes:1) it makes use of the statistics which are significantly modified asthe signal propagates through the channel [4], and 2) optimization ofthe cost function modifies the statistics of the signal at theequalizeroutput, aligning them with the statistics of the signal at the channelinput [5].

One of the earliest methods of blind equalization was suggested byBenvenisteet al. [5]. Their proposed method assumed the transmittedsignal to be non-Gaussian, independent and identically distributed(i.i.d.) sequence of a known statistical distribution. It sought to matchthe distribution of its output (deconvolved sequence) withthe distri-bution of the transmitted signal and the adaptation continued until thesaid objective was achieved. Another approach to this problem wasdevised by Donoho [4], who defined a partial ordering, measuringthe relative Gaussianitybetween random variables. He suggestedto adjust the equalizer until the distribution of the deconvolvedsequence is asnon-Gaussianas possible. A somewhat informalversion of Donoho’s method appeared in the work of Wiggins [6],[7]. According to Wiggins, assuming the transmitted signalis a non-Gaussian signal with certain distributionpa, the equalizer must be

S. Abrar is with the Department of Electrical Engineering, COM-SATS Institute of Information Technology, Islamabad, Pakistan. Email:[email protected]. A. Ali and A. Zerguine are with the Departmentof Electrical Engineering, King Fahd University of Petroleum and Minerals,Dhahran, Saudi Arabia. A.K. Nandi is with the Department of Electronic andComputer Engineering, Brunel University, Uxbridge, Middlesex, UK.

adjusted to make its output signal distribution resemblepa. He termedthis approach asminimum entropy deconvolution(MED) criterion.

In this work, the MED criterion is the subject of our concernfor designing (and optimizing) cost functions to equalize signalsblindly in amplitude phase shift keying (APSK) systems. TheAPSKsignals are very important in modern day communication systemsdue to their robustness against nonlinear channel distortion andadvantageous lower peak-to-average power ratio compared to theconventional quadrature amplitude modulation signals (refer to APSKbased systems in [8], [9], [10], [11], [12], [13], [14] and referencestherein).

This paper is organized as follows: Section II discusses thebaseband communication system model, notion of Gaussianity, andtraditional blind equalizers. Section III describes the MED criteriafor channel equalization, discusses the admissibility of costs tailoredfor APSK, and stochastic gradient-based optimization. Section IVformulates the proposed adaptive MED-based blind equalizationscheme for APSK systems exploiting Newton-like update. SectionV provides a steady-state tracking performance of the proposedalgorithm in time varying scenario. Simulation results arediscussedin Section VI and conclusions are provided in Section VII. [h!]

II. SYSTEM MODEL AND TRADITIONAL BLIND EQUALIZERS

The baseband model for a typical complex-valued data communi-cation system, as shown in Fig. 1, consists of an unknown finite-impulse response filterhn, which represents the physical inter-connection between the transmitter and the receiver. A zero-mean,i.i.d., circularly symmetric, complex-valued data sequence {an} istransmitted through the channel, whose outputxn is recorded by thereceiver. The input/output relationship of the channel canbe writtenas:xn =

∑k an−khk + νn, where the additive noiseνn is assumed

to be stationary, Gaussian, and independent of the channel input an.The function of equalizer at the receiver is to estimate the delayed

version of original data,an−τ , from the received signalxn. Letwn = [wn,0, wn,1, · · · , wn,N−1]

T be a vector of equalizer coeffi-cients withN elements andxn = [xn, xn−1 · · · , xn−N+1]

T be thevector of channel observations (T denotes transpose operation). Theoutput of the equalizer is then given byyn = wH

n−1xn (H denotesthe Hermitian conjugate operator). Iftn = hn⊗w∗

n−1 represents theoverall channel-equalizer impulse response (⊗ denotes convolution),then

yn =∑

i

w∗n−1,ixn−i = tn,τ an−τ +

l 6=τ

tn,l an−l + ν′

n

︸ ︷︷ ︸signal + ISI + noise

(1)

which demonstrates the adverse effect of inter-symbol interference(ISI) and additive noise. The ISI is quantified as [15]:

ISI =

∑i |tn,i|2 −max

{|tn|2

}

max {|tn|2}(2)

In subsequent discussions we drop the subscriptn from t fornotational convenience. The idea behind aBussgangblind equalizeris to minimize (or maximize), through the choice ofw, a certain cost

2

function J depending onyn, such thatyn provides an estimate ofan up to some inherent indeterminacies, giving,yn = αan−τ , whereα = |α|eθ ∈ C represents an arbitrary complex-valued gain. Hence,a Bussgang blind equalizer tries to solve the following optimizationproblem:

w† = argw optimize J, with J = EJ (yn) (3)

The costJ is a function of implicitly embedded statistics ofyn andJ (·) is a real-valued function. Ideally, the costJ makes use of statis-tics which are significantly modified as the signal propagates throughthe channel, and the optimization of cost modifies the statistics of thesignal at the equalizer output, aligning them with those at channelinput. The equalization is accomplished when equalized sequenceynacquires an identical distribution as that of the channel input an.More formally, we have the following theorem [5]:

Theorem 1:If the transmitted signal is composed of non-Gaussiani.i.d. samples, both channel and equalizer are linear time-invariantfilters, noise is negligible, and the probability density functions (PDF)of transmitted and equalized signals are equal, then the channel hasbeen perfectly equalized.

This mathematical result is very important since it establishes thepossibility of obtaining an equalizer with the sole aid of signals statis-tical properties and without requiring any knowledge of thechannelimpulse response or training data sequence. Meanwhile, Donoho [4]noted that, as a consequence of the central limit theorem, linearcombinations of identically distributed random variablesbecomemore Gaussianthan the individual variables. Therefore, the receivedsignalxn will have a distribution that is more nearly Gaussian thanthe distribution ofan. Any suitable objective function capable ofmeasuringGaussianityor non-Gaussianitycan therefore be used fordeconvolution.

One of the measures of Gaussianity is (normalized) kurtosis, κ,which is a statistic based on second and fourth-order moments. Fora circularly-symmetric complex-valued random variableX, kurtosisis defined as [15]:

κX =E|x|4

(E|x|2)2− 2 (4)

κX is greater than zero, equal to zero, and less than zero for super-Gaussian, Gaussian, and sub-Gaussian random variables, respectively.Expectedly most of the existing blind equalizers use secondandfourth-order statistics of the equalized sequence. The first ever methodwhich relies on the aforesaid statistics is the constant moduluscriterion, [16]; it is given by

minw

{E|yn|4 − 2R2

E|yn|2}, (5)

where R is the Godard radiusand R2 = E|an|4/E|an|2 is theGodard dispersion constant[17]. For an input signal that has aconstant modulus|an| = R, the constant modulus criterion penalizesoutput samplesyn that do not have the desired constant moduluscharacteristics [18].

Shalvi and Weinstein [15] demonstrated that the condition of equal-ity between the PDF of the transmitted and equalized signals, due toTheorem1, was excessively tight. Under the similar assumptions,as laid in Theorem1, they discussed the possibility to performblind equalization by satisfying the conditionE|yn|2 = E|an|2 andensuring that a nonzero cumulant (of any order higher than 2)of an

andyn are equal. For a two dimensional signalan with four-quadrantsymmetry (i.e.,Ea2

n = 0), they suggested to maximize the followingexemplary cost containing second and fourth-order statistics:

maxw

∣∣∣∣E|yn|4 − 2

(E|yn|2

)2∣∣∣∣ s.t. E|yn|2 = E|an|2 (6)

In the next section, we discuss equalization/deconvolution techniqueswhich evolved around the notion ofentropyas a measure of Gaus-sianity.

III. M INIMUM ENTROPY DECONVOLUTION

In statistics, the most powerful test of the null hypothesisagainstanother is one which maximizes discrimination and minimizes theprobability of accepting the null hypothesis as true when actually thealternative is true. In information theory, a statistic that maximizesdetectability of a signal while minimizing the probabilityof a falsealarm is most powerful. In seismic community, however, a statistical(rank-discriminating) test against Gaussianity, which can be used todesign a deconvolution filter, is loosely termed as MED criterion [6],[4].

Hogg [19] developed a scale-invariant and the most powerfulrank-discriminating test for one member of the generalizedGaussianagainst another by considering the following PDF:

pY (y, α) =α

2Γ(

) exp (−|y|α) , |y| < ∞ (7)

whereα > 0 is shape parameter, andΓ(·) is the Gamma function.To determine if a random set of samples{y1, y2, · · · , yB} is drawnfrom the distributionpY (y, α2) as opposed topY (y,α1), a ratio testwas derived (based on the procedure described in [20]) as follows:

VY (α1, α2) =

(1B

∑Bi=1 |yi|α1

)B/α1

(1B

∑Bi=1 |yi|α2

)B/α2

α=α2

≷α=α1

χ (8)

whereχ is some threshold. The largerV becomes, the more probableit is that the sample setY is drawn from the distributionpY (y,α2)and notpY (y, α1). It would be interesting to look at the entropy,H(α), associated with (7); it is given as follows [21]:

H(α) = log

2

α

Γ

(1

α

)3/Γ

(3

α

)2

+1

α(9)

This function is illustrated in Fig. 2 for different membersofthe family. The entropy is maximum for Gaussian (whenα = 2).Note that(α < 2) and (α > 2) represents super-Gaussian and sub-Gaussian cases, respectively. Asα → 0, H(α) rapidly goes to−∞which is the entropy of the certain event; however, on the other side,H(α) falls off slowly to another minimum, which is the entropy ofthe uniform distribution.

From the work of Geary [22], it became known tothe statistical community that the test against Gaussianity1B

∑Bi=1 |xi|4/( 1

B

∑Bi=1 |xi|2)2 is most efficient when no

information on the distribution of the random sample is available.Remarkably, Wiggins exploited the same idea and sought todetermine the inverse channelw† that maximizes thekurtosisof the deconvolved seismic datayn [6]. Since seismic data aresuper-Gaussian in nature, givenB samples ofyn, Wiggins suggestedto maximize the following test (or cost):

JW(yn) =1B

∑Bn=1 |yn|4(

1B

∑Bn=1 |yn|2

)2large B−→ E|yn|4

(E|yn|2)2(10)

This scheme seeks the smallest number of large spikes (or impulses)consistent with the data, thus maximizing the order, or equivalently,minimizing theentropyor disorder in the data [23]. Wiggins coined

3

the term MED criterion for the test (10)1. Immediately after Wiggins,Ooe and Ulrych [24] realized that better deconvolution results may beobtained for super-Gaussian signals if non-Gaussianity ismaximizedwith someα less than two; they used(α = 1) and suggested

JOU(yn) =1B

∑Bn=1 |yn|2(

1B

∑Bn=1 |yn|

)2large B−→ E|yn|2

(E|yn|)2(11)

Later, Gray presented a generic MED criterion with two degrees offreedom [25]:

JG(yn) =1B

∑Bn=1 |yn|p(

1B

∑Bn=1 |yn|q

)p/qlarge B−→ E|yn|p

(E|yn|q)p/q(12)

Note that costs (10)-(12) are members of (8) for different values ofshape parameters. In the context of digital communication,where theunderlying distribution of the transmitted (possibly pulse amplitudemodulated) data symbols are closer to a uniform density (sub-Gaussian), we can obtain a blind equalizer by optimizing Gray’s cost(12) as follows [26], [27]:

w† = max

wJG(yn), for p = 2, and q > 2. (13)

Recently, Abrar and Nandi [28] discussed the case(p, q) = (2,∞)for the blind equalization of APSK signal:

JAN(yn) =1B

∑Bn=1 |yn|

2

(max {|yn|})2large B−→ E|yn|2

(max {|yn|})2(14)

Maximizing (14) can be interpreted as determining the equalizercoefficients, w, which drives the distribution of its output,yn,away from Gaussian distribution toward uniform, thus removingsuccessfully the interference from the received signal. Note that thecost (14) is an optimal, scale-invariant test for APSK signal againstGaussian (refer to for details).

A. Admissibility of the CostJAN(yn)

In this section, we discuss the admissibility ofJAN(yn) (14).By admissibility, we mean that the costJAN(yn) yields consistentestimates of the exact channel equalizer when the transmitted signalis i.i.d. or in other words the steady-state overall impulseresponse(t) is a delta function with arbitrary delay. Without loss of generality,we can assume that the channel, equalizer and transmitted signal arereal-valued. Note thatmax{|yn|} = max{|an|}

∑l |tl| < ∞ and,

owing to i.i.d. property,Ey2n = Ea2

n

∑l t

2l . Next we can express the

cost (14) int domain as follows (assuming largeB):

JAN(t) =

∑l |tl|2(∑l |tl|

)2 (15)

Evaluating the gradient w.r.t.k-th tap, we obtain

∂tkJAN =

(∑l |tl|

)2 ∂∑

jt2j

∂tk−(∑

l t2l

) ∂(∑

j|tj |)

2

∂tk(∑l |tl|

)4

=2(∑

l |tl|)2

tk − 2(∑

l t2l

)(∑j |tj |

)tk|tk|(∑

l |tl|)4

(16)

Equating the above to zero, we get|tk|∑

j |tj |−∑

l t2l = 0. It shows

that for a doubly infinite equalizer, the stable global maxima of JAN

1The simple structure of the impulsive function led Wiggins to call histechnique MED. According to [24], however, the approach which Wigginsactually adopts is not entropy; it is rather a variant ofvarimax rotation whichis widely used in obtaining a simple factor structure in factor analysis.

is along the axis, i.e.

{|tk|} = δk−k† , k† = · · · ,−1, 0, 1, · · · (17)

and unstable equilibria are along the diagonal, located at:{|tk|} =(1/L)

∑j∈IL

δk−j , whereIL(L ≥ 2) is any L-element subset ofthe integer set. The surface of cost (15) is depicted in Fig. 3(a) fora two-tap scenario; it can be seen that the cost is maximized onlyfor the solution specified in (17). Next, incorporating thea priorisignal knowledgeγ := max {|an|}, the cost (14) can be written ina constrainedform as follows:

w† = argmax

wE|yn|2 s.t. max {|yn|} ≤ γ. (18)

The geometry of the cost (18) is depicted in Fig. 3(b) for a two-tapscenario; it can be seen that the cost is maximized when the twoballs,

∑i |ti|2 and

∑i |ti|, coincide. The cost (18) is quadratic, and

the feasible region (constraint) is a convex set. The problem, however,is non-convex and may have multiple local maxima. Nevertheless, wehave the following theorem (refer to [29] for proof):

Theorem 2:Assumew† is a local optimum in

w† = argmax

wE|yn|2 s.t. max {|yn|} ≤ γ

andt† is the corresponding total channel equalizer impulse-responseand channel noise is negligible. Then it holds{|tk|} = δk−k† , wherek† = · · · ,−1, 0, 1, · · ·.

Thus an equalizer which is maximizing the output energy andconstraining the largest amplitude is able to mitigate the ISI inducedby the channel.

B. Stochastic Gradient-Based Optimization ofJAN

When an equalizerw optimizes a costJ = EJ (yn) by stochasticgradient-based adaptive method, the resulting algorithm is wn =wn−1 ± µ∂J /∂w∗

n−1, whereµ > 0 is the step-size, governingthe speed of convergence and the level of steady-state performance,[30]. The positive and negative signs are for maximization andminimization, respectively.

Note that a straightforward gradient-based adaptive implementationof JAN is not possible. The reason is that the order statisticmax {|yn|} is not a differentiable quantity. In [28], however, authorspresented an instantaneous and differentiable version of constrainedJAN and obtained the following stochastic gradient-based algorithm:

wn = wn−1 + µ fn y∗n xn,

fn =

{1, if |yn| < γ,

−β, if |yn| ≥ γ.

(19)

whereβ is a constant defined as follows (refer to ):

β :=E|yn|2{|yn|<γ}

E|yn|2{|yn|≥γ}

(20)

The algorithm (19) was termed asbeta constant modulus algorithm(βCMA). TheβCMA may be kept stable in mean-square sense if itsstep-size satisfies the condition0 < µ < 3/(2βtr(R)), wheretr(·)is trace operator andR = Exnx

Hn is autocorrelation matrix [31].

IV. A DAPTIVE NEWTON-LIKE OPTIMIZATION OF JAN

In this section, we aim to optimizeJAN using Newton-likeadaptive method. When Newton-like optimization is used by theequalizer, the update for a minimization scenario is given as follows[32]:

wn = wn−1 − µ

(∂2J

∂wn−1∂wHn−1

)−1(∂J

∂wn−1

)∗(21)

4

Firstly, to simplify algebraic manipulation, we suggest the followinginstantaneouscost function:

w† = argmin

wJn, where Jn :=

∣∣fn∣∣ ∣∣|yn|2 − γ2

∣∣ (22)

wherefn is as specified in (19). It is simple to show that solving (22)by gradient-based method results inβCMA. For the formulation ofa Newton-like scheme, we introduceexponential weights(memory)in (22) as follows:

w† = argmin

wJ ,

where J :=n∑

k=0

λn−k∣∣fk∣∣ ·∣∣∣∣∣∣wH

n−1xk︸ ︷︷ ︸=:yk

∣∣2 − γ2

∣∣∣∣(23)

and λ is the forgetting factor. Whenλ equals 1, we have a costwhich may be considered as a sort of (infinite memory) least-squaresproblem. For0 < λ < 1, however, the cost has effective finitememory (1/(1− λ)) and it may be optimized adaptively using (21).We readily evaluate gradientgn and HessianHn for (23) as follows:

gn :=∂J

∂wn−1= −

n∑

k=0

fkykx∗kλ

n−k,

Hn :=∂2J

∂wn−1∂wHn−1

= −n∑

k=0

fkxkxHk λn−k

(24)

Here, we encounter a problem. Note that, for the required steady-state condition|yk| ≤ γ implying fk = +1∀k, however, it leads toan undesirable situation

Hn = − 1

1− λRn 4 0 (25)

where Rn :=∑n

k=0 xkxHk < 0. So, assuming a converging

equalizer, the Hessian is found to be negative definite. It meansthat such an equalizer will try to maximize the cost instead ofminimizing it. This is contrary to the problem definition andweconclude thata straight Newton-likeβCMA equalizer, implementingHessian as specified in (24), will diverge. Simulation study is foundin conformation with this argument. One possible way to resolve thismatter is to ensure that the recursively computed Hessian remainspositive definite. Consider the following solution:

Hn =

n∑

k=0

|fk|xkxHk λn−k = λHn−1 + |fn|xnx

Hn (26)

Invoking the matrix inversion lemma, we obtain:

P n = H−1n =

1

λ

(P n−1 −

P n−1xnxHn PH

n−1

λzn + xHn P n−1xn

)(27)

where zn = |fn|−1. It was shown in [33], that the performanceof a Newton-like algorithm may be enhanced ifzn is computedas an iterative estimate, that iszn ≡ zn = 〈|fn|−1〉 where 〈·〉represents some averaged estimate of the enclosed entity. One ofthe possibilities iszn = zn−1 +

1n+1

(|fn|−1 − zn−1

). Further note

that gn = λgn−1 + fnynx∗n. For λ close to one, the vectorgn will

be dominated by its former estimategn−1, that isgn ≈ gn−1, andthis leads to a simpler expressiongn ≈ (1−λ)−1fnynx

∗n. With this

consideration, the proposed Newton-likeβCMA (NL-βCMA) updateis given as follows:

wn = wn−1 + µPnfny∗nxn, (28)

whereP n is as evaluated in (27) and the factor(1 − λ)−1 in gn

is merged withµ. Note that the recursive calculation ofPn keepsthe computational complexity of the proposed scheme toO(N2) periteration. The proposed scheme is summarized in Table. I.

V. STEADY-STATE PERFORMANCE OFNL-βCMA

In this section, we study the steady-state tracking performance ofNL-βCMA in a non-stationary environment. Although the steady-state performance essentially corresponds to only one point on thelearning curve of the adaptive filter, there are many situations wherethis information is of value by itself. The addressed approach is basedon studying the energy flow through each iteration of an adaptivefilter [34], [35], and it relies on a fundamental error variance relationthat avoids the weight-error variance recursion altogether. We mayremark that although we focus on the steady-state performance ofNL-βCMA, the same approach can also be used to study the transient(i.e., convergence and stability) behaviour of this filter.These detailswill be provided elsewhere.

Consider a non-stationary environment with time-varying optimalweight-vectorwo

n, also called Wiener filter, given by

won = w

on−1 + qn (29)

whereqn is some random perturbation such thatEqnqHn = Q =

σ2qI. Using the Wiener solution, the dataan can be expressed as

an = (won−1)

Hxn + vn wherevn is disturbance and is uncorrelatedwith xn, (Ev∗nxn = 0) [36]. The purpose of the tracking analysis ofan adaptive filter is to study its ability to track such time variations.The weight error vectorwn which quantifies how far away the weightvector is from Wiener solution, is given as

wn = won −wn (30)

The so-calleda posteriori and a priori error are defined asep,n =(wn − qn)

Hxn and ea,n = w

Hn−1xn, respectively. First consider

a generic update expressionwn = wn−1 + µP nϕ∗nxn. Subtracting

won from both sides of this update, substituting its value from (29)

and exploiting (30), we get

wn = wn−1 − µP nϕ∗nxn + qn (31)

Taking Hermitian transpose of (31) and post-multiplying byxn, weobtain:

(wn − qn)Hxn = w

Hn−1xn − µxH

n P nxnϕn (32a)

⇒ ep,n = ea,n − µϕn‖xn‖2Pn(32b)

where‖xn‖2Pn:= xH

n P nxn is the weighted Euclidian norm. From(32b), we haveϕn = µ−1‖xn‖−2

Pn(ea,n − ep,n) ,xn 6= 0. Now,

substituting in the value ofϕn in (31) we get

wn − qn = wn−1 − µnP n(ea,n − ep,n)∗xn (33)

where µn :=(‖xn‖2P n

)+is used to define pseudo-inverse of

‖xn‖2P n. TakingP−1

n weighted norm on both sides and simplifying,we get‖wn−qn‖2P−1

non the left and following quantity on the right

‖wn−1‖2P

−1

n− µn|ea,n|2 + µn|ep,n|2. This results in the energy

conservation relation and it is summarized as

‖wn − qn‖2P−1

n+ µn|ea,n|2 = ‖wn−1‖2

P−1

n+ µn|ep,n|2 (34)

Examining the expectation of the left most term of the energyrelation, we have

E‖wn − qn‖2P−1

n

= E‖wn‖2P

−1

n+ E‖qn‖2P−1

n− 2ℜEqH

n P−1n wn

= E‖wn‖2P

−1

n+ E‖qn‖2P−1

n− 2ℜ

(Eq

Hn P

−1n wn−1

+E‖qn‖2P−1

n− Eq

Hn xnϕ

∗n

)

= E‖wn‖2P

−1

n− E‖qn‖2P−1

n

(35)

where the vanished terms are a result of the following assumptions

5

about the random-walk driving-noise sequence{qn}.

A.1) qn is i.i.d. andEqn = 0

Q = EqnqHn = σ2

qI ≻ 0

{qn} ⊥ {xn}, {qn} ⊥ {ϕn}

Using (35) with (34), and noting that in steady stateE‖wn‖2P

−1

n=

E‖wn−1‖2P

−1

n, we obtain

µnE|ea,n|2 = E‖qn‖2P−1

n+ µnE|ep,n|2 (36)

This equation can now be solved for the steady-state excess mean-square-error (EMSE), which is defined by

ζ := limn→∞

E|ea,n|2 (37)

We emphasize that (36) is an exact relation that holds without anyapproximation or assumption, except for the assumption that the filteris in steady state. The procedure of finding the EMSE through (36)avoids the need for evaluatingE‖wn‖2 or its steady-state valueE‖w∞‖2. To proceed further, we make the following assumptions(refer to [37] for similar treatment):

A.2) EP nxnxHn = EP nExnx

Hn (holds in steady state asP n

can be replaced byEP n, see [38])A.3) E‖xn‖2P n

|ea,n|2 = E‖xn‖2PnE|ea,n|2 (see [38])

A.4) limn→∞ EH−1n = (limn→∞ EHn)

−1 (justified for com-plex Wishart distributionin [30] and shown to be reasonablevia simulations in [38])

From (32), we haveep,n = ea,n − µϕn‖xn‖2Pn, substituting it into

(36), we obtain

2Eℜ[e∗a,nϕn

]= µE‖xn‖2Pn

|ϕn|2 + µ−1E‖qn‖2P−1

n(38)

To solve the right hand side (RHS) of (38), we employ

A.5) E‖xn‖2P n|ϕn|2 = E‖xn‖2Pn

E|ϕn|2 (separation principle[38])

First, let us solveE‖xn‖2Pnas

E‖xn‖2P n= tr(EP nxnx

Hn ) = tr(EP nExnx

Hn ) = tr(EP nR)

(39)HereEP n requires further investigation. It is observed that

EHn =n∑

i=1

λn−iE|fn|xnx

Hn + ελn

I

= E|fn|ExnxHn

n∑

i=1

λn−i + ελnI

= ρR1− λn

1− λ+ ελn

I for λ < 1

(40)

whereρ := E|fn|. Further, exploiting A.4, we get

limn→∞

EP n ≈(lim

n→∞EHn

)−1

=1

ρ(1− λ)R−1 for λ < 1 (41)

by using (41) with (39), the following equation results

E‖xn‖2Pn= tr

(1

ρ(1− λ)R−1

R

)

=1

ρ(1− λ)tr(I) =

1

ρ(1− λ)N.

(42)

Proceeding to the second term on the RHS of (38), we get

E‖qn‖2P−1

n= Eq

Hn Hnqn = tr(Eqnq

Hn Hn)

= tr(ρEqnqHn

n∑

i=1

λn−iExnx

Hn + ελn

EqnqHn )

= ρ tr(QR)

n∑

i=1

λn−i + ελntr(Q)

= ρ1− λn

1− λtr(QR) + ελntr(Q) for λ < 1

(43)

so that

limn→∞

E‖qn‖2P−1

n= ρ(1− λ)−1tr(QR) for λ < 1 (44)

Using (42), and (44) the RHS of (38) becomes

RHS=µ

ρN(1− λ)E|ϕn|2 +

ρ

µ(1− λ)−1tr(QR) (45)

The parameterρ is computed as follows:

ρ := E |fn| = Pr [|yn| < γ] + β Pr [|yn| > γ]

= 1 + (β − 1) Pr [|yn| > γ]

= 1 + (β − 1)L∑

j=1

Mj

M

∫ ∞

γ

p(|yn|;Rj) d|yn|

= 1 + (β − 1)EQ1,0

(|an|σ

σ

).

(46)

whereQm,v(a, b) is Nuttall Q-function [39] as defined below:

Qm,v(a, b) :=

∫ ∞

b

xme−1

2(x2+a2)Iv (ax)dx, a, b > 0 (47)

wherem > 1, v ≥ 0 and Iv(·) is the vth-order modified Besselfunction of first kind. Now to solveE|ϕn|2, whereϕn = fnyn, wesee thatfn is a piecewise function and it has got a discontinuity at|yn| = γ. We start as

E|ϕn|2 = Ef2n |yn|2

= E|yn|2{0<|yn|<γ} + β2E|yn|2{γ≤|yn|<∞}

= E|yn|2 +(β2 − 1

)E|yn|2{γ≤|yn|<∞}︸ ︷︷ ︸

=:A

(48)

where the factorA is expressed and evaluated as follows:

A := E|yn|2{γ≤|yn|<∞} =

L∑

j=1

Mj

M

∫ ∞

γ

|yn|2p(|yn|;Rj) d|yn|

=L∑

j=1

Mj

M

∫ ∞

γ

|yn|3σ2

exp

(−|yn|2 +R2

j

2σ2

)I0

(|yn|Rj

σ2

)d|yn|

= σ2L∑

j=1

Mj

MQ3,0

(Rj

σ,γ

σ

)= σ2

EQ3,0

(|an|σ

σ

).

(49)To solve (48), we need to calculate statistical moments of the modulus|yn|. Next we compute the left hand side (LHS) of (38),

LHS = 2Eℜ[e∗a,n ϕ] = 2Eℜ[(an − yn)∗ fn yn]

= 2Eℜ[fn a∗nyn]− 2Efn |yn|2

= 2

(Efn |an|2︸ ︷︷ ︸

=:B

−Eℜ[fn a∗nea,n]︸ ︷︷ ︸

=:C

−Efn |yn|2︸ ︷︷ ︸=:D

) (50)

The termB is computed as follows:

B = E|an|2{0<|yn|<γ} − β E|an|2{γ≤|yn|<∞}

= E|an|2 − (β + 1)E|an|2Q1,0

(|an|σ

σ

),

(51)

6

Next, the termD is computed as follows:

D = E|yn|2{0<|yn|<γ} − β E|yn|2{γ≤|yn|<∞}

= E|yn|2 − (β + 1)E|yn|2{γ≤|yn|<∞}︸ ︷︷ ︸=:A

(52)

whereA is as obtained in (49). Next the termC is computed:

C = Eℜ [a∗nea,n]{0<|yn|<γ} − β Eℜ [a∗

nea,n]{γ≤|yn|<∞}

= Eℜ [a∗nea,n]︸ ︷︷ ︸

=0

− (β + 1)Eℜ [a∗nea,n]{γ≤|yn|<∞}

= −(β + 1)

{E

[|an|2 + ζ

2Q1,0

(|an|σ

σ

)]

−σ2

2EQ3,0

(|an|σ

σ

)}.

(53)

Combining (45)-(53), we obtain the following expression tosolve forEMSE of the proposed NL-βCMA, ζNL.βCMA, as follows:2

ζ

(1− µ

ρN(1− λ)(β − 1)

)EQ3,0

(√2

ζ|an|,

√2

ζγ

)

− 2

1 + β

(ρ tr(QR)

µ(1− λ)+

µ

ρN(1− λ)(ζ + E|an|2) + 2ζ

)

+2E(ζ − |an|2

)Q1,0

(√2

ζ|an|,

√2

ζγ

)= 0.

(54)

whereρ is as specified in (46).

VI. SIMULATION RESULTS

In [28], the βCMA has already been compared and shown to bebetter than four existing algorithms which include traditional CMA[16] and three of its variants: the (unnormalized) relaxed CMA(RCMA) [40], Shtrom-Fan algorithm (SFA) [41] and the generalizedCMA (GCMA) [42]. In this work, we provide performance compar-ison of NL-βCMA (as summarized in Table. I) with Newton-likeCMA (NL-CMA) [32] and recursive least square CMA (RLS-CMA)[43]. The performances of CMA andβCMA are shown as standardbenchmark. Moreover, we consider an unorthodox benchmark [44]which is an adaptive method for blind equalization and relies onexplicit estimation of PDF usingParzen window and is termedstochastic gradient algorithm (SQD). For reference CMA,βCMA,SQD, RLS-CMA and NL-CMA are summarized in Table II.

Here we would like to highlight the following important implemen-tation details: First, for the SQD scheme the required compensationfactor F (σ) is computed numerically (see [44] for details). Thecompensation factorF (σ) is a function of the constellation schemeand hence we find the values ofF (σ) for 16APSK(8,8) usingbisection method (see Fig. 4). The authors of [44] highlightthat theuse of a smallerσ (e.g., 1) results in increased number of localminima of the cost function. It is for this reason that we use ahighervalue ofσ, (i.e.,σ = 15) in all simulations (the correspondingF (σ)is 1.325 as shown in Fig. 4). Secondly, to ensure the stability ofNL-CMA, we slightly modify it. According to Mirandaet al. [45]one must check the reliability of the error quantity in Hessian. If thestatistic of error quantity in Hessian is not reliable, one must adhereto simple autocorrelation matrix. Incorporating this recommendationand owing to sub-Gaussian nature of the transmitted signal,the

2We have observed that the traditional root-finding methods like bisectionor line-search are good enough for solving (54) and secondly, possibly dueto the monotonicity of NuttallQ-function, the expression (54) yielded uniquesolutions in all simulation examples considered in this work.

Channelhn

Equalizerwn

DecisionDevice

BlindAlgorithm

vn

+xnan yn an

Fig. 1: A typical baseband communication system.

TABLE I: NL-βCMA

w0 = [01×(N−1)/2, 1, 01×(N−1)/2]T , β > 1,

P 0 = εIN×N , I is identity matrix and0 < ε ≪ 1

fn =

{1, if |yn| < γ

−β, if |yn| ≥ γ.

zn = zn−1 +1

n+1

(|fn|−1 − zn−1

)

P n =1

λ

(P n−1 −

P n−1xnxHn PH

n−1

λ zn + xHn P n−1xn

)

wn = wn−1 + µP nfny∗nxn

TABLE II: Summary of Compared Algorithms

(a) CMA

fn = R2 − |yn|2

wn = wn−1 + µfny∗nxn

(b) βCMA

fn =

{1, if |yn| < γ

−β, if |yn| ≥ γ.

wn = wn−1 + µfny∗nxn

(c) SQD

K ′σ(x) = − x√

2πσ3exp(−x2/2σ2)

F (σ): compensation factor

Ns: Size of alphabet{a}

di = (|yn|2 − F (σ)|ai|2)

∇wJ (w) =1

Ns

∑Nsi=1 K

′σ(di)y

∗nxn

wn = wn−1 − µ∇wJ (w)

(d) RLS-CMA

P 0 = εIN×N , 0 < ε ≪ 1

zn = xny∗n

gn = P n−1zn/(λ+ zHn Pn−1zn)

Pn = (P n−1 − gnzHn P n−1)/λ

fn = R2 − |yn|2

wn = wn−1 + µgnfn

(e) NL-CMA

P 0 = εIN×N , 0 < ε ≪ 1

fn = R2 − |yn|2

zn =

{(2|yn|2 − R2)−1 if (2|yn|2 − R2)−1 ≥ 0

1 otherwise.

Pn =1

λ

(

P n−1 −P n−1xnx

Hn PH

n−1

λ zn + xHn Pn−1xn

)

wn = wn−1 + µ(1 − λn)P nfny∗nxn

7

0 1 2 3 4 5 60

0.5

1

1.5

α

Ent

ropy

[in

nats

]

Fig. 2: EntropyH(α) versus shape parameterα. Forα = 2, entropyis maximum and its value islog(

√2πe).

t0

t 1

−1 0 1−1

−0.5

0

0.5

1

t02+t

12

|t0|+|t

1|

−10

1

−10

1

0.6

0.8

1

t0

Σiti2/(Σ

j|t

j|)2

t1

(b)(a)

Fig. 3: (a): The surface of costJAN(t) for a two-tap equalizer and(b): coincidence of unit balls.

stability of NL-CMA requires to use

zn =

{(2|yn|2 −R2)−1, if (2|yn|2 −R2)−1 ≥ 0

1, otherwise.

0 5 10 151

1.1

1.2

1.3

1.4

Kernel Size (σ)

F(σ

)

16(8,8)APSK

Fig. 4: Numerically computedF (σ) for 16APSK(8,8)

For simulation purposes, equalizer length is set toN = 7 (un-less otherwise noted) andw0 = [01×⌊(N−1)/2⌋, 1, 01×⌈(N−1)/2⌉]

T

i.e., center tap initialization is used. The data is modulated using8APSK(4,4) and 16APSK(8,8) shown in Fig. 5(b) and 5(c). For allexperiments, the value ofβ is obtained using (68) (this value is1.9319 for 8APSK(4,4) and3 for 16APSK(8,8)). ISI (as definedin (2)) is used as index to evaluate the performance of differentequalization schemes. Experiments comparing the proposedschemeto existing equalization methods are discussed in VI-A and VI-B.Furthermore, additional experiments are carried to compare thetheoretical and practical EMSE of the proposed schemes for both8APSK(4,4) and 16APSK(8,8). The details of these experiments arediscussed in VI-C.

aI

γ aR

1

1.9323

1.586

a) b) c)

Fig. 5: a) A hypothetical dense APSK, b) practical 8APSK(4,4), andc) practical 16APSK(8,8).

A. Experiment 1

In this experiment the performance of the proposed scheme istested for a channel with low eigenvalue spread. The signal ismodulated using 16APSK(8,8) and a complex-valued channel withcoefficients hR = [−0.005, 0.009, −0.024, 0.854,−0.218,0.049,−0.016] and hI = [−0.004, 0.03, −0.104, 0.52,0.273,−0.074, 0.02], where h = hR + hI is used [46]. Theeigenvalue spread of the channel is5.83. The signal-to-noise ratio(SNR) is set to30 dB and stationary environment is assumed (i.e.,σq = 0, where σq is the standard deviation of the perturbationqn in (29)). Learning curves are obtained for7, 000 iterations andare ensemble averaged over200 independent runs. The steps sizesand forgetting factors of all schemes are adjusted for comparableperformances (i.e., the same steady state ISI) and are mentioned inFig. 6. It can be observed from Fig. 6 that NL-βCMA shows thefastest convergence rate followed byβCMA.

0 1000 2000 3000 4000 5000 6000 7000−28

−23

−18

−13

−8

ISI [

dB]

Iterations

CMA: µ = 5e−5βCMA: µ =43e−5RLS−CMA µ= 8e−2, λ = 0.98NL−βCMA: µ = 14e−2, λ = 0.98SQD: µ = 18e−5NL−CMA: µ = 65e−3, λ = 0.98

βCMA

NL−CMA

SQD

CMA

NL−βCMA

RLS−CMA

Fig. 6: Learning curves for channel with low eigenvalue spread:N = 7, 16APSK(8,8)

B. Experiment 2

In this experiment, a channel with higher eigenvalue spreadisconsidered. Generally, an increase in the eigenvalue spread of thechannel results in poor performance of equalization schemes. Forthis experiment, 16APSK(8,8) modulated signal is passed througha channel with impulse responseh = [0, 0.1, 0.4, 0.8, 0.4, 0.1,0] (the eigenvalue spread of this channel is calculated to be65.28).Again, SNR is set to30 dB and stationary environment is assumed.The step sizes and forgetting factors of all schemes are adjusted forcomparable performances and are summarized in Fig. 7. The learningcurves are obtained for40, 000 iterations and are ensemble averagedover 200 independent runs. It can be observed from Fig. 7 that the

8

performance of SQD, CMA andβCMA degrades significantly incomparison with the first experiment. However, it should be notedthat NL-CMA, RLS-CMA and NL-βCMA converge relatively faster.Further, among the three fast converging schemes, NL-βCMA showsthe fastest convergence rate.

0 0.5 1 1.5 2 2.5 3 3.5 4

x 104

−25

−20

−15

−10

−5

0

ISI [

dB]

Iterations

CMA: µ = 1e−4βCMA: µ = 48e−5RLS−CMA µ= 14e−2, λ = 0.98NL−βCMA: µ = 16e−2, λ = 0.98SQD: µ = 33e−5NL−CMA: µ = 12e−2, λ = 0.98

CMA

SQD

βCMANL−CMA

NL−βCMA

RLS−CMA

Fig. 7: Learning curves for channel with high eigenvalue spread:N = 7, 16APSK(8,8)

C. Experiment 3

In this experiment, analytical and empirical performance of the pro-posed scheme is compared in non-stationary environment. Insteadystate, we assume that the proposed scheme converges in the mean toa zero forcing solution, i.e., the mean of combined channel-equalizerresponse converges to a delta function with arbitrary delayand phaserotation. We consider a non-stationary channel withσq = 10−3,whereσq is the standard deviation of the perturbationqn in (29).The equalizer length is set toN = 11 and the steady state EMSEis evaluated for20, 000 symbols in a noise-free environment. Theforgetting factor is set to0.98 and step size is varied to obtain thedesired curves for both 8APSK(4,4) and 16APSK(8,8). A closematchbetween the analytic and measured EMSE for both constellation typesis observed as depicted in Fig. 8.

10−3

10−2

10−1

100

−30

−25

−20

−15

−10

−5

0

Step size: µ

EM

SE

: ζ [d

B]

SimulationTheory

16APSK(8,8)

8APSK(4,4)

Fig. 8: Steady state EMSE using 8APSK(4,4) and 16APSK(8,8).

VII. C ONCLUSION

Exploiting the notion of minimum entropy deconvolution, a costfunction was specifically derived for the blind channel equalizationof amplitude phase shift keying signal in digital communicationsystems. The cost was optimized adaptively using Newton’s method

and it yielded a Newton-like constant modulus algorithm NL-βCMA.Tracking analysis of the proposed algorithm was performed based onSayed-Rupp’s feedback approach. Simulation results demonstratedsignificant improvement in performance compared with conventionalalgorithms. Further, observations of excess mean square error demon-strated good agreement between theoretical and practical findings.

−2

0

2

−2

−1

0

1

20

0.2

0.4

0.6

0.8

1

−2

0

2

−2

−1

0

1

20

0.2

0.4

0.6

0.8

1

a) b)

Fig. 9: PDFs (not to scale) of a) hypothetical (dense) APSK and b)Gaussian distributed received signal.

APPENDIX A: DERIVATION OF JAN(yn)

Consider a continuous APSK signal, where signal alphabetsy ∈A = {aR + aI} are uniformly distributed (theoretically) over acircular region of radiusγ, with the center at the origin. The jointPDF of aR andaI is given by (refer to Fig. 9(a))

pA(y) =

1

πγ2,√

a2R + a2

I = |y| ≤ γ,

0, otherwise.(55)

Now consider the transformationY = |y| =√

a2R + a2

I andΘ = ∠(aR, aI), where Y is the modulus and∠(i, j) denotesthe angle in the range(0, 2π) that is defined by the point(i, j).The joint distribution of the modulusY and Θ can be obtainedas p

Y,Θ(y, θ) = y/(πγ2), y ≥ 0, 0 ≤ θ < 2π. Since Y and

Θ are independent, we obtainpY(y : H0) = 2y/γ2, y ≥ 0,

whereH0 denotes the hypothesis that signal is distortion-free. LetY1, Y2, · · · , YB be a sequence, of sizeB, obtained by takingmodulus of randomly generated distortion-free signal alphabetsA,where subscriptn indicates discrete time index. LetZ1,Z2, · · · ,ZB

be the order statistic of sequence{Y}. Let pY(y1, y2, · · · , yB : H0)

be an B-variate density of the continuous type, then, under thehypothesisH0, we obtain

pY(y1, · · · , yB : H0) =2B

γ2B

B∏

k=1

yk. (56)

Next we find scale-invariant PDFpsiY(y1, y2, · · · , yB : H0) for givenB realizations ofy as follows (belowα is some positive scale):

psiY(y1, · · · , yB : H0) =

∫ ∞

0

pY(αy1, · · · , αyB : H0)α

B−1dα

=2B

γ2B

B∏

k=1

yk

∫ Ra/(zB−z1)

0

α2B−1 dα

=2B−1

B (zB − z1)2B

B∏

k=1

yk,

(57)where z1, z2, · · · , zB are the order statistic of elementsy1, y2, · · · , yB , so that z1 = min{y} and zB = max{y}.Now consider the next hypothesis (H1) that signal suffers frommulti-path interference as well as with additive Gaussian noise (refer

9

to Fig. 9(b)). Thus, the in-phase and quadrature componentsof thereceived signal are modeled as normal-distributed; owing to centrallimit theorem, it is theoretically justified. It means that the modulusof the received signal follows Rayleigh distribution,

pY(y : H1) =

y

σ2y

exp

(− y2

2σ2y

), y ≥ 0, σy > 0. (58)

The B-variate densitiespY(y1, · · · , yB : H1) and psi

Y(y1, · · · , yB :

H1) are obtained as

pY(y1, · · · , yB : H1) =

1

σ2By

B∏

k=1

yk exp

(− y2

k

2σ2y

), (59a)

psiY(y1, · · · , yB : H1) =

1

σ2By

B∏

k=1

yk

×∫ ∞

0

exp

(−α2

∑Bk′=1 y

2k′

2σ2y

)α2B−1dα. (59b)

Substitutingu = 12α2σ−2

y

∑Bk′=1 y

2k′ , we obtain

psiY(y1, · · · , yB : H1) =

2BΓ(B)∏B

k=1 yk

2(∑B

k=1 y2k

)B (60)

The scale-invariant uniformly most powerful test ofpsiY(y1, · · · , yB :

H0) againstpsiY(y1, · · · , yB : H1) provides us, see [20]:

O(y1) =psiY(y1, · · · , yB : H0)

psiY(y1, · · · , yB : H1)

=1

B!

[ ∑Bk=1 y

2k

(zB − z1)2

]B

=1

B!

[ ∑Bn=1 |yn|2

(max{|yn|} −min{|yn|})2

]B (61)

In the present context, whereyn is the deconvolved sequence, wehave min{|yn|} = 0. Further takingBth root of (61), ignoringconstants and some manipulations, we get (14).

APPENDIX B: EVALUATION OF β FOR APSK

Expressingyn ≈ an − ea,n, note that the amplitude|an| isperturbed byea,n; sinceea,n is assumed to be zero-mean (complex-valued) narrowband Gaussian, the modulus|yn| becomes Riciandistributed and its PDF (conditioned on|an|) can be expressed as

p(|yn|; |an|) =|yn|σ2

exp

(−|yn|2 + |an|2

2σ2

)I0

(|yn| |an|

σ2

)(62)

whereσ2 := E(ℜ[ea])2 = E(ℑ[ea])2, andI0 is zeroth order modifiedBessel function of first kind (whereℜ[·] and ℑ[·] refer to real andimaginary components of the enclosed complex-valued entity). Using(62), akth-order moment of modulus|yn| can be computed as:

E|yn|k =L∑

j=1

Mj

M

∫ ∞

0

|yn|kp(|yn|;Rj)d|yn| (63)

Consider a distortion-freeM -symbol complex-valued constellation{a} which comprisesL number of unique moduli, that is|an| ∈{R1, R2, · · · , RL}, satisfying0 < R1 < R2 < · · · < RL = γ.Now let Mj be the number of unique symbols on thejth modulusRj , this impliesE|an|2 = (1/M)

∑Lj=1 MjR

2j is the per-symbol

average energy of constellation{a}.

β :=E|yn|2{|yn|<γ}

E|yn|2{|yn|≥γ}

=E|yn|2 − E|yn|2{|yn|≥γ}

E|yn|2{|yn|≥γ}

(64)

Using (62), we obtainE|yn|2 = 2σ2 + E|an|2 andE|yn|2{|yn|≥γ} =

σ2EQ3,0(|an|/σ, γ/σ), whereQm,v(a, b) is Nuttall Q-function as

defined below [39]:

Qm,v(a, b) :=

∫ ∞

b

xme−1

2(x2+a2)Iv (ax)dx, a, b > 0 (65)

for v ≥ 0, m ≥ 1 andIv(·) is thevth-order modified Bessel functionof first kind. An approximate expression forQ3,0(a, b) is given asfollows [47]:

Q3,0(a, b) ≈2 + (a+ b)2 + b2√8πab

exp

(− (b− a)2

2

)

+a(3 + a2) + b(1 + a2)

4√ab

erfc

(b− a√

2

).

(66)

Exploiting (66), we can evaluateβ under the limitσ → 0. De-noting z = (γ − |an|)/(

√2σ), note that forγ > |an|, we have

limσ→0 exp(−z2

)= limσ→0 erfc (z) → 0. And for γ = |an|, we

havelimσ→0 exp(−z2

)= limσ→0 erfc (z) → 1. So it is simple to

show that:

limσ→0

E|yn|2{|yn|≥γ} = limσ→0

σ2EQ3,0

(|an|σ

σ

)→ MLγ

2

2M. (67)

Finally, the asymptotic value ofβ for APSK signal under thecondition of vanishing convolutional noise of variance2σ2 is givenby

limσ→0

β → (2ME|an|2 −MLγ2)

/(MLγ

2). (68)

REFERENCES

[1] S. Haykin. Blind deconvolution. Prentice Hall, 1994.[2] C.R. Johnson, Jr., P. Schniter, T.J. Endres, J.D. Behm, D.R. Brown, and

R.A. Casas. Blind equalization using the constant modulus criterion: Areview. Proc. IEEE, 86(10):1927–1950, 1998.

[3] Z. Ding and Y. Li. Blind Equalization and Identification. Marcel DekkerInc., New York, 2001.

[4] D. Donoho. On minimum entropy deconvolution.Proc. 2nd AppliedTime Series Symp., pages 565–608, Mar. 1980.

[5] A. Benveniste, M. Goursat, and G. Ruget. Robust identification of anonminimum phase system: Blind adjustment of a linear equalizer indata communication.IEEE Trans. Automat. Contr., 25(3):385–399, 1980.

[6] R.A. Wiggins. Minimum entropy deconvolution. Proc. Int. Symp.Computer Aided Seismic Analysis and Discrimination, 1977.

[7] R.A. Wiggins. Minimum entropy deconvolution.Geoexploration, 16:21–35, 1978.

[8] A. Morello and V. Mignone. DVB-S2: the second generationstandardfor satellite broad-band services.IEEE Proc., 94(1):210–227, 2006.

[9] R. De Gaudenzi, A. Guillen i Fabregas, and A. Martinez.Turbo-codedAPSK modulations design for satellite broadband communications. Intl.Jnl. Sat. Commun. Net., 24(4):261–281, 2006.

[10] C.-Y. Kao, M.-C. Tseng, and C.-Y. Chen. The performanceanalysisof backward compatible modulation with higher spectrum efficiency forDAB Eureka 147.IEEE Trans. Broadcasting, 54(1):62–69, 2008.

[11] C. Shaw and M. Rice. Turbo-coded APSK for aeronautical telemetry.IEEE Aerospace and Electronic Systems Magazine, 25(4):37–43, 2010.

[12] Z. Liu, Q. Xie, K. Peng, and Z. Yang. APSK constellation with Graymapping. IEEE Commun. Lett., 15(12):1271–1273, 2011.

[13] H. Wang, Y. Li, X. Yi, D. Kong, J. Wu, and J. Lin. APSK modulatedCO-OFDM system with increased tolerance toward fiber nonlinearity.IEEE Photonics Tech. Lett., 24(13):1085–1087, 2012.

[14] S. Fan, H. Wang, Y. Li, W. Du, X. Zhang, J. Wu, and J. Lin. Optimal 16-ary APSK encoded coherent optical OFDM for long-haul transmission.IEEE Photonics Tech. Lett., 25(13):1199–1202, 2013.

[15] O. Shalvi and E. Weinstein. New criteria for blind equalization of non-minimum phase systems.IEEE Trans. Inf. Theory, 36(2):312–321, 1990.

[16] D.N. Godard. Self-recovering equalization and carrier tracking in two-dimensional data communications systems.IEEE Trans. Commun.,28(11):1867–1875, 1980.

[17] L.R. Litwin. Blind channel equalization.IEEE Potentials, 18(4):9–12,1999.

[18] J.R. Treichler and B.G. Agee. A new approach to multipath correctionof constant modulus signals.IEEE Trans. Acoust. Speech Sig. Process.,31(2):459–471, 1983.

[19] R.V. Hogg. More light on the kurtosis and related statistics. Jnl. Amer.Stat. Assc., 67(338):422–424, 1972.

10

[20] Z. Sidak, P.K. Sen, and J. Hajek.Theory of Rank Tests. Academic Press;2/e, 1999.

[21] J.P. Marques de Sa, L.M.A. Silva, J.M.F. Santos, and L.A. Alexandre.Minimum Error Entropy Classification. Springer-Verlag Berlin Heidel-berg, 2013.

[22] R.C. Geary. Testing for normality.Biometrika, 34(3/4):209–242, 1947.[23] A.T. Walden. Non-Gaussian reflectivity, entropy, and deconvolution.

Geophysics, 50(12):2862–2888, 1985.[24] M. Ooe and T.J. Ulrych. Minimum entropy deconvolution with an ex-

ponential transformation.Geophysical Prospecting, 27:458–473, 1979.[25] W. Gray. Variable Norm Deconvolution. PhD thesis, Stan. Univ., 1979.[26] E.H. Satorius and J.J. Mulligan. Minimum entropy deconvolution and

blind equalisation.IEE Elect. Lett., 28(16):1534–1535, 1992.[27] E.H. Satorius and J.J. Mulligan. An alternative methodology for blind

equalization.Dig. Sig. Process.: A Rev. Jnl., 3(3):199–209, 1993.[28] S. Abrar and A.K. Nandi. Adaptive minimum entropy equalization

algorithm. IEEE Commun. Lett., 14(10):966–968, 2010.[29] S. Abrar, A. Zerguine, and A.K. Nandi.Digital Communication, chapter

Adaptive Blind Channel Equalization, pages 93–118. C. Palanisamy(Ed.), InTech Publishers, Rijeka, Croatia, 2012.

[30] S. Haykin. Adaptive Filtering Theory. Prentice-Hall, 1996.[31] A.K. Nandi and S. Abrar. Adaptive blind equalization based on the

minimum entropy principle. In5th Intl. Conf. Comp. Dev. Commun.(CODEC), Dec. 2012.

[32] G. Yan and H.H. Fan. A Newton-like algorithm for complexvariableswith applications in blind equalization. IEEE Trans. Sig. Process.,48(2):553–556, 2000.

[33] S. Abrar and A. Zerguine. Enhancing the convergence speed of a multi-modulus blind equalization algorithm. InIEEE SCONEST, pages 41–44,2004.

[34] M. Rupp and A.H. Sayed. A time-domain feedback analysisoffiltered-error adaptive gradient algorithms.IEEE Trans. Sig. Process.,44(6):1428–1439, 1996.

[35] N.R. Yousef and A.H. Sayed. A feedback analysis of the trackingperformance of blind adaptive equalization algorithms.IEEE Conf.Decision and Control, 1:174–179, 1999.

[36] T.Y. Al-Naffouri and A.H. Sayed. Adaptive filters with error nonlinear-ities: mean-square analysis and optimum design.EURASIP Jnl. Appld.Sig. Process., 15:192–205, 2001.

[37] X. Wang and G. Feng. Performance analysis of RLS linearly constrainedconstant modulus algorithm for multiuser detection.Signal Processing,89:181–186, 2009.

[38] A.H. Sayed.Fundamentals of adaptive filtering. Wiley-Interscience andIEEE Press, 2003.

[39] A.H. Nuttall. Some integrals involving theQ-function. Naval Under-water Systems Center Tech. Rep. 4297, April 1972.

[40] O. Tanrikulu, A.G. Constantinides, and J.A. Chambers.New normalizedconstant modulus algorithms with relaxation.IEEE Sig. Process. Lett.,4(9):256–258, 1997.

[41] V. Shtrom and H.H. Fan. New class of zero-forcing cost functions inblind equalization.IEEE Trans. Sig. Process., 46(10):2674, 1998.

[42] F.R.P. Cavalcanti, A.L. Brandao, and J.M.T. Romano. The generalizedconstant modulus algorithm applied to multiuser space-time equalization.Proc. IEEE-SPAWC, 94-97, 1999.

[43] C. Yuxin, T. Le-Ngoc, B. Champagne, and X. Changjiang. Recursiveleast squares constant modulus algorithm for blind adaptive array.IEEETrans. Sig. Process., 52(5):1452–1456, May 2004.

[44] M. Lazaro, I. Santamaria, D. Erdogmus, K.E. Hild, C. Pantaleon, andJ.C. Principe. Stochastic blind equalization based on PDF fitting usingparzen estimator.IEEE Trans. Sig. Process., 53(2):696–704, 2005.

[45] M.D. Miranda, M.T.M. Silva, and V.H. Nascimento. Avoiding Diver-gence in the Shalvi–Weinstein Algorithm.IEEE Trans. Sig. Process.,56(11):5403–5413, 2008.

[46] G. Picchi and G. Prati. Blind equalization and carrier recovery usinga ‘stop-and-go’ decision-directed algorithm.IEEE Trans. Commun.,35(9):877–887, 1987.

[47] S. Abrar, A. Ali, A. Zerguine, and A.K. Nandi. Tracking performance oftwo constant modulus equalizers.IEEE Commun. Lett., 17(5):830–833,2013.

Anum Ali (S’12) received the B.S. degree in electrical engineering (withinstitute gold medal) from COMSATS Institute of Information technology,Islamabad, Pakistan, in 2011. He is currently pursuing the M.S. degree in

electrical engineering at King Fahd University of Petroleum and Minerals,Dhahran, Saudi Arabia. His research interests lie in signalprocessing andcommunications. Specifically, he is interested in the applications of compres-sive sensing for communications.

Shafayat Abrar (S’03, M’14) was born in Karachi, Pakistan, in 1972. Heholds a B.E., an M.S., and a Ph.D. degree, all in electrical engineering,respectively from NED University of Engineering and Technology, Karachi,Pakistan (1996), King Fahd University of Petroleum and Minerals, Dhahran,Saudi Arabia (2000), and University of Liverpool, Liverpool, UK (2010).He is currently serving as a faculty member at COMSATS Institute ofInformation Technology, Islamabad, Pakistan. He is a co-recipient of IEEECommunications Society Heinrich Hertz award for best CommunicationsLetters for the year 2012. His research interest is signal processing forcommunication systems.

Azzedine Zerguine (SM’03) received the B.Sc. degree from Case WesternReserve University, Cleveland, OH, in 1981, the M.Sc. degree from King FahdUniversity of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia, in1990, and the Ph.D. degree from Loughborough University, Loughborough,U.K., in 1996, all in electrical engineering. From 1981 to 1987, he was withdifferent Algerian state-owned companies. From 1987 to 1990, he was aResearch and Teaching Assistant in the Electrical Engineering Department atKFUPM. Currently, he is Professor in the Electrical Engineering Department,KFUPM, working in the areas of signal processing and communications.He is an Associate Editor of the EURASIP Journal on Advances in SignalProcessing. His research interests include signal processing for communica-tions, adaptive filtering, neural networks, multiuser detection, and interferencecancellation.

Asoke K. Nandi (F’10) received the degree of Ph.D. in High Energy Physicsfrom the University of Cambridge (Trinity College), Cambridge, UK, in 1979.He held several research positions in Rutherford Appleton Laboratory (UK),European Organisation for Nuclear Research (Switzerland), Department ofPhysics, Queen Mary College (London, UK) and Department of NuclearPhysics (Oxford, UK). In 1987, he joined the Imperial College, London, UK,as the Solartron Lecturer in the Signal Processing Section of the ElectricalEngineering Department. In 1991, he jointed the Signal Processing Divisionof the Electronic and Electrical Engineering Department inthe Universityof Strathclyde, Glasgow, UK, as a Senior Lecturer; subsequently, he wasappointed a Reader in 1995 and a Professor in 1998. In March 1999 he movedto the University of Liverpool, Liverpool, UK, to take up hisappointment tothe David Jardine Chair of Signal Processing in the Department of ElectricalEngineering and Electronics. Professor Nandi is a Finland DistinguishedProfessor. In 1983 he was a member of the UA1 team at CERN that discoveredthe three fundamental particles known asW+, W− andZ0, providing theevidence for the unification of the electromagnetic and weakforces, whichwas recognized by the Nobel Committee for Physics in 1984. Currently,he is the Head of the Signal Processing and Communications ResearchGroup with interests in the areas of signal processing, machine learning, andcommunications research. With his group he has been carrying out researchin developments and applications of machine learning, biomedical signalprocessing, bioinformatics, machine condition monitoring, communicationssignal processing, and blind source separation. He has authored or co-authoredover 400 technical publications; these include two books, entitled AutomaticModulation Recognition of Communications Signals(Boston, MA: KluwerAcademic, 1996) andBlind Estimation Using Higher-Order Statistics(Boston,MA: Kluwer Academic, 1999), and over 180 journal papers. Theh-index ofhis publications, according to the Web of Science, is 38. Professor Nandi wasawarded the Mounbatten Premium, Division Award of the Electronics andCommunications Division, of the Institution of ElectricalEngineers of theU.K. in 1998 and the Water Arbitration Prize of the Institution of MechanicalEngineers of the U.K. in 1999. He is a Fellow of the Institute of Electricaland Electronics Engineers (USA), the Cambridge Philosophical Society, theInstitution of Engineering and Technology, the Institute of Mathematics and itsapplications, the Institute of Physics, the Royal Society for Arts, the Institutionof Mechanical Engineers, and the British Computer Society.