
ISIT 2007, Nice, France, June 24 - June 29, 2007

Parameter Estimation of a Convolutional Encoder from Noisy Observations

Janis Dingel and Joachim Hagenauer
Institute for Communications Engineering (LNT), Munich University of Technology (TUM)

80290 München, Germany
Email: {janis.dingel, hagenauer}@tum.de

Abstract- We consider the problem of estimating the parameters of a convolutional encoder from noisy data observations, i.e. when encoded bits are received with errors. Reverse engineering of a channel encoder has applications in cryptanalysis when attacking communication systems and also in DNA sequence analysis, when looking for possible error correcting codes in genomes. We present a new iterative, probabilistic algorithm based on the Expectation Maximization (EM) algorithm. We use the concept of log-likelihood ratio (LLR) algebra, which will greatly simplify the derivation and interpretation of our final algorithm. We show results indicating the necessary data length and allowed channel error rate for reliable estimation.

I. INTRODUCTION

We address a problem strongly related to cryptanalysis and data security. In a reverse engineering context, an observer wants to extract the transmitted information from a received data stream without knowing all the parameters of the transmission. The observed signal may have been corrupted by noise during transmission. Even without employing advanced protocols from cryptology, modern communication systems are hard to decipher if the parameters of the different elements of the transmission chain are unknown or only partially known. Research on the reverse engineering of a channel encoder, as a special subproblem, has been conducted for communication systems [1], [2], [3], [4] and also for DNA sequences [5], looking for possible error correcting codes in the genetic code. Most of these approaches concentrate on linear block codes. In this paper, we derive a new algorithm for the estimation of the encoder parameters of a convolutional code from a noisy data stream. This problem has been considered before by Rice [3] and later by Filiol [2], where an algebraic estimation procedure was proposed: a candidate is recovered from a subsequence of bits that is hopefully unaffected by noise and then tested for significance on the whole observed sequence. However, if no noise-free subsequence exists from which the parameters can be algebraically recovered, the method will fail. A method for the reconstruction of punctured convolutional codes has also been presented by Filiol [6]. Here, we introduce an iterative, probabilistic approach based on the Expectation Maximization (EM) algorithm. The EM algorithm is a powerful tool that has proven useful in many communications and signal processing problems, such as blind channel estimation [7] and system identification [8], which are related to the one considered here. In fact, the problem is essentially the same as estimating an unknown hidden Markov model, which is

typically done with the Baum-Welch algorithm [9]. However, here we investigate systems where computations are carried out in a finite field, which requires different methods from those developed for traditional blind estimation scenarios. This implies that an approach has to combine concepts from both coding theory and blind signal processing. In a similar way, the EM algorithm has been applied to the synthesis of linear feedback shift registers from noisy sequences [10]. However, here we use the concept of log-likelihood ratio (LLR) algebra, introduced in [11], which will greatly simplify the derivation and interpretation of our final algorithm. We achieve this by transforming the problem into the LLR domain, a natural step when looking at the problem from a coding perspective.
The paper is outlined as follows. In Section II, we specify and formalize the problem. Section III introduces the application of the EM algorithm, and Section IV shows the transformation to the LLR domain and the derivation of the estimation algorithm. Section V discusses simulation results before we conclude with Section VI.

II. PROBLEM STATEMENT

We assume that an information stream $\mathbf{u} = [u_0, u_1, \ldots, u_t, \ldots]$, $u_t \in \mathbb{A}$, is encoded with a convolutional encoder of memory $M$. Here we will consider only codes defined over GF(2) with elements $\mathbb{A} = \{+1, -1\}$, where $+1$ is the "null" element under the $\oplus$ addition. We present our approach for codes of rate $1/n$ and we exclude encoders with feedback. However, the algorithm can be generalized to any rate $k/n$. In Figure 1, the situation for the $i$th output of the convolutional encoder is shown.

Fig. 1. The $i$th output of the convolutional encoder, $v_t^{(i)}$, is transmitted over an additive, memoryless channel, resulting in the observation $y_t^{(i)}$.


The parameters $\mathbf{g}^{(i)} = [g_0^{(i)}, \ldots, g_M^{(i)}]$ determine how the information symbol $u_t$ and the content of the memory elements $[u_{t-1}, \ldots, u_{t-M}]$ at time $t$ are mapped to the encoded symbol $v_t^{(i)}$ of output $i$, $i = 1, \ldots, n$. Throughout this paper, we assume that the received symbol $y_t^{(i)}$ is observed through a BSC, i.e. $y_t^{(i)} = v_t^{(i)} \oplus n_t^{(i)}$. The state of the encoder, $s_t$, will be defined as the concatenation of the current input symbol with the content of the memory elements, i.e. $s_t = [s_{t,0}, \ldots, s_{t,M}] = [u_t, u_{t-1}, \ldots, u_{t-M}]$. Then we can write

$$v_t^{(i)} = \bigoplus_{k=0}^{M} s_{t,k} \cdot g_k^{(i)}, \qquad (1)$$

where multiplication and addition are in GF(2). Note that the $i$th output only depends on the $i$th parameter subset $\mathbf{g}^{(i)}$, but the outputs are not independent, as each output at time $t$ is determined by the same encoder state $s_t$. Such an encoder is specified by the vector $\mathbf{g} = [\mathbf{g}^{(1)}, \ldots, \mathbf{g}^{(n)}] \in \mathbb{A}^{n(M+1)}$, which contains the parameters we want to estimate from the observation $\mathbf{y} = [\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(n)}]$, where $\mathbf{y}^{(i)} = [y_1^{(i)}, \ldots, y_T^{(i)}]$ and the length of $\mathbf{y}$ is $nT$. The likelihood of the observation, $p(\mathbf{y}; \mathbf{g})$, depends on the deterministic parameter $\mathbf{g}$.¹
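As a concrete illustration of this model, the following sketch generates one noisy output stream according to (1) and the BSC. It is our own minimal example, not code from the paper; the generator `g`, the all-null initial state, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_output(u, g):
    """One output stream of the encoder in Eq. (1), using the {+1, -1}
    convention: +1 is the null element, so the GF(2) sum of {+1, -1}
    symbols is their ordinary product, and a tap g_k = -1 (binary 1)
    simply includes s_{t,k} in that product."""
    M = len(g) - 1
    u_pad = np.concatenate([np.ones(M), u])   # encoder starts in the all-null state
    v = np.empty(len(u))
    for t in range(len(u)):
        s_t = u_pad[t:t + M + 1][::-1]        # s_t = [u_t, u_{t-1}, ..., u_{t-M}]
        v[t] = np.prod(s_t[g == -1])
    return v

def bsc(v, eps):
    """Memoryless BSC: flip each {+1, -1} symbol with probability eps."""
    flips = rng.random(len(v)) < eps
    return np.where(flips, -v, v)

# toy usage with a hypothetical M = 4 generator (binary 10101)
u = np.where(rng.random(1000) < 0.5, -1.0, 1.0)
g = np.array([-1.0, 1.0, -1.0, 1.0, -1.0])
y = bsc(encode_output(u, g), eps=0.1)
```

Note how, in the $\{+1, -1\}$ convention, the modulo-2 sum becomes a real-valued product; this is what later makes the LLR algebra of Section IV applicable.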

III. MAXIMUM LIKELIHOOD ESTIMATION VIA EM-ITERATION

The Expectation Maximization (EM) algorithm is a powerful concept for iteratively approaching the maximum likelihood solution in what are called incomplete-data problems [8]. Let $y$ be the observed data that depends on the unobservable data $x$ and a deterministic parameter $\theta$ that we want to estimate. The so-called missing data $x$ is modeled as a random variable. Instead of obtaining the maximum likelihood estimate for $\theta$ only from the observed data, the complete data $(x, y)$ is incorporated in an iterative procedure [8]. With a fixed current estimate of the parameter, $\theta^{[k]}$, the expected value $Q(\theta, \theta^{[k]})$ of $\log p(x, y; \theta)$ is evaluated with respect to $x$, conditional on $y$, in the so-called E-step. In the M-step of the algorithm, $Q(\theta, \theta^{[k]})$ is maximized with respect to $\theta$ to yield a new estimate $\theta^{[k+1]}$ in iteration $k+1$:

$$\theta^{[k+1]} = \arg\max_{\theta} \underbrace{\sum_{x} p(x \mid y; \theta^{[k]}) \log p(x, y; \theta)}_{Q(\theta, \theta^{[k]})}. \qquad (2)$$

The following theorem guarantees that, in well-behaved problems, the sequence of EM iterates converges to a stationary point; when this point is a global maximum, it yields the unique ML estimate of $\theta$.

Theorem 1: Let $l_y(\theta) = \log p(y; \theta)$. Then

$$l_y(\theta^{[k+1]}) - l_y(\theta^{[k]}) \geq Q(\theta^{[k+1]}, \theta^{[k]}) - Q(\theta^{[k]}, \theta^{[k]}),$$

with equality iff $D\left(p(x \mid y; \theta^{[k+1]}) \,\|\, p(x \mid y; \theta^{[k]})\right) = 0$, where $D(\cdot \| \cdot)$ is the Kullback-Leibler distance.

¹We will denote the dependency of a pdf $p(x)$ on a deterministic parameter $\theta$ by $p(x; \theta)$, whereas we will write $p(x \mid \theta)$ when we mean to express the conditional pdf of $x$ on a random variable $\theta$.

The proof and further properties of the EM algorithm are discussed, for example, in [8], [9]. In our problem setting, we identify $\mathbf{y}$ as the observed and $\mathbf{s}$ as the missing data. The unobserved state at time $t$, $s_t$, is mapped to the observation $y_t = [y_t^{(1)}, \ldots, y_t^{(n)}]$. The joint distribution of $(\mathbf{s}, \mathbf{y})$ depends on the parameter $\mathbf{g}$ of the convolutional encoder (see [9], [10]):

$$p(\mathbf{s}, \mathbf{y}; \mathbf{g}) = p(s_0) \prod_t p(s_{t+1} \mid s_t)\, p(y_t \mid s_t; \mathbf{g}).$$

The transition to the next state does not depend on $\mathbf{g}$, and the same holds for the distribution of the initial state $p(s_0)$. Hence, applied to our problem, the maximization in (2) can be written as

$$\mathbf{g}^{[k+1]} = \arg\max_{\mathbf{g}} \sum_{t, s_t} p(s_t \mid \mathbf{y}; \mathbf{g}^{[k]}) \log p(y_t \mid s_t; \mathbf{g}). \qquad (3)$$

Note that, given the state $s_t$, the $n$ outputs $y_t^{(i)}$ of the convolutional transducer are independent of each other and only influenced by the respective subset of parameters $\mathbf{g}^{(i)}$:

$$\log p(y_t \mid s_t; \mathbf{g}) = \sum_{i=1}^{n} \log p(y_t^{(i)} \mid s_t; \mathbf{g}^{(i)}). \qquad (4)$$

So far, we considered $\mathbf{g}$ as an element of a discrete parameter space. However, the EM-based approach requires the parameter space to be continuous. In [10], this has been achieved by introducing a probabilistic model. Here, we will transform the parameters into the LLR space, which will greatly simplify the calculations by applying log-likelihood algebra [11].

IV. TRANSFORMATION INTO THE LOG-LIKELIHOOD DOMAIN

A. Application of Log Likelihood Ratios

Instead of regarding the encoder parameters as discrete values, we will work with probabilities. In the context of iterative decoding, it has proven useful to work with log-likelihood ratios and to apply log-likelihood algebra [11]. We will see that this is also a natural choice for our problem. In general, the log-likelihood ratio $L(x)$ of a binary random variable $x$ with pdf $p(x) = [p(x = +1), p(x = -1)]$ is defined as $L(x) = \log\frac{p(x = +1)}{p(x = -1)}$, and the soft bit $\bar{x}$, which is the expected (average) value of $x$, can be shown to be $\bar{x} = E\{x\} = \tanh\left(\frac{L(x)}{2}\right)$. Throughout this paper, we will abbreviate the log-likelihood ratio and the soft bit of the parameter $g_m^{(i)}$ as

$$L_m^{(i)} = \log\left(\frac{p(g_m^{(i)} = +1)}{p(g_m^{(i)} = -1)}\right) \quad \text{and} \quad \bar{g}_m^{(i)} = \tanh\left(\frac{L_m^{(i)}}{2}\right). \qquad (5)$$

Furthermore, we denote $\mathbf{L} = [\mathbf{L}^{(1)}, \ldots, \mathbf{L}^{(n)}] = [L_0^{(1)}, \ldots, L_M^{(n)}]$ as our parameter vector.


In our EM approach, we now consider $Q(\mathbf{L}, \mathbf{L}^{[k]})$, and the objective function in (3) is transformed into

$$\sum_{t, s_t} p(s_t \mid \mathbf{y}; \mathbf{L}^{[k]}) \log p(y_t \mid s_t; \mathbf{L}), \qquad (6)$$

which is now maximized over $\mathbf{L} \in \mathbb{R}^{n(M+1)}$, i.e. in a continuous space. In order to find an analytical solution, we relax the strict maximization of (6) and require only $Q(\mathbf{L}^{[k+1]}, \mathbf{L}^{[k]}) \geq Q(\mathbf{L}^{[k]}, \mathbf{L}^{[k]})$, which, according to Theorem 1, suffices to increase the likelihood in iteration step $k+1$. This approach is commonly referred to as the generalized EM algorithm [9]. An increase of $Q(\cdot, \cdot)$ is achieved by a steepest-ascent approach, i.e. the maximization is replaced by

$$L_m^{(i),[k+1]} = L_m^{(i),[k]} + \mu\, \frac{\partial Q(\mathbf{L}, \mathbf{L}^{[k]})}{\partial L_m^{(i)}}, \qquad (7)$$

where $\mu$ is a suitably chosen step size and the gradient is evaluated at $\mathbf{L}^{[k]}$. As we mentioned before, most of the terms in $Q(\mathbf{L}, \mathbf{L}^{[k]})$ do not depend on $\mathbf{L}$, and as a result, only expression (6) has to be considered for the derivative of the Q-function. With the independence property from (4), this yields

$$\frac{\partial Q(\mathbf{L}, \mathbf{L}^{[k]})}{\partial L_m^{(i)}} = \sum_{t, s_t} p(s_t \mid \mathbf{y}; \mathbf{L}^{[k]}) \cdot \frac{\partial \log p(y_t^{(i)} \mid s_t; \mathbf{L}^{(i)})}{\partial L_m^{(i)}}. \qquad (8)$$

We identify $p(s_t \mid \mathbf{y}; \mathbf{L}^{[k]})$ as the a posteriori probability distribution of the state of the encoder at time $t$, given all observations and the current guess of the LLRs. This distribution is of great interest in coding and signal processing, and a number of computationally efficient forward-backward recursions have been developed to calculate it (an overview is provided e.g. in [9]). For the evaluation of (8), we still have to find an analytical expression for the derivative of the conditional log-likelihood of $y_t^{(i)}$. At this point, the choice of log-likelihood ratios for the parameters turns out to be an elegant solution. Remember that, assuming a BSC channel model, $y_t^{(i)}$ is a modulo-2 sum (Eq. (1)). We make use of the boxplus operator $\boxplus$, a result from LLR algebra [11]. In the LLR domain, (1) is transformed to

$$L(y_t^{(i)} \mid s_t) = L(n_t^{(i)}) \boxplus \mathop{\boxplus}_{\substack{k=0 \\ s_{t,k}=-1}}^{M} L_k^{(i)} = 2\tanh^{-1}\left( \tanh\left(\frac{L(n_t^{(i)})}{2}\right) \prod_{\substack{k=0 \\ s_{t,k}=-1}}^{M} \tanh\left(\frac{L_k^{(i)}}{2}\right) \right). \qquad (9)$$

For the BSC channel model with transition probability $\epsilon$ we set $L(n_t^{(i)}) = \log\left(\frac{1-\epsilon}{\epsilon}\right)$. Note that $L(y_t^{(i)} \mid s_t)$ inherently depends on the parameter $\mathbf{L}^{(i)}$, which shall not explicitly appear in the notation for the sake of simplicity. Introducing LLRs allows for a convenient way to process soft channel values,

and different expressions of $L(y_t^{(i)} \mid s_t)$ for the Gaussian or the multiplicative fading channel can be derived [11]. This is an obvious advantage of our probabilistic approach compared to the algebraic solution presented in [2] and is expected to lead to better results, just as decoding methods that process soft values outperform their algebraic counterparts [11]. In the following, we concentrate on the BSC channel model.
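For the BSC case, the tanh form of (9) translates directly into code. The following is our own illustrative sketch (the function names are assumptions, not from the paper):

```python
import numpy as np

def boxplus(llrs):
    """Boxplus-combination of a list of LLRs via the tanh rule in Eq. (9)."""
    return 2.0 * np.arctanh(np.prod(np.tanh(np.asarray(llrs) / 2.0)))

def llr_y_given_state(s_t, L_i, eps):
    """L(y_t^(i) | s_t) for a BSC(eps): the channel LLR boxplus the
    parameter LLRs of all taps k with s_{t,k} = -1 (binary 1)."""
    L_channel = np.log((1.0 - eps) / eps)
    active = np.asarray(L_i)[np.asarray(s_t) == -1]   # taps entering the mod-2 sum
    return boxplus(np.concatenate([[L_channel], active]))

# hypothetical state [-1, +1, -1] and parameter LLRs for an M = 2 output
print(llr_y_given_state([-1, 1, -1], [1.5, -0.3, 2.0], eps=0.1))
```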

B. Analytical Derivation of the Gradient

In order to derive an analytic solution for (8), we have to evaluate an expression of the form

$$\frac{\partial}{\partial \theta_m} \log p(y; \boldsymbol{\theta}), \qquad (10)$$

i.e. a derivative of a log-likelihood that depends on a parameter vector $\boldsymbol{\theta}$ with respect to a single parameter $\theta_m$.

Lemma 1:

$$\frac{\partial \log p(y = \pm 1; \boldsymbol{\theta})}{\partial \theta_m} = \pm\, p(y = \mp 1; \boldsymbol{\theta})\, \frac{\partial L(y; \boldsymbol{\theta})}{\partial \theta_m}. \qquad (11)$$

The above statement can be easily verified by substituting

$$p(y = \pm 1; \boldsymbol{\theta}) = \frac{e^{\pm L(y; \boldsymbol{\theta})/2}}{e^{L(y; \boldsymbol{\theta})/2} + e^{-L(y; \boldsymbol{\theta})/2}} \qquad (12)$$

in (10) and carrying out some simple calculations. In the log-domain, Eq. (10) now reads as

$$\frac{\partial}{\partial \theta_m} \log p(y = \pm 1; \boldsymbol{\theta}) = \frac{\pm\, e^{\mp L(y; \boldsymbol{\theta})/2}}{e^{L(y; \boldsymbol{\theta})/2} + e^{-L(y; \boldsymbol{\theta})/2}}\, \frac{\partial L(y; \boldsymbol{\theta})}{\partial \theta_m}. \qquad (13)$$
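For completeness, here are the omitted "simple calculations" for the case $y = +1$ (a verification step we add; it follows directly from (12)). Taking the logarithm of (12) gives $\log p(y = +1; \boldsymbol{\theta}) = \frac{L(y;\boldsymbol{\theta})}{2} - \log\left(e^{L(y;\boldsymbol{\theta})/2} + e^{-L(y;\boldsymbol{\theta})/2}\right)$, so

$$\frac{\partial \log p(y = +1; \boldsymbol{\theta})}{\partial \theta_m} = \frac{1}{2}\left(1 - \tanh\frac{L(y;\boldsymbol{\theta})}{2}\right)\frac{\partial L(y;\boldsymbol{\theta})}{\partial \theta_m} = p(y = -1; \boldsymbol{\theta})\,\frac{\partial L(y;\boldsymbol{\theta})}{\partial \theta_m},$$

which is (11) with the plus sign; the case $y = -1$ follows by symmetry.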

In our problem, we identify $L(y; \boldsymbol{\theta})$ with $L(y_t^{(i)} \mid s_t)$, which is a boxplus-sum as shown in equation (9), the derivative of which can be given as

$$\frac{\partial L(y_t^{(i)} \mid s_t)}{\partial L_m^{(i)}} = \frac{\partial}{\partial L_m^{(i)}} \left( L(n_t^{(i)}) \boxplus \mathop{\boxplus}_{\substack{k=0 \\ s_{t,k}=-1}}^{M} L_k^{(i)} \right) \qquad (14)$$

$$= \frac{\bar{n}_t^{(i)} \prod_{k \neq m,\, s_{t,k}=-1} \bar{g}_k^{(i)}}{1 - \left( \bar{n}_t^{(i)} \prod_{s_{t,k}=-1} \bar{g}_k^{(i)} \right)^2}\; E\left\{ \left( g_m^{(i)} - \bar{g}_m^{(i)} \right)^2 \right\} \quad \text{if } s_{t,m} = -1, \qquad (15)$$

and zero otherwise, where $\bar{n}_t^{(i)} = \tanh\left(L(n_t^{(i)})/2\right)$ denotes the soft bit of the channel noise.

Eqn. (15) is obtained by using the derivative of the hyperbolic tangent and its inverse. Equation (15) depends only on soft bits and the variance of $g_m^{(i)}$, where we used the relation $E\{(g_m^{(i)} - \bar{g}_m^{(i)})^2\} = 1 - (\bar{g}_m^{(i)})^2 = 2\,\partial \bar{g}_m^{(i)} / \partial L_m^{(i)}$. The numerator (divided by the variance term) measures the reliability of all the other parameters about the $m$th parameter. The denominator is a measure of the uncertainty when the $m$th parameter is included. This ratio is weighted with the variance of $g_m^{(i)}$, which is between zero and one: the higher the uncertainty about $g_m^{(i)}$, the higher its variance.
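A direct transcription of (14)-(15) into code, reusing the conventions of the earlier sketches (again our own illustration, not the authors' implementation):

```python
import numpy as np

def dL_dLm(s_t, L_i, m, eps):
    """Derivative of L(y_t^(i) | s_t) w.r.t. L_m^(i) as in Eqs. (14)-(15).
    Nonzero only if tap m is active in state s_t, i.e. s_{t,m} = -1."""
    s_t, L_i = np.asarray(s_t), np.asarray(L_i)
    if s_t[m] != -1:
        return 0.0
    n_bar = np.tanh(np.log((1.0 - eps) / eps) / 2.0)  # soft bit of the channel LLR
    g_bar = np.tanh(L_i / 2.0)                        # soft bits of the parameters
    active = np.flatnonzero(s_t == -1)
    prod_all = n_bar * np.prod(g_bar[active])
    prod_wo_m = n_bar * np.prod(g_bar[active[active != m]])
    var_m = 1.0 - g_bar[m] ** 2                       # E{(g_m - soft(g_m))^2}
    return prod_wo_m * var_m / (1.0 - prod_all ** 2)

# hypothetical check: tap 0, state [-1, -1, +1], current LLRs [0.5, -1.0, 2.0]
print(dL_dLm([-1, -1, 1], [0.5, -1.0, 2.0], m=0, eps=0.1))
```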


For the estimation update in (7), the gradient is evaluated at $\mathbf{L}^{[k]}$. Our algorithm starts with an initialization of $\mathbf{L}^{[0]}$. Setting the LLRs of the parameters close to zero indicates maximum uncertainty. The choice of a good initial estimate is an important step in EM techniques, as only local convergence is guaranteed. However, we will only consider the case where no a priori knowledge of the encoder structure is available. One EM iteration is performed by multiplying the terms obtained from (13) and (15) with the probabilities $p(s_t \mid \mathbf{y}; \mathbf{L}^{[k]})$ and summing over all possible states and times. The gradient (8) is then used to update the parameter estimates according to (7), where a suitable step size $\mu$ has to be chosen. Iterations are performed until a stop condition is met.
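Putting the pieces together, one EM iteration could be organized as follows. This is a structural sketch under stated assumptions: `llr_y_given_state` and `dL_dLm` are the helpers from the sketches above, and `state_posteriors` stands in for a forward-backward (BCJR-type) recursion computing $p(s_t \mid \mathbf{y}; \mathbf{L}^{[k]})$, which the paper delegates to the literature [9] and which we do not spell out here.

```python
import numpy as np
from itertools import product

def em_iteration(y, L, eps, mu, state_posteriors):
    """One generalized-EM update, Eqs. (7)-(8), by gradient ascent on Q.
    y: (n, T) observations in {+1, -1}; L: (n, M+1) parameter LLRs.
    state_posteriors must return p(s_t | y; L) as a (T, 2^(M+1)) array."""
    n, M1 = L.shape
    states = list(product([1.0, -1.0], repeat=M1))    # all 2^(M+1) states s_t
    post = state_posteriors(y, L, eps)                # forward-backward, see [9]
    grad = np.zeros_like(L)
    for i in range(n):
        for t in range(y.shape[1]):
            for j, s_t in enumerate(states):
                Lys = llr_y_given_state(s_t, L[i], eps)        # Eq. (9)
                p_flip = 1.0 / (1.0 + np.exp(y[i, t] * Lys))   # p(y = -y_t | s_t), from (12)
                for m in range(M1):
                    # Lemma 1 / Eq. (13) times the inner derivative, Eq. (15)
                    grad[i, m] += post[t, j] * y[i, t] * p_flip \
                                  * dL_dLm(s_t, L[i], m, eps)
    return L + mu * grad
```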

V. SIMULATION RESULTS

The visualization of the iterative estimation of the parameters of a rate-$1/4$ nonsystematic encoder with $M = 4$ ($[53, 51, 63, 73]$ in octal form) is shown in Figure 2. The encoded stream was sent over the memoryless BSC with transition probability $\epsilon = 0.1$. It was assumed that the partitioning of the stream yielding $\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(4)}$ is known at the receiver. 1000 noisy bits were observed per output, i.e. we expect 100 wrong bits per output. Full maximum likelihood estimation would mean evaluating and comparing $2^{20}$ likelihood values. In Figures 2(a) and 2(b) we see the progression of the LLRs of the parameters $\mathbf{g}^{(1)}$ and $\mathbf{g}^{(4)}$ over 20 EM iterations. On the y-axis, we show the LLRs multiplied by the true values, i.e. $\tilde{L}_m^{(i)} = L_m^{(i)} g_m^{(i)}$. Positive values indicate the correct estimation of a parameter, whereas at iterations where a negative value is shown, a hard decision would lead to a false estimate. We can observe that several parameters start to move in the wrong direction, even reaching high reliability values. However, after a certain number of iterations, they start to move back in the correct direction. Around iteration 19, the LLRs predict the correct encoder. As the absolute values of the LLRs reflect the reliability of an estimate, we stop iterating when there are no unreliable values left in $\mathbf{L}^{[k]}$. Figure 2(c) shows the log-likelihood $\log p(\mathbf{y}; \mathbf{L}^{[k+1]})$.
Figure 3(a) shows the percentage of frames with correct convergence for different numbers of observed coded bits and noise levels in the case of systematic convolutional encoding. The streams were encoded with a rate-$1/4$, memory $M = 4$ optimal distance profile code with generator polynomials $[\mathbf{g}^{(2)}, \ldots, \mathbf{g}^{(4)}] = [56, 62, 72]$ [12]. Correct convergence means that the LLRs indicate high reliability (here defined as $|L_m^{(i)}| > 2\ \forall m, i$) and the true $\mathbf{g}$ is obtained after hard decision. Figure 3(b) shows the mean number of iterations until reliability is achieved; the maximum number of iterations performed was set to 50. We observe high rates of convergence even at high noise levels for a relatively small number of observed coded bits. For example, with 400 coded bits at a noise level $\epsilon < 0.05$, we were always able to recover the correct encoder. However, in an attacking scenario we think it can be assumed that an attacker has long data streams available, and it is not expected that the encoding scheme changes every few bits. For the same reason, we ignore the computational complexity, which is not expected to be of great relevance for the attacker. Considering nonsystematic encoders, we observed poor convergence rates with the setup presented. In this case, the success heavily depends on the starting value, and the gradient

ascent approach often causes the algorithm to run into a local maximum immediately. Global maximization techniques such as simulated annealing or stochastic optimization [13] are expected to significantly improve the algorithm in this case and will be investigated in the future. Even though the restriction to systematic encoders is currently a disadvantage compared to the algebraic method presented in [2], we believe that the probabilistic approach has several advantages:

1) A priori information about the parameters is easily incorporated by adapting the initial ratios. In this way, knowledge of the design rules of convolutional encoders (see for example [12]) could help to choose good starting values. For instance, every good convolutional encoder will have $L_0^{(i)} = -\infty$ and $L_M^{(i)} = -\infty$. In a similar fashion, constraints on the state sequence imposed by tail-biting codes could be incorporated in our algorithm.

2) The probabilistic approach can be extended to AWGN or fading channels and thus will be able to process the soft values coming from the channel.

3) With the results from [10], it will be possible to consider convolutional encoders with feedback, a situation which has neither been addressed in [2] nor, to our knowledge, in other publications.

4) The algebraic method requires information about the noise level for statistical hypothesis testing. Our EM approach allows for the co-estimation of the noise level as an additional parameter.

VI. CONCLUSION

We considered the problem of estimating the parameters of a convolutional encoder given only noisy observations. Algebraic methods to tackle this problem have been considered before [3], [2]. Here, we presented an iterative, probabilistic approach based on the EM algorithm. Our approach is similar to the one presented by Moon for the synthesis of linear feedback shift registers in [10]. However, we used LLRs and applied log-likelihood ratio algebra, which better suits the underlying coding nature of this problem. We showed that this leads to simple derivations of the algorithm and allows for the processing of soft values. Furthermore, our LLR-based approach provides a reliability measure on the obtained estimates. Alternatively, soft bits could be used to model the parameters. We ran our method on distorted data streams and showed that high rates of correct reconstruction can be achieved even at high noise levels, assuming a systematic encoder. The algorithm shows poor convergence rates for nonsystematic encoders, as the gradient ascent approach runs the algorithm into local maxima. Global maximization techniques such as simulated annealing or stochastic optimization will be investigated in the future to overcome this problem. Even though our method is currently restricted to systematic codes, we believe that it could have several advantages compared to the algebraic approach, as discussed at the end of Section V. We presented our approach for codes of rate $1/n$ and excluded the possibility of feedback from our considerations. However, in principle it is possible to apply our method to codes of any rate $k/n$, and the extension to the cases with feedback or puncturing should be possible with the results from [10], [6].


(a) LLR estimates on $\mathbf{g}^{(1)}$ vs. iterations. (b) LLR estimates on $\mathbf{g}^{(4)}$ vs. iterations. (c) Progression of $\log p(\mathbf{y}; \mathbf{L}^{[k+1]})$ vs. iterations.

Fig. 2. Sample of a successful estimation after 20 iterations of a rate-$1/4$, $M = 4$, nonsystematic encoder from 4000 observed coded bits and $\epsilon = 0.1$.

(a) Percentage of frames with correct convergence. (b) Mean number of required iterations until convergence.

Fig. 3. Percentage of correct convergence and mean number of required iterations for different numbers of observed coded bits and noise levels in the case of systematic convolutional encoding.


ACKNOWLEDGMENT

The authors would like to thank the Deutsche Forschungsgemeinschaft (DFG) and the Bund der Freunde der TUM for supporting their work.

REFERENCES

[1] M. Cluzeau, "Block code reconstruction using iterative decoding techniques," in Proc. IEEE International Symposium on Information Theory (ISIT'06), Seattle, July 2006.

[2] E. Filiol, "Reconstruction of convolutional encoders over GF(q)," Lecture Notes in Computer Science, vol. 1355, pp. 101-109, 1997.

[3] B. Rice, "Determining the parameters of a rate 1/n convolutional code over GF(q)," in Proc. Third International Conference on Finite Fields and Applications, Glasgow, 1995.

[4] A. Valembois, "Detection and recognition of a binary linear code," Discrete Applied Mathematics, vol. 111, pp. 199-218, 2001.

[5] G. L. Rosen, "Examining coding structure and redundancy in DNA," IEEE Engineering in Medicine and Biology, vol. 25, no. 1, pp. 62-68, January 2006.

[6] E. Filiol, "Reconstruction of punctured convolutional encoders," in Proc. International Symposium on Information Theory and Applications (ISITA'00), Hawaii, November 2000.

[7] G. K. Kaleh and R. Vallet, "Joint parameter estimation and symbol detection for linear or nonlinear unknown channels," IEEE Transactions on Communications, vol. 42, no. 7, pp. 2406-2413, July 1994.

[8] T. K. Moon, "The expectation-maximization algorithm," IEEE Signal Processing Magazine, pp. 47-60, November 1996.

[9] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1518-1569, June 2002.

[10] T. K. Moon, "Maximum-likelihood binary shift-register synthesis from noisy observations," IEEE Transactions on Information Theory, vol. 48, no. 7, pp. 2096-2104, July 2002.

[11] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 429-445, March 1996.

[12] R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional Coding, ser. Series on Digital & Mobile Communication, J. B. Anderson, Ed. IEEE Press, 1999.

[13] S. Schaffler and J. Hagenauer, "Soft decision MAP decoding of binary linear block codes via global optimization," in Proc. IEEE International Symposium on Information Theory, Ulm, Germany, June 1997.
