wemmu001

FLOATING-POINT-BASED HARDWARE ACCELERATOR OF A BEAM PHASE-MAGNITUDE DETECTOR AND FILTER FOR A BEAM PHASE CONTROL SYSTEM IN A HEAVY-ION SYNCHROTRON APPLICATION∗

Faizal Arya Samman, Pongyupinpanich Surapong, Christopher Spies and Manfred Glesner
FG Mikroelektronische Systeme, Technische Universität Darmstadt, Darmstadt, Germany

Abstract

A hardware implementation of an adaptive phase and magnitude detector and filter for a beam-phase control system in a heavy-ion synchrotron application is presented in this paper. The main components of the hardware are adaptive LMS filters and phase and magnitude detectors. The phase detectors are implemented using a CORDIC algorithm based on the 32-bit binary floating-point data format. The floating-point-based hardware is designed to improve on the precision of past hardware implementations, which were based on fixed-point arithmetic. The detector and the adaptive LMS filter have been implemented on a programmable logic device (FPGA) for hardware acceleration. The ideal Matlab/Simulink model of the hardware is compared with the VHDL model of the adaptive LMS filter and the phase and magnitude detector. The comparison shows that the output signals of the floating-point-based adaptive FIR filter and of the phase and magnitude detector agree with the expected output signals of the ideal Matlab/Simulink model.

INTRODUCTION

In a synchrotron, different modes of coherent longitudinal beam oscillations may occur due to an initial mismatch or to non-linearities such as wake fields. These oscillations are characterized by their mode numbers m and n [4] and take place at the characteristic synchrotron frequency, which depends on the system state (more precisely, on the magnetic flux derivative, accelerating voltage, and particle energy). In order to eliminate undesired dipole oscillations, a beam phase control system [5] has been devised, which was initially designed to deal with in-phase dipole oscillations (m = 1, n = 0) only [5]. The addition of amplitude detectors is intended to make it suitable for damping higher-order modes [4, 6].

This paper presents a new floating-point-based architecture designed to improve both the precision and the processing speed of the digital controller over previous implementations [2, 11]. The floating-point arithmetic units are used to overcome the explosive divergence and stalling effects caused by fixed-point implementations [8]. Such problems may arise from input or output signal saturation caused by changes in the dynamic signal range at run-time. Using a floating-point representation eliminates the need for the input and output scaling that would have to be performed in a fixed-point implementation to accommodate these changes. A floating-point implementation can also be adapted to the different dynamic range requirements of other control applications and is therefore more suitable for an integrated circuit implementation (ASIC).

∗The work is supported by the German Federal Ministry of Education and Research in the frame of Project FAIR (Facility for Antiproton and Ion Research), Grant Number 06DA9028I.

In the following section, the digital signal processing chain of the beam phase control is presented. Its core components are phase and magnitude detectors and adaptive filter blocks, which are discussed in the subsequent sections.

BEAM PHASE CONTROL

Figure 1: Block Diagram of the Beam Phase Signal Processing.
[Figure 1 labels. Blocks: Analog Pre-processing, AD, CORDIC, Band-pass Filter, Amplifier, Integrator. Signals: BPM, u_gap, f_ref, BP_i, GV_i, P_A, P_B, M_B, ΔΦ.]

The signal processing pipeline implemented in the DSP system is shown in Fig. 1. It performs the following steps:

1. Some analog preprocessing is performed on both the gap voltage and the beam position signal [3]. Note that the beam position signal is quite noisy and has a high level of uncertainty since the bunch may have an arbitrary shape.

2. Four successive samples are used to form the in-phase and quadrature components of both signals.

3. The phase of both signals is computed using Alg. 3.

4. The phase difference is fed into a programmable FIR filter in order to eliminate noise and other disturbances in the phase difference. The filter is tuned to a frequency slightly above [5] the synchrotron frequency and has to be re-tuned continuously since this frequency changes.

5. The output of the filter is fed into a variable-gain controller whose gain also depends on the characteristic frequency. This controller yields a frequency correction which is then applied to the cavity in order to intentionally mistune it, thereby pulling the bunch back to its desired position.

In Fig. 1, BPM is the signal from the beam position monitor, $u_{gap}$ is the voltage across the gap of the reference cavity, and $f_{ref}$ is the reference frequency (see [3]). Future extensions [6] are drawn with dashed lines. In order to enable real-time processing, the phase detector and filtering blocks have been implemented on an FPGA.

Proceedings of ICALEPCS2011, Grenoble, France, WEMMU001. Copyright © 2011 by the respective authors, cc Creative Commons Attribution 3.0 (CC BY 3.0).

PHASE-MAGNITUDE COMPUTATION

Architecture for Phase-Magnitude Computing

The phase and magnitude of the beam signal are detected using a CORDIC (COordinate Rotation DIgital Computer) algorithm. The drawback of a conventional CORDIC algorithm is its high computation latency [7]. One way to accelerate the computation is to increase the rotation angle applied in each iteration. A double-rotation CORDIC algorithm is therefore proposed, as depicted in Alg. 1. In the double-rotation CORDIC algorithm, the rotation angle per iteration (2φ) is twice that of the conventional CORDIC.

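Algorithm 1 MICRO-DRCORDIC-VM
Input: $X_i$, $Y_i$, $Z_i$, $T$, $\delta_i$
Output: $X$, $Y$, $Z$, $\delta_{i+1}$
  $X = X_i - 2^{-2i-2} \cdot X_i - \delta_i \cdot 2^{-i} \cdot Y_i$
  $Y = Y_i - 2^{-2i-2} \cdot Y_i + \delta_i \cdot 2^{-i} \cdot X_i$
  $Z = Z_i - \delta_i \cdot 2 \cdot T$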
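The update equations of Alg. 1 can be read as two identical conventional micro-rotations applied back to back. The worked step below uses the shorthand $t = 2^{-i-1}$ (introduced here, not in the original listing) to show where the terms $2^{-2i-2}$ and $2^{-i}$ come from:

$$
\begin{bmatrix} 1 & -\delta_i t \\ \delta_i t & 1 \end{bmatrix}^{2}
=
\begin{bmatrix} 1 - t^{2} & -2\delta_i t \\ 2\delta_i t & 1 - t^{2} \end{bmatrix}
=
\begin{bmatrix} 1 - 2^{-2i-2} & -\delta_i 2^{-i} \\ \delta_i 2^{-i} & 1 - 2^{-2i-2} \end{bmatrix},
\qquad t = 2^{-i-1},\ \delta_i^{2} = 1 .
$$

Each single micro-rotation turns the vector by $\tan^{-1}(t)$ and stretches it by $\sqrt{1 + t^{2}}$, so the pair turns it by $2\tan^{-1}(2^{-i-1})$ and stretches it by $1 + 2^{-2i-2}$. Under this reading, $T$ in the $Z$ update corresponds to $\tan^{-1}(2^{-i-1})$.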

The double-rotation CORDIC algorithm using the redundant method was proposed by Takagi et al. [13]. With the redundant method, the rotation direction of the double-rotation CORDIC algorithm is $\delta_i \in \{-1, 0, 1\}$. However, with the redundant method the scaling factor is not constant and has to be calculated at run-time. To overcome this problem, we propose the double-rotation CORDIC algorithm presented in Alg. 2, which uses the non-redundant method and has a constant scaling factor. Thereby, the on-line computation problem [1] as well as the scaling problem are eliminated.

$$
\begin{bmatrix} X_N \\ Y_N \\ Z_N \end{bmatrix}
=
\begin{bmatrix} K_{dr}\sqrt{X_0^{2} + Y_0^{2}} \\ 0 \\ \arctan\dfrac{Y_0}{X_0} \end{bmatrix}
\tag{1}
$$

Algorithm 2 DRCORDIC-VM
Input: $X_{in}$, $Y_{in}$, $N$, $T_{LUT}$, $K_{dr}^{-1}$
Output: $X_{out}$, $Z_{out}$
  $X_0 = X_{in}$, $Y_0 = Y_{in}$, $Z_0 = 0$ {Initialization}
  if $Y_0 \ge 0$ then
    $\delta_1 = 1$
  else
    $\delta_1 = -1$
  end if
  for $i = 1$ to $N$ do
    $[X, Y, Z]$ = MICRO-DRCORDIC-VM($X_0$, $Y_0$, $Z_0$, $T_{LUT}(i)$, $\delta_i$)
    if $Y \ge 0$ then
      $\delta_{i+1} = 1$
    else
      $\delta_{i+1} = -1$
    end if
    $X_0 = X$; $Y_0 = Y$; $Z_0 = Z$
  end for
  $X_{out} = X \cdot K_{dr}^{-1}$; $Z_{out} = Z$
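A floating-point software sketch of the vectoring loop in Alg. 2 may help to show how the magnitude and phase emerge. It is not the authors' VHDL design: the function name `drcordic_vectoring`, the default of 16 iterations, and the on-the-fly computation of the table entries and of the scaling factor are assumptions made for illustration. The sketch also assumes that the direction is chosen as δ = −1 for Y ≥ 0 and δ = +1 otherwise, so that the update equations of Alg. 1 drive Y toward zero. The listing above may follow a different sign convention.

```python
import math


def drcordic_vectoring(x_in, y_in, n_iter=16):
    """Double-rotation CORDIC in vectoring mode (floating-point sketch).

    Returns (magnitude, phase). Intended for x_in > 0 and input angles
    inside the convergence range of the double-rotation scheme.
    """
    x, y, z = float(x_in), float(y_in), 0.0
    k_dr = 1.0  # accumulated scaling factor of all double rotations
    for i in range(1, n_iter + 1):
        # Assumed direction rule: rotate toward the x-axis.
        delta = -1.0 if y >= 0.0 else 1.0
        t_lut = math.atan(2.0 ** (-i - 1))  # table entry T_LUT(i) in hardware
        shrink = 2.0 ** (-2 * i - 2)
        shift = 2.0 ** (-i)
        # Micro-rotation of Alg. 1; both updates use the old x and y.
        x, y = (x - shrink * x - delta * shift * y,
                y - shrink * y + delta * shift * x)
        z = z - delta * 2.0 * t_lut
        k_dr *= 1.0 + shrink
    # Compensate the constant gain, as in the last line of Alg. 2.
    return x / k_dr, z


if __name__ == "__main__":
    mag, phase = drcordic_vectoring(1.0, 0.5)
    print(mag, math.hypot(1.0, 0.5))
    print(phase, math.atan2(0.5, 1.0))
```

With these assumptions the result can be compared directly against `math.hypot` and `math.atan2`, which is how the sketch is exercised in the usage lines.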

The constant scaling factor for the double-rotation CORDIC is given by $K_{dr} = \prod_{i=1}^{N}(1 + 2^{-2i-2})$, which approximately equals 1.084727 ($K_{dr}^{-1} = 0.9219$). The total rotation angle $\pm\sum_{i=1}^{N} 2\tan^{-1}(2^{-i-1})$ lies in the range [−0.9885, +0.9885].

Alg. 3 shows the computation of the beam phase and magnitude based on online measurements of the gap voltage signals $GV_i$ and beam position signals $BP_i$ [2], as illustrated in Fig. 1. The beam phase difference signal ($\Delta\Phi$) is obtained as the difference between the phase detection signals $P_A$ and $P_B$ of the gap voltage signals $GV_i$ and the beam position signals $BP_i$, respectively.
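The scaling factor and rotation range quoted above can be checked numerically with a few lines. The function name `drcordic_constants` and the choice of N = 16 iterations are assumptions for this check; the text does not fix N.

```python
import math


def drcordic_constants(n_iter):
    """Return (K_dr, 1/K_dr, max_angle) for n_iter double rotations."""
    k_dr = 1.0
    max_angle = 0.0
    for i in range(1, n_iter + 1):
        k_dr *= 1.0 + 2.0 ** (-2 * i - 2)              # gain of one double rotation
        max_angle += 2.0 * math.atan(2.0 ** (-i - 1))   # angle of one double rotation
    return k_dr, 1.0 / k_dr, max_angle


if __name__ == "__main__":
    k_dr, k_inv, max_angle = drcordic_constants(16)
    print(f"K_dr = {k_dr:.6f}, 1/K_dr = {k_inv:.4f}, range = +/-{max_angle:.4f}")
```

For N = 16 the computed values should agree with the quoted figures to within rounding of the last printed digit.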

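Algorithm 3 BEAM Phase and Magnitude
Input: $GV_1$, $GV_2$, $GV_3$, $GV_4$, $BP_1$, $BP_2$, $BP_3$, $BP_4$
Output: $M$, $\Delta\Phi$
  $Q_{1A} = GV_1$; $I_{1A} = GV_2$; $Q_{2A} = GV_3$; $I_{2A} = GV_4$
  $Q_{1B} = BP_1$; $I_{1B} = BP_2$; $Q_{2B} = BP_3$; $I_{2B} = BP_4$
  $\Delta X_A = Q_{1A} - Q_{2A}$
  $\Delta Y_A = I_{1A} - I_{2A}$
  $[M_A, P_A]$ = DRCORDIC-VM($\Delta X_A$, $\Delta Y_A$, $N$, $T_{LUT}$, $K_{dr}^{-1}$)
  $\Delta X_B = Q_{1B} - Q_{2B}$
  $\Delta Y_B = I_{1B} - I_{2B}$
  $[M_B, P_B]$ = DRCORDIC-VM($\Delta X_B$, $\Delta Y_B$, $N$, $T_{LUT}$, $K_{dr}^{-1}$)
  $M = M_B$
  $\Delta\Phi = P_A - P_B$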
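The data flow of Alg. 3 can be sketched in a few lines of software. In this sketch `detect` is a hypothetical stand-in for DRCORDIC-VM built on `math.hypot` and `math.atan2`, and the usage example assumes that the signals are sampled at four times their frequency. Neither detail is taken from the listing, and phase wrap-around of the difference is not handled.

```python
import math


def detect(dx, dy):
    """Stand-in for DRCORDIC-VM: (magnitude, phase) of the vector (dx, dy)."""
    return math.hypot(dx, dy), math.atan2(dy, dx)


def beam_phase_and_magnitude(gv, bp):
    """Follow Alg. 3 for four successive gap-voltage and beam-position samples."""
    q1a, i1a, q2a, i2a = gv
    q1b, i1b, q2b, i2b = bp
    _m_a, p_a = detect(q1a - q2a, i1a - i2a)  # gap voltage branch
    m_b, p_b = detect(q1b - q2b, i1b - i2b)   # beam position branch
    return m_b, p_a - p_b                     # M = M_B, delta_phi = P_A - P_B


if __name__ == "__main__":
    # Two sinusoids sampled four times per period (assumed sampling scheme).
    phi_a, phi_b = 0.1, 0.4
    gv = [math.cos(k * math.pi / 2 + phi_a) for k in range(4)]
    bp = [0.5 * math.cos(k * math.pi / 2 + phi_b) for k in range(4)]
    print(beam_phase_and_magnitude(gv, bp))  # approximately (1.0, 0.3)
```

Taking the differences of samples half a period apart doubles the amplitude and cancels any constant offset, which is why the reported magnitude in the example is twice the amplitude of the beam position signal.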

The architecture of the 3-stage double-rotation CORDIC algorithm for the beam phase-magnitude detection is illustrated in Fig. 2. The 3-stage pipeline architecture is designed to improve the performance of the beam-phase detector.

Figure 2: The micro-rotation architecture of the 3-stage double-rotation CORDIC algorithm, where $M_1 = -i$, $M_2 = -2i - 2$ and $d = \delta_i$.
[Figure 2 labels. Datapaths: Dpath1, Dpath2, Dpath3. Blocks: SR, Add/Sub, Add, Sub, inv. Signals: xi, yi, zi, xo, yo, zo, tani, M1, M2, d.]

Figure 3: MAPE comparison of the conventional CORDIC and the double-rotation CORDIC.
[Figure 3 panels: MAPE of magnitude in % (axis scale ×10^−5) and MAPE of phase in %, each plotted against the number of iterations (10 to 16) for the double-rotation and the conventional algorithm.]

Figure 3 compares the performance of the conventional and the double-rotation CORDIC algorithms using the MAPE metric. MAPE stands for Mean Absolute Percentage Error and is a statistical measure of the CORDIC output accuracy, formulated as $\frac{1}{n}\sum_{s=1}^{n}\left|\frac{c_s - m_s}{c_s}\right|$, where $n$ is the number of measurements, $c_s$ is the CORDIC output and $m_s$ is the ideal output. The figure shows that the double-rotation technique gives better accuracy for the same number of iterations. In other words, for the same MAPE value, the double-rotation CORDIC requires fewer iterations.
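The metric can be written as a short function. This is a minimal sketch that follows the formula as stated, with the CORDIC output in the denominator. The name `mape` and the error handling are assumptions, and the result is a fraction that would be multiplied by 100 to obtain percent.

```python
def mape(cordic_out, ideal_out):
    """Mean absolute percentage error, returned as a fraction.

    cordic_out: sequence of CORDIC outputs c_s (must be non-zero).
    ideal_out:  sequence of ideal outputs m_s of the same length.
    """
    if len(cordic_out) != len(ideal_out) or not cordic_out:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for c_s, m_s in zip(cordic_out, ideal_out):
        total += abs((c_s - m_s) / c_s)
    return total / len(cordic_out)


if __name__ == "__main__":
    print(mape([2.0, 4.0], [1.0, 5.0]))  # 0.375
```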

Simulation and Synthesis Results

The simulation result of the double-rotation CORDIC algorithm for the beam phase signal detection is depicted in Fig. 4. The figure presents the beam input signal, the reference signal and the output of the phase detector. The output of the beam phase detector (the beam phase signal) is corrupted by noise, which must be filtered out to recover a clean beam phase signal. The filtering is described in the next section.

Figure 4: Beam input, reference and beam phase signals.

Table 1 shows the synthesis results of the micro-rotation of the double-rotation CORDIC algorithm on a Xilinx FPGA vlx110t-2ff1738 for different implementations (in terms of the number of pipeline stages). The table compares the logic area of the conventional (CV) and the double-rotation (DR) CORDIC architectures.

Table 1: Synthesis Results on Xilinx FPGA vlx110t

CORDIC        Utilization                  Min. Delay   Max. Freq.
              SReg     SLUT     LUT-FF     (ns)         (MHz)
1-stage CV    252      0        0          4.68         213.668
2-stage CV    258      343      231        3.05         327.836
3-stage CV    273      355      239        2.94         340.513
1-stage DR    663      0        0          7.713        129.653
2-stage DR    258      698      230        5.045        198.210
3-stage DR    746      1228     405        4.107        243.496

SReg = Slice Register, SLUT = Slice LUT

PHASE DIFFERENCE SIGNAL FILTERING

Filter Architecture

The noise in the phase difference of the beam signal computed by Alg. 3 is filtered using an adaptive FIR filter whose parameters are tuned adaptively by a least-mean-square (LMS) algorithm. The hardware architecture of the adaptive FIR filter combines serial and parallel structures. This combination is chosen to fulfill the time constraint of the application and the area constraint of the FPGA device. Figure 5 presents the serial-parallel architecture of the adaptive FIR filter with 64 taps. Two registers are controlled by a control unit and are used to store the sampled input signal $x(k-j)$ and the filter parameters $w_j$. The filter output computation is implemented serially, while the filter parameter adaptation algorithm is implemented in parallel.
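The behaviour of such a filter can be sketched in software. The sketch below uses the textbook LMS update $w_j \leftarrow w_j + \mu\, e(k)\, x(k-j)$. The update rule, the step size `mu`, the function name `lms_filter` and a delay line that starts at $x(k)$ are assumptions, since the text does not state them. The loops run sequentially here and do not model the serial-parallel hardware schedule.

```python
def lms_filter(x, d, n_taps=64, mu=0.01):
    """Adaptive FIR filter with a textbook LMS coefficient update.

    x: input samples, d: desired (expected) output samples.
    Returns (y, e, w): filter outputs, error signal, final coefficients.
    """
    w = [0.0] * n_taps
    delay = [0.0] * n_taps  # holds x(k), x(k-1), ..., x(k-n_taps+1)
    y_out, e_out = [], []
    for x_k, d_k in zip(x, d):
        delay = [x_k] + delay[:-1]
        # Filter output: multiply-accumulate over all taps.
        y_k = 0.0
        for w_j, x_j in zip(w, delay):
            y_k += w_j * x_j
        e_k = d_k - y_k
        # Coefficient adaptation driven by the error signal.
        w = [w_j + mu * e_k * x_j for w_j, x_j in zip(w, delay)]
        y_out.append(y_k)
        e_out.append(e_k)
    return y_out, e_out, w
```

A simple way to exercise the sketch is system identification: if the desired signal is produced by a known short FIR filter applied to a random input, the coefficients should converge to that filter and the error should decay toward zero.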

Floating-point adder, multiplier and multiplier-accumulator units are used in the filter architecture. The operands are IEEE single-precision binary floating-point numbers [12], i.e. they have 1 sign bit, 8 exponent bits and 23 mantissa bits. The floating-point adder architecture consists of three stages: exponent difference and alignment, addition/subtraction combined with leading-one detection, and normalization.

The floating-point multiplier architecture also consists of three stages: exponent addition and mantissa multiplication, mantissa shifting, and normalization. In the normalization stages, the arithmetic results are normalized into the standard 32-bit (single-precision) binary floating-point format. The normalization stages also detect zero and infinity values according to the IEEE standard [12]. The same approach is used in other projects documented in the literature [8].
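The single-precision layout handled by these units can be illustrated by splitting a value into its three fields. This is a small sketch using Python's standard `struct` module. The helper name `float32_fields` is hypothetical and unrelated to the hardware design.

```python
import struct


def float32_fields(value):
    """Round value to IEEE 754 single precision and return (sign, exponent, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent (bias 127)
    mantissa = bits & 0x7FFFFF        # 23-bit fraction, leading one not stored
    return sign, exponent, mantissa


if __name__ == "__main__":
    print(float32_fields(1.0))           # (0, 127, 0)
    print(float32_fields(-0.15625))      # (1, 124, 2097152)
    print(float32_fields(0.0))           # (0, 0, 0): zero
    print(float32_fields(float("inf")))  # (0, 255, 0): infinity
```

Zero and infinity are the two special encodings mentioned above: an all-zero exponent and mantissa for zero, and an all-ones exponent with a zero mantissa for infinity.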

Figure 5: Adaptive FIR Filter architecture with 64 taps.
[Figure 5 labels. Blocks: Controller block (Control Unit), Filter block (FPMAC, registers x(k−1) ... x(k−64) and w1 ... w64), Adaptive Algorithm block (Adaptive LMS Algorithm, Mult, Add, Subs). Signals: x(k), y(k), d(k), e(k).]

Simulation and Synthesis Results

Figure 6 shows the input and output signals of the adaptive filter, the desired output signal, and the error signal between the filter output and the desired output. The main challenge for the adaptive FIR filter hardware in practice is the generation of the expected filter output. At present, the expected signal is calculated off-line. In the simulation results shown in Fig. 6, the signal was captured from a system model [9, 10]. We are working on a hardware implementation of a simplified system model that can serve as a reference in the future.

Figure 6: Filter input-output signals.

Table 2: Synthesis Result on FPGA Virtex 5 Device

                             Utilization    % of Total
Number of slice registers    38,549         31%
Number of slice LUTs         75,485         61%
Minimum delay                7.462 ns
Maximum frequency            134.018 MHz

The adaptive FIR filter with the LMS algorithm has been implemented on a Virtex5 FPGA device (device type 5vfx200tff1738-2). The device has a total of 200k logic gates. The synthesis result is shown in Table 2.

CONCLUSIONS AND OUTLOOK

A beam phase-magnitude hardware detector based on the double-rotation CORDIC algorithm has been presented. Compared to the conventional CORDIC algorithm, the double-rotation CORDIC algorithm provides better accuracy for the same number of iterations. An adaptive FIR filter architecture that combines serial and parallel computation modes has also been presented. The adaptive LMS algorithm is driven by the error signal between the filter output and the desired signal. For now, the desired signal is calculated off-line. In the future, we plan to include a simplified system model in order to supply the adaptive FIR filter with an expected signal that is not known a priori.

ACKNOWLEDGEMENTS

The authors would like to thank the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung) for supporting the work in the framework of Project FAIR (Facility for Antiproton and Ion Research), Grant Number 06DA9028I.

REFERENCES

[1] M. D. Ercegovac and T. Lang, “Redundant and on-line CORDIC: Application to matrix triangularization and SVD,” IEEE Transactions on Computers, pp. 725–740, 1990.

[2] A. Guntoro, P. Zipf, O. Soffke, H. Klingbeil, M. Kumm, and M. Glesner, “Implementation of Realtime and Highspeed Phase Detector on FPGA,” in Proc. Int’l Workshop on Applied Reconfigurable Computing, 2006, pp. 1–11.

[3] H. Klingbeil, “A Fast DSP-Based Phase-Detector for Closed-Loop RF Control in Synchrotrons,” IEEE Trans. Instrumentation and Measurement, 54(3):1209–1213, June 2005.

[4] H. Klingbeil, D. Lens, M. Mehler, and B. Zipfel, “Modeling Longitudinal Oscillations of Bunched Beams in Synchrotrons,” http://www.arxiv.org/abs/1011.3957.

[5] H. Klingbeil, B. Zipfel, M. Kumm, and P. Moritz, “A Digital Beam-Phase Control System for Heavy-Ion Synchrotrons,” IEEE Trans. Nuclear Science, 54(6):2604–2610, Dec. 2007.

[6] D. Lens, H. Klingbeil, T. Gußner, A. Popescu, and K. Groß, “Damping of Longitudinal Modes in Heavy-Ion Synchrotrons by RF-Feedback,” in Proc. IEEE Conf. on Control Applications, 2010, pp. 1737–1742.

[7] P. K. Meher, J. Valls, T.-B. Juang, K. Sridharan, and K. Maharatna, “50 Years of CORDIC: Algorithms, Architectures, and Applications,” IEEE Transactions on Circuits and Systems, pp. 1893–1907, 2009.

[8] B. W. Robinson, D. Hernandez-Garduno, and M. Saquib, “Fixed and Floating-Point Implementations of Linear Adaptive Techniques for Predicting Physiological Hand Tremor in Microsurgery,” IEEE Journal of Selected Topics in Signal Processing, 4(3):659–667, July 2010.

[9] C. Spies, P. Zipf, M. Glesner, and H. Klingbeil, “Bandwidth Requirement Determination for a Digitally Controlled Cavity Synchronisation in a Heavy Ion Synchrotron Using Ptolemy II,” in Proc. Int’l Workshop on Rapid System Prototyping, 2008, pp. 196–202.

[10] C. Spies, “Model-Based Feasibility Analysis of Digital Beam Phase Control in a Heavy-Ion Synchrotron,” in Proc. Design Automation and Test in Europe, PhD Forum, 2011.

[11] P. Surapong, M. Glesner, and H. Klingbeil, “Implementation of realtime pipeline-folding 64-tap filter on FPGA,” in PhD Forum: PhD Research in Microelectronics and Electronics (PRIME), 2010, pp. 1–4.

[12] Standards Committee of the IEEE Computer Society, “IEEE Standard for Binary Floating-Point Arithmetic,” ANSI/IEEE Std. 754-1985.

[13] N. Takagi, T. Asada, and S. Yajima, “Redundant CORDIC methods with a constant scale factor for sine and cosine computation,” IEEE Transactions on Computers, pp. 989–995, 1991.
