64_FFT_final


  • 7/30/2019 64_FFT_final


    A 0.18 µm VLSI Technology Based 64-Point Fast Fourier Transform Kernel

    Duo Ding

    Joonsoo Lee

    Yousof Mortazavi

    Report for final project of VLSI-I, Spring 2007

    Abstract: In this report, we present a VLSI implementation of a 64-point FFT/IFFT IP core with signed fixed-point 16-bit word length, primarily for IEEE 802.11a wireless local area network applications. Such a kernel could also be integrated into a wide range of modern imaging radar systems and real-time signal processing systems. At the algorithm level, the 64-point FFT is decomposed into a 2-D structure of 8-point FFTs. Compared with a traditional radix-2 64-point FFT, this decomposition greatly reduces the workload of the complex multiplier unit and improves processing speed and power consumption. Complex multiplications are realized by shifters and adders with double precision, and no RAM cell is required for coefficient storage. The proposed FFT kernel is based on 0.18 µm CMOS technology, simulated in the Synopsys VCS environment, and compiled and synthesized in the design_vision environment. The simulated core area of the chip is 2.0 mm². Dynamic power consumption is 15 mW at a 68 MHz operating frequency and a 1.8 V supply voltage. In summary, our design outperforms the original target specifications, and the overall performance of the FFT kernel is satisfactory.

    I. INTRODUCTION

    In most of today's wireless communication standards, Orthogonal Frequency Division Multiplexing (OFDM) is used to cope with the multipath fading wireless channel. OFDM is based on the Fast Fourier Transform (FFT), which is computationally intensive, especially for a large number of inputs. At the algorithm level, the complexity of the FFT is O(N log N). As a result, baseband processors need to be equipped with a dedicated FFT processing unit that is both fast and low in power consumption. Power is of primary importance because of the mobility requirement in wireless receivers, as well as in many other handheld real-time signal processing and imaging devices.

    In this work, we have chosen a particularly low-power FFT

    unit from the literature and implemented it in RTL. The FFT

    unit is that of [1], which requires only 23 clock cycles to compute and occupies only 6.8 mm² of core area. Compared to

    other hardware FFT implementations, the work of [1] offers

    the most attractive specifications for wireless communication

    applications and many other applications in signal processing

    as well.

    This paper is organized as follows. Design decisions are discussed in Section II, and results are presented in Section III. Finally, the paper is summarized and concluded in Section IV.

    II. DESIGN DECISIONS

    A. Specifications

    Figure 1. Interface Diagram

    The figure above gives a high-level overview of the FFT kernel interface. Detailed descriptions of the input/output ports are given in Table 1:

    | signal | direction | description |
    |---|---|---|
    | CLK | input | System clock |
    | RESET | input | The FFT kernel resets itself when RESET goes low |
    | X[31:0] | input | Serial input signal at rising edge of CLK, each 32 bits long; it takes 57 clock cycles to start core FFT computation |
    | MODE | input | Controls the FFT/inverse FFT functionality: MODE = 0 outputs FFT, MODE = 1 outputs IFFT |
    | Y[31:0] | output | Serial output signal at rising edge of CLK, each 32 bits long; every 64 sets of outputs represent a period of FFT/IFFT calculation |
    | O_STB | output | Data ready signal; O_STB = 1 means the output data sets are ready and valid |

    Table 1. Signal Specifications of FFT Core

    Since VLSI-based FFT IP cores are an existing technology, we have a clear set of target specifications. Listed as follows are those of a 64-point FFT core implemented in 0.25 µm technology and published in 2004 [1]:


    | Target item | Reference value |
    |---|---|
    | Core area | 6.8 mm² |
    | Clock frequency | 20 MHz |
    | Dynamic power | 41 mW |
    | Data representation | Signed fixed-point |
    | No. of flip-flops | 7134 |
    | Core computational clock cycles | 23 |
    | Points of FFT | 64 |
    | Word length | 16 |

    Table 2. Initial Target Specifications

    B. Implementation

    (I) A Breakdown of the 64-Point FFT Algorithm: Background

    The Discrete Fourier Transform $W(k)$ of a complex time series $w(n)$, where $n, k = 0, 1, 2, \ldots, N-1$, can be defined as the following:

    $$W(k) = \sum_{n=0}^{N-1} w(n)\, W_N^{nk} \tag{1}$$

    While the Inverse DFT takes the form of the following:

    $$w(n) = \frac{1}{N} \sum_{k=0}^{N-1} W(k)\, W_N^{-nk} \tag{2}$$

    Where $W_N = e^{-j 2\pi / N}$. Suppose that $N = MT$, $k = s + Tt$, $n = l + Mm$, where $s, l = 0, 1, 2, \ldots, 7$ and $m, t = 0, 1, \ldots, T-1$. Substituting these into equation (1) we could get the following:

    $$W(s + Tt) = \sum_{l=0}^{M-1} W_M^{lt}\, W_{MT}^{sl} \sum_{m=0}^{T-1} w(l + Mm)\, W_T^{sm} \tag{3}$$

    As can be observed in equation (3), an M×T-point DFT can be broken down into a 2-D structure of DFTs with M and T points, respectively. Our proposal for the 64-point FFT kernel comes from a direct application of this algorithm with M = T = 8:

    $$W(s + 8t) = \sum_{l=0}^{7} W_8^{lt}\, W_{64}^{sl} \sum_{m=0}^{7} w(l + 8m)\, W_8^{sm} \tag{4}$$
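
The 8 × 8 decomposition can be checked numerically. The following is a minimal Python sketch, not the authors' RTL: it evaluates the inner 8-point DFTs over m, applies the inter-stage factors W_64^(sl), evaluates the outer 8-point DFTs over l, and compares the result with a direct 64-point DFT. The function names (`dft`, `fft64_2d`, `max_abs_error`), the test samples and the use of floating-point arithmetic are assumptions made for illustration.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, used as the 8-point building block and as the reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def fft64_2d(w):
    """64-point DFT as W(s + 8t) = sum_l W8^(lt) W64^(sl) sum_m w(l + 8m) W8^(sm)."""
    assert len(w) == 64
    # First stage: for each l, an 8-point DFT over m of w(l + 8m), indexed by s.
    inner = [dft([w[l + 8 * m] for m in range(8)]) for l in range(8)]
    out = [0j] * 64
    for s in range(8):
        # Inter-stage multiplication by W64^(s*l).
        scaled = [inner[l][s] * cmath.exp(-2j * cmath.pi * s * l / 64)
                  for l in range(8)]
        # Second stage: 8-point DFT over l, indexed by t.
        outer = dft(scaled)
        for t in range(8):
            out[s + 8 * t] = outer[t]
    return out

def max_abs_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

# Arbitrary deterministic complex test samples (hypothetical data).
samples = [complex((n * 7) % 11 - 5, (n * 3) % 5 - 2) for n in range(64)]
error = max_abs_error(fft64_2d(samples), dft(samples))
print(f"max |difference| vs direct DFT: {error:.2e}")
```

The difference should be at the level of floating-point rounding, which indicates that the index mapping k = s + 8t, n = l + 8m reproduces the full 64-point transform.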

    (II) Design Blueprint: An Architectural View

    The block diagram of our 64-point FFT/IFFT core is shown in Figure 2. We divide the core into four kinds of sub-modules: the Input/Output Unit, two 8-point FFT Units, the Complex Multiplier Unit and the Internal Register Bank Unit. This structure has advantages over competing proposals, as discussed in the following section.

    Figure 2. Block Diagram of the proposed 64 FFT

    (III) Pipelining vs. Parallel Working

    Notice that in this architecture we distribute the work evenly between pipelining and parallel functional units, rather than sharing one physical functional unit and leaving everything else to a pipelining register bank.

    There is a competing proposal for FFT implementation in which only one butterfly unit is integrated and a large register bank handles the pipelining workload. After discussion, we decided against that proposal for two reasons. (1) From a thermal analysis point of view, it might work well for a 16-point FFT unit, but for a 64-point FFT a large share of the workload would fall on the pipelining unit alone, making it very hot while the rest of the core stays cool. We consider this a circuit design pitfall to avoid. (2) It may not scale well: the workload and complexity of the pipelining unit grow dramatically when implemented IP cores are later integrated to form more complicated cores.

    Because of these concerns and time pressure, we chose the current proposal for our 64-point FFT kernel implementation.

    (IV) Design Environment and Technology Library

    For logic design and synthesis, we use design_vision installed on the Sun stations of the ECE LRC. For simulation, we use the VCS (Verilog Compiler Simulator) tool suite from Synopsys; for verification and testing, we use the VCS, VirSim and MATLAB environments. Since we employ a signed fixed-point representation, most of the data format conversions in the Verilog test bench interface with the Fixed-Point Toolbox of MATLAB, with a version higher than 7.0.

    The technology library linked for compilation comes from Lab 3 of VLSI-I, which is HT018.db.

    (V) Modular Design

    Module 1: Input/Output Unit Design

    The following Table 3 lists in detail the basic input/output

    port descriptions of the implemented I/P module.


    Table 3. Signal description of I/P module

    The I/P unit performs a serial-to-parallel conversion of the input data and interfaces with the first 8-point FFT module, under the control of the Control_Counter. It also contains embedded buffers for temporary data storage, since some of the parallel multiplications need more than one clock cycle to complete. This is elaborated further in the Multiplier Unit section.

    Figure 3. Overall structure of Input Module

    The block diagram in Figure 3 illustrates the basic working principles of the input unit, showing the necessary combinational logic (C.L. block), the swapping block (SWAP) and internal counters. The swapping unit offers a data path for the IFFT functionality and is controlled by mode, an input listed in Table 3.

    The Combinational Logic block in Figure 3 is controlled by a 5-bit counter, which paces the I/P unit, O/P unit and Multiplier Unit. The counter is also one of the outputs of the FFT processor, giving the user a strobe port for observing the internal operation of the 64-point FFT kernel.

    Figure 4 gives a closer look at the Parallel Conversion block.

    Figure 4. Detail of Parallel Conversion Block

    There is a 32 × 57 register array in the Parallel Conversion block. Once 8 target points are ready for parallel output, these data are fed into the first 8-point FFT block on the rising edge of the system clock, as shown in Figure 4.
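
One plausible reading of the 57-cycle start-up figure in Table 1 (an inference, not something the report states) is that the first 8-point FFT of the 2-D decomposition operates on samples w(0), w(8), ..., w(56), so it cannot start until sample 56 has been shifted in. The short Python sketch below counts how many serial samples are needed before each group of eight is complete, assuming one sample per clock; the helper names `group_indices` and `first_ready_cycle` are hypothetical.

```python
def group_indices(l, stride=8, points=8):
    """Sample indices w(l + 8m), m = 0..7, used by the first-stage FFT for row l."""
    return [l + stride * m for m in range(points)]

def first_ready_cycle(l):
    """Serial input samples (one per clock, assumed) needed before group l is complete."""
    return max(group_indices(l)) + 1

ready = [first_ready_cycle(l) for l in range(8)]
print(ready)  # [57, 58, 59, 60, 61, 62, 63, 64]
```

Under this assumption, the first group is complete after 57 samples and a further group becomes available on each following clock.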

    Module 2: 8-points FFT Units (1st and 2nd FFT units)

    | signal | direction | description |
    |---|---|---|
    | data_in | input | 255 bits input data |
    | data_out | output | 255 bits output data |

    Table 4. Port descriptions of the 8-point FFT units

    Table 4 summarizes the input/output port characterizations

    of the 8-point FFT unit. This is a pure combinational logic

    unit.

    Figure 5. Flow Chart of Decimation-In-Time FFT (N=8)

    | Signal | Description | I/O |
    |---|---|---|
    | CLK | Clock | input |
    | RST | Reset signal | input |
    | Data_start | Enabling signal | input |
    | Data_in | 32-bit input data | input |
    | mode | FFT/IFFT control | input |
    | Control counter | Interface port with control counter | input |
    | Start_count | Enabling signal | output |
    | Data_out | 255-bit output data in parallel | output |

    Signal description of the I/P module (Table 3)


    Figure 6. Basic building block of butterfly structure

    As a purely combinational logic unit, the 8-point FFT takes in parallel input data and assigns the corresponding FFT results to the output wires, as shown in Figure 5 and Figure 6. With this design, every output is computed in parallel and no flip-flops are employed. A total of 12 butterfly subunits and 5 complex multiplier subunits are placed. This adds area and power consumption to the butterfly units, but both the register bank and the multiplier unit benefit greatly from the trade-off, and satisfactory overall performance of the kernel is achieved.

    Within the 8-point FFT module there are 5 complex multiplications. Our design employs techniques to ensure that the least possible number of actual complex multiplications is carried out. In the 8-point FFT unit specifically, only one complex multiplication is used, while all other multiplications are achieved by proper swapping and assigning.
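
The butterfly and multiplication counts quoted above can be reproduced with a small model. The following Python sketch is a generic radix-2 decimation-in-time 8-point FFT, offered as an illustration rather than as the authors' exact flow graph: it counts butterflies and non-unity twiddle factors, and implements multiplication by -j as a swap of real and imaginary parts with one negation. The names `mul_neg_j` and `fft8_dit` are hypothetical.

```python
import cmath

def mul_neg_j(z):
    """Multiply by -j without a multiplier: (a + jb)(-j) = b - ja."""
    return complex(z.imag, -z.real)

def fft8_dit(x):
    """8-point radix-2 DIT FFT; returns (spectrum, butterflies, non-unity twiddles)."""
    assert len(x) == 8
    a = [complex(x[i]) for i in (0, 4, 2, 6, 1, 5, 3, 7)]  # bit-reversed order
    butterflies = 0
    non_unity = 0
    size = 2
    while size <= 8:
        half = size // 2
        for start in range(0, 8, size):
            for k in range(half):
                lower = a[start + k + half]
                if k == 0:
                    t = lower                    # twiddle W^0 = 1
                elif 4 * k == size:
                    t = mul_neg_j(lower)         # twiddle -j: swap and negate
                    non_unity += 1
                else:
                    t = cmath.exp(-2j * cmath.pi * k / size) * lower
                    non_unity += 1
                upper = a[start + k]
                a[start + k] = upper + t
                a[start + k + half] = upper - t
                butterflies += 1
        size *= 2
    return a, butterflies, non_unity

x = [1, 2 - 1j, 0, -1 + 2j, 3, 0.5j, -2, 1 + 1j]  # arbitrary test input
y, n_bfly, n_tw = fft8_dit(x)
ref = [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 8) for n in range(8))
       for k in range(8)]
print(n_bfly, n_tw)  # 12 5
```

In this flow graph, three of the five non-unity twiddles equal -j and need only a swap; the remaining two both involve the factor 1/√2. Whether this is exactly how the design arrives at a single true complex multiplication is not stated in the report.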

    Figure 7. Double precision multiplication mechanism

    Since a signed fixed-point representation is employed throughout the design, complex multiplications are calculated in two categories: for positive numbers, shifting and addition are carried out and 0's are shifted into the word; for negative numbers, shifting and addition are carried out and 1's are shifted into the word. As shown in Figure 7, a double-precision calculation is carried out: we double the input length before the shifting and addition, then truncate the result back to a 16-bit word length. We found in simulation that this noticeably improves accuracy.
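
The benefit of widening the word before shifting can be illustrated with a small integer model. The Python sketch below multiplies a 16-bit value by a constant using only shifts and adds, once at the native width and once after widening to 32 bits. The constant 181/256 (approximately 1/√2) and its shift pattern are assumptions chosen for illustration; the report does not list the coefficients it uses. Python's `>>` on negative integers is an arithmetic shift, which corresponds to shifting 1's into the word for negative numbers.

```python
# Assumed constant: 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 181/256, about 0.70703.
SHIFTS = (1, 3, 4, 6, 8)

def mul_single(x):
    """Shift-and-add at the native 16-bit width; every shift discards low bits."""
    return sum(x >> k for k in SHIFTS)

def mul_double(x):
    """Widen to 32 bits, shift and add, then truncate back to 16 bits."""
    wide = x << 16
    return sum(wide >> k for k in SHIFTS) >> 16

for x in (12345, -12345, 7, -7):
    exact = x * 181 / 256
    print(x, mul_single(x), mul_double(x), round(exact, 3))
```

In this model the widened version loses less than one least significant bit in total, while the native-width version can lose up to one bit per shifted term.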

    Module 3: Multiplier Unit

    Figure 8. Block Diagram of the Multiplier Unit

    In the 2-D breakdown of the 64-point FFT algorithm, the outputs of the first 8-point FFT unit have to be multiplied by complex coefficients before the data are fed into the second FFT unit. Here 8 complex numbers have to be dealt with. Techniques are employed so that a minimal number of operations is carried out. As in the 8-point FFT unit, the double-precision method is used, and reusable results are recycled and swapped. For modularity, the eight constants are kept in separate sub-modules. All operations are governed by control signals. The following Table 5 gives an input/output port summary of the multiplier unit:

    | signal | direction | description |
    |---|---|---|
    | COUNT | input | Controlling 5-bit counter |
    | Input_data | input | 255 bits data |
    | Output_data | output | 255 bits data |

    Table 5. Port description of the Multiplier Unit
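
To give a sense of how much work the inter-stage multiplier has to do, the following Python sketch enumerates the exponents s·l of the factors W_64^(sl) for all 64 (s, l) pairs and separates the trivial ones. This is an illustrative count under the 8 × 8 decomposition only; the helper name `twiddle_exponents` is hypothetical, and the report does not say which of these cases its hardware treats specially.

```python
def twiddle_exponents(points=8):
    """Exponent s*l of W_64^(s*l) for every (s, l) pair between the two stages."""
    return {(s, l): s * l for s in range(points) for l in range(points)}

exps = twiddle_exponents()
# W_64^0 = 1: no multiplication needed (any pair with s = 0 or l = 0).
unity = [pair for pair, e in exps.items() if e % 64 == 0]
# Other multiples of 16 give -j, -1 or +j: a swap and/or negation suffices.
quarter = [pair for pair, e in exps.items() if e % 64 != 0 and e % 16 == 0]
# Everything else needs a genuine constant multiplication.
general = len(exps) - len(unity) - len(quarter)
distinct = sorted({e for e in exps.values() if e % 16 != 0})

print(len(unity), len(quarter), general, len(distinct))
```

Because many (s, l) pairs share the same product s·l, the number of distinct constants is smaller than the number of general multiplications, which is consistent with the reuse of results described above.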

    Module 4: Internal Register Bank

    The internal register bank (CB) is integrated into the system for temporary storage of the 64 complex data values coming from the multiplier unit. The CB has 8 wired 255-bit inputs in parallel and 8 wired 255-bit outputs in parallel, which are fed directly into the second 8-point FFT unit. At every clock cycle, the appropriate data at the output of the CB are aligned with the target input of the second 8-point FFT unit. Since the second 8-point FFT unit is purely combinational, 255 bits of input data are processed before the next cycle arrives. Therefore the downward shifting in the CB can be carried out every cycle without interruption.

    Essentially, the CB unit is the same as the input unit except that it has no swapping blocks or buffering registers.

    The following Table 6 summarizes the input/output ports of the CB unit:


    | signal | direction | description |
    |---|---|---|
    | CLK | input | System clock |
    | RST | input | Reset signal |
    | Data_in | input | 255 bits input |
    | COUNT | input | Control counter |
    | Data_out | output | 255 bits output |

    Table 6. Port summary of CB unit

    Module 5: Output Unit

    | signal | direction | description |
    |---|---|---|
    | CLK | input | System clock |
    | RST | input | Reset signal |
    | Data_in | input | 255 bits input |
    | mode | input | FFT/IFFT control |
    | Data_out | output | 32 bits output |
    | O_STB | output | Data ready signal |

    Table 7. Port summary of O/P unit

    Table 7 above gives the basic input/output port description of the O/P unit.

    Similar to the I/P unit, the O/P unit converts the parallel signals back to serial signals and interfaces with the user and/or an LCD display. There are no buffering registers in this module, and the swapping function is selected by the input port mode to send out either the FFT or the IFFT result.
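
A standard way to obtain an inverse transform from forward-FFT hardware is to swap the real and imaginary parts of every sample at the input and again at the output. The Python sketch below demonstrates this identity. It is a generic illustration: the report states only that swapping blocks controlled by mode provide the IFFT path, and whether the core applies the 1/N scaling shown here is an assumption. The names `swap` and `idft_via_swap` are hypothetical.

```python
import cmath

def dft(x):
    """Direct forward DFT used as the stand-in for the FFT data path."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def swap(z):
    """Exchange real and imaginary parts: a + jb -> b + ja."""
    return complex(z.imag, z.real)

def idft_via_swap(X):
    """Inverse DFT using only the forward transform, two swaps and a 1/N scale."""
    n = len(X)
    return [swap(v) / n for v in dft([swap(v) for v in X])]

# Arbitrary deterministic complex test samples (hypothetical data).
x = [complex(n % 5 - 2, (3 * n) % 7 - 3) for n in range(64)]
roundtrip = idft_via_swap(dft(x))
err = max(abs(a - b) for a, b in zip(x, roundtrip))
print(f"round-trip error: {err:.2e}")
```

The attraction of this approach in hardware is that the swap costs only wiring and a multiplexer, so the same butterfly and multiplier logic serves both modes.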

    C. Optimization

    As discussed in previous sections, several optimization techniques are employed in the module design. Many versions of the structures, modules and code were modified and tested before we finalized the design. Here we list the two most important techniques:

    (I) Computing Accuracy

    Initially, our multiplications were carried out with 16 bits of accuracy, the same word length as the data passing through the FFT kernel. However, simulations showed unsatisfactory errors when compared with the expected outcomes from the MATLAB 7.0 simulator.

    To address this, we doubled the bit length of each word after it enters a complex multiplication block, and then truncated the 32-bit word back to 16 bits before the output port. With this mechanism the accuracy becomes quite satisfactory. Further demonstration is given in the Testing and Verifications section that follows.

    (II) Intermediate Results Recycling

    Although the breakdown of the 64-point FFT greatly reduces the computational complexity, a considerable number of complex operations remain. To further reduce the calculation complexity of the 8-point FFT Units, and particularly of the Multiplier Unit, we reuse intermediate results, which reduces power and increases speed. This is why our design has far fewer complex multiplications than a direct 64-point FFT kernel would require.

    Further results and figures are provided in the following Testing and Verifications section.

    D. Testing and Verifications

    Our test bench for the 64-point FFT kernel interfaces with both the Synopsys and MATLAB environments through file operations. The goal of testing is to show the calculation error; the actual spectra of the test cases are given later in Section III. Our test cases consist of six well-known time series signals from digital signal processing, plus one super test case which involves 1000 randomized input data sets. With this methodology we covered more than 64000 input data points with square error analysis, while also demonstrating direct applications of the implemented 64-point FFT kernel to static spectrum analysis.
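
The comparison flow described above can be sketched in a few lines. The following Python model is a rough stand-in, not the authors' MATLAB/Verilog test bench: it quantizes a random 64-sample frame to signed 16-bit fixed point, computes a floating-point reference spectrum, and reports the per-bin squared error of a simulated 16-bit output. The Q1.15 scaling (`FRAC_BITS = 15`), the input scaling by 1/128 and the use of a rounded reference in place of real hardware output are all assumptions for illustration.

```python
import cmath
import random

FRAC_BITS = 15  # assumed Q1.15; the report only says "signed fixed-point, 16-bit"

def to_fixed(v):
    """Quantize a real value in [-1, 1) to a signed 16-bit integer (saturating)."""
    q = int(round(v * (1 << FRAC_BITS)))
    return max(-(1 << 15), min((1 << 15) - 1, q))

def from_fixed(q):
    return q / (1 << FRAC_BITS)

def quantize(z):
    """Quantize the real and imaginary parts of a complex value separately."""
    return complex(from_fixed(to_fixed(z.real)), from_fixed(to_fixed(z.imag)))

def reference_fft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def squared_errors(actual, expected):
    """Per-bin squared magnitude of the difference between two spectra."""
    return [abs(a - e) ** 2 for a, e in zip(actual, expected)]

random.seed(0)
# One random frame, scaled so the unnormalized spectrum stays inside [-1, 1).
frame = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) / 128
         for _ in range(64)]
stimulus = [quantize(z) for z in frame]
expected = reference_fft(stimulus)
# Stand-in for the hardware output: the reference rounded to 16 bits.
actual = [quantize(z) for z in expected]
errors = squared_errors(actual, expected)
print(f"max squared error: {max(errors):.3e}")
```

Replacing `actual` with values read back from a simulation dump would turn this into a comparison of the kind reported in the figures that follow.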

    Test case 1:

    The input time series is a rectangular pulse; the detailed test case in 16-bit fixed-point representation is listed in Appendix 2. We analyzed and plotted the square error of the physical unit output with respect to the standard output of our MATLAB simulator, as follows:

    [Plot: Testbench1 (MSE=-2.3283e-010+1.397e-009i); x-axis: points in frequency; y-axis: error value]

    Figure 9. MSE of the 64-FFT kernel for test case 1

    As illustrated in Figure 9, the maximum Mean Square Error is kept below 8×10⁻⁹, and 82% of the physical output points are 100% accurate.

    Test case 2:

    The input time series is a cosine wave; the detailed test case in 16-bit fixed-point representation is listed in Appendix 2. We analyzed and plotted the mean square error of the physical unit output with respect to the standard output of our MATLAB simulator, as follows:


    [Plot: Testbench2 (MSE=-7.5437e-008-3.574e-008i); x-axis: points in frequency; y-axis: error value]

    Figure 10. MSE of the 64-FFT kernel for test case 2

    As illustrated in Figure 10, the maximum Mean Square Error is kept below 7×10⁻⁶.

    Test case 3:

    The input time series is a cos + j sin (complex exponential) wave; the detailed test case in 16-bit fixed-point representation is listed in Appendix 2. We analyzed and plotted the mean square error of the physical unit output with respect to the standard output of our MATLAB simulator, as follows:

    [Plot: Testbench3 (MSE=1.2444e-006-5.8627e-007i); x-axis: points in frequency; y-axis: error value]

    Figure 11. MSE of the 64-FFT kernel for test case 3

    Test case 4:

    The input time series is a real constant-valued function; the detailed test case in 16-bit fixed-point representation is listed in Appendix 2. The mean square error of the physical unit output with respect to the standard output of our MATLAB simulator is plotted as follows; accuracy is 100%.

    [Plot: Testbench4 (MSE=0); x-axis: points in frequency; y-axis: error value]

    Figure 12. MSE of the 64-FFT kernel for test case 4

    Test case 5:

    The input time series is a truncated pulse series; the detailed test case in 16-bit fixed-point representation is listed in Appendix 2. We analyzed and plotted the mean square error of the physical unit output with respect to the standard output of our MATLAB simulator, as follows:

    [Plot: Testbench5 (MSE=-4.773e-009+6.6357e-009i); x-axis: points in frequency; y-axis: error value]

    Figure 13. MSE of the 64-FFT kernel for test case 5

    Test case 6:

    The input time series is a triangular time series; the detailed test case in 16-bit fixed-point representation is listed in Appendix 2. We analyzed and plotted the mean square error of the physical unit output with respect to the standard output of our MATLAB simulator, as follows:

    [Plot: Testbench6 (MSE=-3.0268e-009-5.1223e-009i); x-axis: points in frequency; y-axis: error value]

    Figure 14. MSE of the 64-FFT kernel for test case 6

    As illustrated in figures above, the precision of the proposed

    64-points FFT kernel is satisfactory with respect to the six

    testing cases in signal processing application.

    Massive input data test case

    Corner cases alone are not enough for thorough testing, so we construct 1000 input data sets, concatenate them into one massive input data file, and feed it into the test bench, where 1000 cycles of the 64-point FFT are carried out continuously in the implemented FFT kernel. Figure 15 shows the absolute error values for the 64000 input points; the calculation precision is quite satisfactory for a fixed-point representation:

    [Plot: Error between MATLAB & Module (random signal); x-axis: number of points (×10⁴); y-axis: absolute error value (×10⁻³)]

    Figure 15. Absolute Error of the 64-FFT kernel for super test case

    III. RESULTS

    A. Functionality

    For all the test cases used in the Testing and Verifications section, we give here the spectrum plots, which are the actual outputs of our implemented 64-point FFT kernel. The Verilog HDL code for the FFT kernel is listed in Appendix 1.

    Test case 1:

    The input time series is a rectangular pulse; therefore the amplitude spectrum is sinc-shaped, as shown in Figure 16 below:

    [Plot: Testbench1 (MSE=-2.3283e-010+1.397e-009i); x-axis: points in frequency; y-axis: amplitude spectrum; curves: actual, expected]

    Figure 16. Amplitude spectrum of test case 1 by our FFT core

    Test case 2:

    The input time series is a sinusoidal waveform; therefore the amplitude spectrum is a double-pulse-shaped spectrum, as shown in Figure 17:

    [Plot: Testbench2 (MSE=-7.5437e-008-3.574e-008i); x-axis: points in frequency; y-axis: amplitude spectrum; curves: actual, expected]

    Figure 17. Amplitude spectrum of test case 2 by our FFT core

    Test case 3:

    The input time series is a complex exponential waveform; therefore the amplitude spectrum is a single-pulse-shaped spectrum, as shown in Figure 18 below:

    [Plot: Testbench3 (MSE=1.2444e-006-5.8627e-007i); x-axis: points in frequency; y-axis: amplitude spectrum; curves: actual, expected]

    Figure 18. Amplitude spectrum of test case 3 by our FFT core

    Test case 4:

    The input time series is a constant function; therefore the amplitude spectrum is a pulse-shaped spectrum with the pulse located at zero frequency, as shown in Figure 19 below:

    [Plot: Testbench4 (MSE=0); x-axis: points in frequency; y-axis: amplitude spectrum; curves: actual, expected]

    Figure 19. Amplitude spectrum of test case 4 by our FFT core


    Test case 5:

    Input time series is a truncated pulse series, therefore the

    Amplitude Spectrum is an amplitude modulated series of

    pulses in frequency domain, as shown in Figure 20 below.

    Such a signal is often used for Pulse-Doppler Radar Imaging

    systems.

    [Plot: Testbench5 (MSE=-4.773e-009+6.6357e-009i); x-axis: points in frequency; y-axis: amplitude spectrum; curves: actual, expected]

    Figure 20. Amplitude spectrum of test case 5 by our FFT core

    Test case 6:

    The input time series is a triangular waveform; therefore the amplitude spectrum is a product of two sinc functions, as shown in Figure 21 below:

    [Plot: Testbench6 (MSE=-3.0268e-009-5.1223e-009i); x-axis: points in frequency; y-axis: amplitude spectrum; curves: actual, expected]

    Figure 21. Amplitude spectrum of test case 6 by our FFT core

    From the figures above, we could observe that the static

    spectrum is accurate to the best of our knowledge.

    B. Timing / Area Synthesis

    For an initial clock period of 20 ns, the synthesized area of our 64-point FFT kernel is 2.0 mm², the timing is 14.8 ns, and the dynamic power consumption is 15 mW. Three rounds of optimization were tried, and the following Table 8 lists the optimized results. For detailed reports, please refer to Appendix 3. More optimizations could be carried out if time allowed.

    | target item | reference | our design |
    |---|---|---|
    | core area | 6.8 mm² (0.25 µm) | 2.0 mm² (0.18 µm) |
    | clock freq | 20 MHz | 68 MHz |
    | No. registers | 7134 | 5713 |
    | power | 41 mW | 15 mW |

    Table 8. A comparison between our design and reference design

    C. APR and Physical Layout

    The chip is placed and routed in the Cadence/2006 Encounter environment. Connectivity is verified, and no violation or warning is detected. A snapshot of the physical layout of the 64-point FFT kernel is shown in Figure 22:

    Figure 22. APR physical layout of the 64-point FFT core

    IV. SUMMARY AND CONCLUSIONS

    In this paper, we have described the design and implementation of a serial 64-point FFT suitable for wireless and modern signal/image processing applications. We described the modular design at the register-transfer level (RTL), and synthesized and optimized our modules using Design Vision. We verified our design at various stages, namely at the RTL level and post-synthesis. We used golden-model test benches in which MATLAB generated valid input/output vectors; the input vectors were applied to the FFT and the output was compared. We demonstrated that our processor passed the functionality tests with more than 64,000 data points.

    Our FFT chip operates well beyond the target frequency of

    20 MHz and occupies only 2.0 mm² in a 0.18 µm process.

    Once the serial data is in the FFT unit, only 23 clock cycles

    are required to produce the output. Therefore, the FFT can be

    computed in less than a microsecond.
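
The sub-microsecond claim follows from the figures already reported. The short Python calculation below uses the 23 core clock cycles and the 14.8 ns synthesized timing from Section III.B; it covers only the core computation and leaves out the serial loading and unloading of data. The helper name `fft_latency_ns` is hypothetical.

```python
def fft_latency_ns(cycles, clock_period_ns):
    """Latency of the core computation in nanoseconds."""
    return cycles * clock_period_ns

CORE_CYCLES = 23          # core computational clock cycles
CRITICAL_PATH_NS = 14.8   # synthesized timing reported in Section III.B

max_freq_mhz = 1e3 / CRITICAL_PATH_NS
latency_ns = fft_latency_ns(CORE_CYCLES, CRITICAL_PATH_NS)
print(f"max clock ~ {max_freq_mhz:.1f} MHz, core latency ~ {latency_ns:.0f} ns")
```

At the 20 MHz target clock the same 23 cycles would take 1150 ns, so the sub-microsecond figure depends on running near the achieved clock rate.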

    Overall, the current work allowed us to go through the

    design cycles and learn to make important design decisions for

    achieving the goal of small area, and minimal delay. It was

    also a great exercise on teamwork and collaboration with

    colleagues. As a result of this project, we feel we were able to

    apply many of the concepts learned in the course, hence, the

    objectives of the course were fully fulfilled.

    REFERENCES

    [1] K. Maharatna, E. Grass, and U. Jagdhold, "A 64-Point Fourier Transform Chip for High-Speed Wireless LAN Application Using OFDM," IEEE JSSC, Vol. 39, No. 3, March 2004.