Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP...

download Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

of 16

Transcript of Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP...

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    1/16

    May 2011 Altera Corporation

    WP-01159-1.0 White Paper

    Subscribe

    2011 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS,QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off. and/or trademarks of Altera Corporation in the U.S. and o ther countries.All other trademarks and service marks are the property of their respective holders as described atwww.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications inaccordance with Alteras standard warranty, but reserves the right to make changes to any products and services at any timewithout notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, orservice described herein except as expressly a greed to in writing by Altera. Altera customers are advised to o btain the latestversion of device specifications before relying on any published information and before placing orders for products or services.

    101 Innovation Drive

    San Jose, CA 95134

    www.altera.com

    Feedback

    Enabling High-Performance DSPApplications with Arria V or Cyclone V

    Variable-Precision DSP Blocks

    This document highlights the benefits of variable-precision digital signal processing

    (DSP) architecture in Alteras new Arria

    V and Cyclone

    V FPGAs. Altera's variable-precision DSP block allows designers to tailor the precision on a block-by-block basis,thereby saving resources and power while increasing performance.

    IntroductionDSP designs use hundreds or thousands of multipliers as basic building blocks toimplement filters, fast Fourier transforms (FFTs), and encoders that digitally processsignals. Depending on the specific type of filter required, varying precision levels may

    be required within a design at each stage of FIR filters, FFTs, detection processing,adaptive algorithms, or other functions. In addition, DSP algorithms with varyingprecision levels often require precision higher than 18 bits. The following sections

    discuss the benefits of Alteras variable-precision DSP architecture available inArria V and Cyclone V devices.

    Key DSP Design TrendsThe range of DSP precision requirements varies by application, as shown in Figure 1.Video applications use multipliers ranging from 9x9 to 18x18. Wireless and medicalapplications push precision requirements even further when implementing complex,multi-channel filters that must maintain data precision after each filter stage. Military,test, and high-performance computing also push the performance and precisionrequirements, sometimes requiring single- and double-precision floating-pointcalculations for implementing complex matrix operations and signal transforms.

    Figure 1. Applications and Precision Range

    Video

    Surveillance

    Broadcast

    Systems

    Wireless

    Basestations

    Medical

    Imaging

    Military

    Radar

    High-Performance

    Computing

    9-Bit Precision

    100 GMACS

    Floating-Point Precision

    TeraFLOPS

    Applications Moving to Variable and Higher Precisions

    https://www.altera.com/servlets/subscriptions/alert?id=WP-01159http://www.altera.com/common/legal.htmlhttp://www.altera.com/common/legal.htmlhttp://www.altera.com/mailto:[email protected]?subject=Feedback%20on%20WP-01159mailto:[email protected]?subject=Feedback%20on%20WP-01159https://www.altera.com/servlets/subscriptions/alert?id=WP-01159https://www.altera.com/servlets/subscriptions/alert?id=WP-01159mailto:[email protected]?subject=Feedback%20on%20WP-01159mailto:[email protected]?subject=Feedback%20on%20WP-01159mailto:[email protected]?subject=Feedback%20on%20WP-01158https://www.altera.com/servlets/subscriptions/alert?id=WP-01158mailto:[email protected]?subject=Feedback%20on%20WP-01155https://www.altera.com/servlets/subscriptions/alert?id=WP-01155http://www.altera.com/http://www.altera.com/common/legal.html
  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    2/16

    Variable-Precision DSP at 28nm Page 2

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    The DSP architecture of the 28-nm Arria V and Cyclone V FPGAs is optimized tosupport both high-performance and variable data precision that enables area andpower efficient implementation of both fixed and floating-point operations.

    High-Precision DSP Applications

    Many cutting-edge applications require high-performance DSP designs that supporthigher than 18-bit precision, as shown in Figure 2. Precision in this context meansthe size of a multiplier, for example 9x9, 12x12, 18x18, 27x27, and other sizes. Morespecifically, precision refers to the width of each operand applied to a multiplier.

    Many traditional DSP functions such as FIR filters, FFTs, and custom signalprocessing datapaths have high-precision requirements. These functions arecommonly implemented in military, medical, and wireless systems. When designsrequire precision higher than 18-bit, designers may implement floating-point signalprocessing to reach this precision level in high-end designs, such as military space-time adaptive radars and MIMO processing on LTE channel cards. Alteras 28-nmsilicon architecture introduces the industry's first variable-precision DSP architecturethat allows designers to tailor the precision of each DSP block to perfectly suit theapplication.

    Variable-Precision DSP at 28nmThe variable-precision DSP block in Arria V and Cyclone V FPGAs allow designers toselect from 9x9 precision to implement a video processing design, all the way up tofloating-point precision required for advanced radar designs. Designers canindividually set each DSP block precision to efficiently accommodate bit growth andrequired precision increases within the DSP datapath. In addition, the Arria V andCyclone V DSP block is backward-compatible with all modes supported by Alterasprevious generation 65-nm and 40-nm device families. Figure 3 illustrates theprecision ranges supported by a single Arria V or Cyclone V DSP block.

    Figure 2. High-Performance Applications

    MILITARY

    MEDICAL

    WIRELESS

    HIGH-PERFORMANCE COMPUTING

    TEST AND MEASUREMENT

    High-Precision Multiply

    Accumulate

    High-Precision Finite ImpulseResponse (FIR) Filters

    High-Precision Fast Fourier

    Transforms (FFTs)

    Floating-Point FFTs

    Floating-Point Matrix

    Operations

    http://-/?-http://-/?-
  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    3/16

    Variable-Precision DSP at 28nm Page 3

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    Variable-Precision DSP BlocksFigure 4 maps the multiplier precision required by various FPGA markets to thesupported multiplier precisions in Arria V DSP blocks. The Arria V DSP blocknatively supports nearly all of precision levels required by these applications. Thefollowing sections describe the full-precision, 18x18 with pre-adder mode that iseffective in the wireless market.

    Figure 3. Architecture with Selectable Precision

    Vid

    eo

    Wireless

    Milita

    ry

    9x

    918x18

    18x

    2527x27

    36x

    36

    54x54

    Set the Precision Dial to Match Your Application

    Figure 4. Precision Requirements and Arria V Precisions

    IndustrialVideo

    BroadcastSystems

    WirelessSystems

    MedicalImaging

    MilitaryRadar

    High-

    PerformanceComputing

    Precision

    Requirements9x9

    12x12

    16x16

    18x18

    18x18

    18x25

    18x36

    27x27

    18x18

    18x25

    18x36

    27x27

    27x27

    54x54

    Supported Precisions9x9

    12x12

    16x16

    18x18

    18x18

    18x25

    18x36*

    27x27

    18x18

    18x25

    18x36*

    27x27

    27x27

    36x36*

    54x54*

    Supported Precisions

    (Competitive FPGAs) 18x25 18x25 18x25 18x25 18x25 18x25

    * Requires additional logic outside of the DSP block to implement

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    4/16

    Variable-Precision DSP at 28nm Page 4

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    Variable-Precision Modes

    The Arria V, Cyclone V, and Stratix V DSP block are the first to offer two nativeprecision modes, as shown in Figure 5.

    The available modes are 18-bit mode, and high-precision mode for 27x27multiplications. Figure 6 shows the various multiplier precision modes available inthe Arria V (and Cyclone V) DSP block. Designers can implement an 18x36 multiplier

    by using one DSP block plus additional logic outside the DSP block. Similarly,designers can implement a 36x36 multiplier by using two DSP blocks and additionallogic outside the DSP block, or a 54x54 multiplier by using 4 DSP blocks andadditional logic outside the DSP block.

    Variable-Precision Efficiency

    While the key advantage of variable precision is the ability to take advantage of block-

    by-block implementation efficiencies, the Arria V variable-precision DSP block alsoprovides the highest number of multipliers of different precisions compared tocompeting architectures, as shown in Figure 7.

    Figure 5. Arria V and Cyclone V DSP Modes

    108 Bits

    InputRegister

    +/- X

    X

    CoefficientBank

    Coefficient

    Bank

    +

    -+-

    +-

    IntermediateMultiplexer

    64 Bits

    64 Bits

    18 Bits

    18 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits

    +/-

    18

    18

    18

    18 18x19

    18x19

    +-

    +-

    IntermediateMultiplexer

    64 Bits

    64 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits108 Bits

    25

    27

    25

    X

    27x27

    27 bitsInputRegister

    CoefficientBank

    +/-

    18-Bit Precision Mode High-Precision Mode

    Figure 6. Precisions Available in Arria V and Cyclone V FPGAs

    Within 1 DSP Block Within 2 DSP Blocks

    Within 3 DSP Blocks

    Quantity QuantityMultiplier Mode Multiplier Mode

    3

    2

    2

    2

    1

    1

    1

    1

    1

    1

    1

    1

    9x9

    12x12

    16x16

    18x19

    18x25

    27x27

    18x36*

    18x18 complex multiply

    36x36* complex multiply

    18x25* complex multiply

    27x27 complex multiply

    54x54* complex multiply

    Within 4 DSP Blocks

    1 18x36* complex multiply

    * Requires additional logic outside of the DSP block to implement

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    5/16

    Variable-Precision DSP at 28nm Page 5

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    DSP Multiplier Comparison

    Variable-precision DSP blocks provide significant advantages when implementingmultipliers of varying precision. Figure 8 compares an Arria V device of 363 KLEs and1045 variable-precision DSP blocks, against a Kintex-7 device of 356 KLCs and 1440DSP blocks. When compared with the Kintex-7 XC7K355T device, the Arria V5AGXB3 device variable-precision DSP blocks provide a clear advantage whenimplementing multipliers of different precisions. Nearly across the board, variable-precision DSP blocks provide more multipliers per device.

    Although competing solutions may offer a few more multipliers in the 18x25 mode,this mode accounts for only a small portion of actual user configurations. Figure 9provides a comparison of Cyclone V FPGA multipliers against competitive solutions.In general, the Cyclone V device offers more multipliers of different precisions thanthe Artix-7. The only exception is in the case of 18x25 precision.

    Figure 7. Multiplier Precision Comparison

    Figure 8. Arria V FPGA Multiplier Count Comparison

    2X to 3X Number of Multipliers per Variable-Precision

    DSP Block Means Power Reduction

    Multiplier Precision

    Arria V & Cyclone V

    Variable-PrecisionDSP Block

    18x25

    DSP48 Block

    9x9 (industrial video) 3 per block 1 per block

    12x12 (broadcast) 2 per block 1 per block

    16x16 (broadcast, digital cinema) 2 per block 1 per block

    18x18 (wireless, medical, military) 2 per block 1 per block

    18x25 (wireless, medical, military) 1 per block 1 per block

    18x36 (medical, military) 1 per block* 0.5 per block

    27x27 (military, high-performance

    computing)1 per block 0.5 per block

    * Requires additional logic outside of the DSP block to implement

    More DSP Resources vs. the Competition

    Multiplier PrecisionArria V FPGA

    5AGXB3

    Kintex-7

    XC7K355T

    9x9 (industrial video) 3,135 1,440

    12x12 (broadcast) 2,090 1,440

    16x16 (broadcast, digital cinema) 2,090 1,440

    18x18 (wireless, medical, military) 2,090 1,440

    18x25 (wireless, medical, military) 1,045 1,440

    18x36 (medical, military) 1,045 720

    27x27 (military, high-performance

    computing)1,045 720

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    6/16

    DSP Block Evolution Page 6

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    DSP Block EvolutionAlteras DSP block architecture has evolved at each process node over time, asillustrated in Figure 10. The fundamental theme of this evolution is backwards-compatibility and new features that support the next generation of DSP systemdesigns.

    Historically, the Arria device DSP block implemented four independent 18x18multipliers. The Arria II device DSP block continues to support this mode and adds

    more efficient implementation of eight 18x18 multipliers in sum mode via a 44-bitcascade bus. Designers can effectively use this mode to implement common FIR filterstructures.

    The latest 28-nm, variable-precision DSP blocks in Arria V and Cyclone V devicesmaintain compatibility with previous generation devices, while increasing capabilityfor higher precision signal processing. The Arria V and Cyclone V DSP blockarchitecture fabric is enhanced to implement the highest performance and highestprecision DSP application data paths.

    Figure 9. Cyclone V FPGA Multiplier Count Comparison

    More DSP Resources vs. the Competition

    Multiplier Precision

    Cyclone V FPGA

    5CGXC4

    Artix-7

    XC7A50T

    9x9 (industrial video) 210 120

    12x12 (broadcast) 140 120

    16x16 (broadcast, digital cinema) 140 120

    18x18 (wireless, medical, military) 140 120

    18x25 (wireless, medical, military) 70 120

    18x36 (medical, military) 70 60

    27x27 (military, high-performance

    computing)70 60

    Figure 10. Evolution of DSP Blocks in Arria V FPGA

    36

    36

    36

    36

    36

    36

    36

    36

    72

    72

    72

    72

    36

    36

    36

    36

    108

    108

    108

    108

    74

    74

    74

    74

    DSP Half Block

    DSP Half Block

    Variable-Precision

    DSP Block

    Variable-Precision

    DSP Block

    Variable-Precision

    DSP Block

    Variable-Precision

    DSP Block

    Eight 18x18 Multipliers

    (Sum)

    Four 18x18 Multipliers

    (Independent)

    Arria II FPGA Arria V FPGAArria GX FPGA

    DSP Block

    Eight 18x19 Multipliers

    (Sum)

    Eight 18x19 Multipliers

    (Independent)

    Four High-Precision

    Mode Blocks

    Four 18x18

    Multipliers

    (Independent)

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    7/16

    DSP Block Evolution Page 7

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    Key DSP Enhancements

    Arria V and Cyclone V DSP blocks include the following enhancements:

    Pre-adders

    18x19 Multipliers

    Coefficient Banks

    Feedback Registers

    Independent Multipliers

    The following sections discuss these enhancements in greater detail.

    Pre-adders

    The Arria V and Cyclone V DSP block is enhanced to include pre-adders to reducemultiplier count in symmetric FIR filters, as shown in Figure 11. These pre-addersaccept full 18-bit operands, including sign bits. These pre-adders are referred to ashard pre-adders because they are implemented in dedicated hardware resources,

    rather than as FPGA logic gates.

    Figure 12 provides a more detailed view of the hard pre-adders. The next sectionprovides an example application that uses pre-adders in a FIR filter design.

    Figure 11. Pre-Adders High Level View

    108 Bits

    InputRegister

    +/- X

    X

    Coefficient

    Bank

    Coefficient

    Bank

    +

    -+

    -

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    18 Bits

    18 Bits

    Feedback

    Multiplexer

    Feedback

    Regis

    ter

    OutputMultiplexer

    OutputRegister

    74 Bits

    +/-

    18

    18

    18

    18 18x19

    18x19

    Pre-Adders

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    8/16

    Page 8 DSP Block Evolution

    Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks May 2011 Altera Corporation

    Figure 13 illustrates the use of pre-adders in a FIR filter. Typically designers use pre-adders for building symmetric FIR filters. As the filter data is shifted across the

    coefficient set, two data samples can be multiplied by a common coefficient due to thesymmetry. The pre-adder adds two samples prior to multiplication, which allows theuse of one, rather than two, multipliers for every two data samples. Pre-adders reduce

    by half the number of required multipliers for symmetric FIR filters, and eliminate theneed to implement such adders using the logic gates in the FPGA. This techniqueincreases logic efficiency and performance. Designers can use this hard pre-adder aseither a dual 18-bit pre-adder, or as a single 27-bit pre-adder, depending on therequired precision.

    Figure 12. Hard Pre-Adder Detail

    Figure 13. Usage of Pre-adders in Symmetric FIR Filter

    +

    _

    X

    19

    18

    38

    C0

    C1

    18

    18

    18

    Enhanced Pre-Adders

    +/- X

    19

    18

    18

    +/-

    +

    _

    X

    19

    18

    38

    18

    18

    18

    Enhanced Pre-Adders

    +/- X

    19

    18

    18

    +/-

    +

    D3 D3D2 D2D1

    D1

    D0

    D0

    C0 C0 C0C1 C1 C1

    X

    XXXXX XX

    +

    +

    +

    ++

    +/-

    +/-

    +

    -

    Benefit: reduce from

    4 to 2 multipliers

    Pre-Adders

    18x19

    18x19

    18-Bit

    Coeffic

    ient

    18-Bit

    Coeffic

    ient

    http://-/?-http://-/?-
  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    9/16

    DSP Block Evolution Page 9

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    18x19 Multipliers

    The Arria V and Cyclone V DSP block is enhanced to include an 18x19 multiplier, asshown in Figure 14.

    Previous generation devices included only an 18x18 multiplier. The 18x19 multiplieraccepts 19-bit results from the output of the 18+18 pre-adder. Designers can use theextra bit in each pre-adder operand to represent the + or - sign of each operand.Figure 15 shows a close-up view of the 18x19 multiplier.

    Figure 14. 18x19 Multipliers

    Figure 15. Close-up View of 18x19 Multiplier

    108 Bits

    InputRegister

    +/- X

    X

    Coefficient

    Bank

    Coefficient

    Bank

    +

    -+

    -

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    18 Bits

    18 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits

    +/-

    18

    18

    18

    18 18x19

    18x19

    18x19

    Multipliers

    +

    _

    X

    19

    18

    38

    C0

    C1

    18

    18

    18

    18x19 Multipliers

    +/- X

    19

    18

    18

    +/-

    http://-/?-http://-/?-
  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    10/16

    Page 10 DSP Block Evolution

    Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks May 2011 Altera Corporation

    Figure 16 illustrates an example application of the 18x19 multiplier.

    Coefficient Banks

    Arria V and Cyclone V DSP blocks include a coefficient storage bank that isdynamically selectable on each clock cycle, as illustrated by Figure 17.

    This feature is especially helpful in DSP designs that include FIR filters implementedin hardware using a parallel or partially parallel structure, which often require only asmall number of coefficients per multiplier. Alteras variable-precision DSParchitecture provides an internal coefficient bank that designers can set to support 18-

    bit and higher precision signal processing. In 18-bit mode, the coefficient bank is

    Figure 16. Usage of 18x19 Multiplier

    Figure 17. Coefficient Blocks within the DSP Block

    MLAB MLAB

    MLAB

    MLAB

    MLAB

    MLAB

    MLAB

    MLAB

    +

    +

    +

    CoefficientBank*

    +Coefficient

    Bank*

    18

    18

    18

    18

    18

    18

    19

    19

    19

    18x19

    18x19

    18x19

    DSP BlockMLAB: N accumulator

    registers for N channels

    18x19 Multipliers

    Benefit: 18x19 multiplier accepts

    18+18 data => full 18-bit add

    with 19-bit result

    * Use memory logic array block (MLAB) for cases where

    number of coefficients per multiplier > 8

    +

    +Coefficient

    Bank* 18

    +

    108 Bits

    InputRegister

    +/- X

    X

    Coefficient

    Bank

    Coefficient

    Bank

    +

    -+

    -

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    18 Bits

    18 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits

    +/-

    18

    18

    18

    18 18x19

    18x19

    CoefficientBanks

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    11/16

    DSP Block Evolution Page 11

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    configured as two, 18-bit wide register banks, each capable of storing eightcoefficients per multiplier. In the high-precision mode, the coefficient bank isconfigured as a single, 27-bit wide register bank capable of storing eight coefficientsper multiplier. The coefficient banks allow designers to select which of the eightregisters should be used as a coefficient source for the multiplier for every clock cycle.

    Use of the internal coefficient bank eases timing closure complexity and reduces on-

    chip memory and register resource usage, both of which are critical in DSP designs.Figure 18 shows the coefficient bank in the 18-bit mode and in the 27-bit mode.

    Figure 19 shows how a serial filter is implemented, making use of the two 18-bitcoefficient banks. The DSP architecture of the Arria V and Cyclone V FPGA effectivelysupports this type of filter because the coefficient banks, the 18x19 multipliers, and theoutput register are all contained in one DSP block. In addition, the output can becascaded to the next block in a sequential chain. Having the coefficient bank inside theDSP block reduces logic and routing utilization, thus improving filter performance.

    Figure 18. Structure of Coefficient Bank

    Figure 19. Usage of Coefficient Bank in Filter

    0

    1

    2

    3

    4

    5

    6

    7

    18 Bits

    0

    1

    2

    3

    4

    5

    6

    7

    27 Bits

    OR

    18 Bits

    InputRegister

    64 Bits

    18 bits

    Coefficient Banks

    Benefit: reduce logic and

    routing utilization =>

    improve filter performance

    18-Bit

    Coeffici

    ent

    +/-

    X27x27

    18-Bit

    Coeffic

    ient

    +/-

    18 bits

    18 bits

    18 bits

    18 bits

    18 bits

    X

    18x19

    18-bit

    18-bit

    X

    18x19

    +-

    OutputRegister

    +-

    Bias

    Register

    Fe

    edback

    R

    egister

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    12/16

    Page 12 DSP Block Evolution

    Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks May 2011 Altera Corporation

    Feedback Registers

    Arria V and Cyclone V DSP blocks include feedback registers that can serve as thesecond stage in a two-stage accumulator comprised of the output register andfeedback registers. The relative position of the feedback register in the DSP block isillustrated in Figure 20.

    Figure 21 shows how a polyphase serial filter is implemented, with the feedbackregister enabled to provide a feedback path. This structure enables two independentserial-filter channels in one single DSP block. Each channel has its own set of input.The feedback path is time multiplexed, allowing processing of the real part and the

    imaginary part of a complex signal in alternating clock cycles. Only N/2 adders areneeded because the Arria V and Cyclone V DSP block in 18-bit mode has two 18x19multipliers per DSP block. This implementation is efficient and saves resources.

    Figure 20. Feedback Register

    Figure 21. Feedback Register Usage

    108 Bits

    InputRegister

    +/- X

    X

    Coefficient

    Bank

    Coefficient

    Bank

    +

    -+

    -

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    18 Bits

    18 Bits

    Feedback

    Multiplexer

    Feedb

    ack

    Regis

    ter

    OutputMultiplexer

    OutputRegister

    74 Bits

    +/-

    18

    18

    18

    18 18x19

    18x19

    Feedback

    Register

    +

    +

    +

    ++

    M10K

    M10K

    M10K

    M10K

    Feedback Register

    18x19

    18x19

    18x19

    DSP Block

    N:1

    Complex

    Input

    DataDemultiplexer

    N/2:1

    Multistage

    Adder,

    UsingComplex

    Inputs and

    Outputs

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    13/16

    DSP Block Evolution Page 13

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    Independent Multipliers

    Arria V and Cyclone V DSP blocks include independent multipliers. This means thatthe output(s) of the multiplier(s) can be routed to the output port of the DSP blockdirectly, without going through any adder. Figure 22 shows two 18x19 multiplierswhich can be configured to work in the sum mode or independent mode

    Each DSP Block contains two 18x19 multipliers. These blocks can be used as twocompletely independent multipliers with inputs fed from outside the DSP block, asshown on the left-hand side of Figure 23, or each multiplier having one operand fedfrom a coefficient bank, and the outputs of the multipliers delivered independently, as

    shown on the right hand side of Figure 23.

    The output port of the DSP block in Arria V and Cyclone V is 74-bits wide andtherefore can accommodate the output of 37 bits of the two independent 18x19multipliers. This means that all 37 bits from each multiplier are directly accessible onthe output port.

    Figure 22. Input/Output Ports

    Figure 23. Application Example

    108 Bits

    InputRegister

    +/- X

    X

    Coefficient

    Bank

    Coefficient

    Bank

    +

    -+

    -

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    18 Bits

    18 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits

    +/-

    18

    18

    18

    18 18x19

    18x19

    Independent

    Multipliers

    X

    19

    18

    37

    37

    C0

    C1

    18

    18

    18

    18x19 Multipliers

    +/-

    37

    18

    19

    X

    X

    37

    18

    19

    X

    19

    18

    18

    +/-

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    14/16

    Page 14 Altera Floating-Point Precision

    Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks May 2011 Altera Corporation

    Altera Floating-Point PrecisionDepending on the application, the precision requirement may require thatmultiplications are performed with single-precision, floating-point multiplications, ordouble-precision, floating-point multiplications. The Arria V and Cyclone V DSP

    block is capable of both levels of precision, as described in the following sections.

    IEEE Standard 754 floating point is the most common representation of floating-pointnumbers. In this format, single-precision floating point is 32-bits wide with a 24-bitmantissa, while double-precision floating point is 64-bits wide and has a 53-bitmantissa.

    Floating-point computations involve mantissa multiplication and exponent addition.The Altera variable-precision DSP architecture can implement mantissamultiplication for a single-precision, floating-point number using one block ORmantissa multiplication for a double-precision, floating-point number.

    Single-Precision Floating-Point Multiplication

    Using the high-precision mode, the variable-precision block is uniquely suited forimplementing single-precision, floating-point operations. Mantissa multiplication can

    be implemented using only one variable-precision block configured in the high-precision mode. This resource efficiency is an FPGA industry first. Traditionallydesigners had to cascade multiple blocks to implement this operation. The coefficientsmay be applied externally as shown on the left-hand side or internally as shown onthe right-hand side in Figure 24. Competing DSP architectures with 18x25 bitresolution require multiple blocks, as well as external logic to implement a floating-point mantissa multiplication, resulting in a lower performance and higher powerimplementation.

    Figure 24. Single-Precision Floating-Point Multiplication

    +-

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits108 Bits

    25

    27

    25

    X

    27x27

    27 bitsInputRegister

    CoefficientBank

    +/-

    27 Bits

    27 Bits 27 Bits

    InputRegister

    InputRegister

    X

    27x27Acc

    Reg

    64 X

    27x27Acc

    Reg

    64

    CONFIGURABLE

    27-Bit

    Coeffic

    ient

  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    15/16

    Competitive Summary Page 15

    May 2011 Altera Corporation Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks

    Double-Precision Floating-Point Multiplication

    Double-precision mantissa multiplication requires four DSP blocks all cascaded byusing the dedicated 64-bit cascade bus in the DSP block, as shown in Figure 25.

    This technique is an FPGA industry first, because competing architectures requirecascading two 18x25 blocks for single-precision, floating-point mantissamultiplication and up to nine blocks (with extra logic) to implement a 54x54 double-

    precision mantissa multiplier.

    Competitive SummaryWith the introduction of the variable-precision DSP architecture, Altera has opened aDSP technology gap against competing architectures, as summarized in Figure 26.Alteras latest 28-nm devices can natively, and within a single block, implement a27x27 multiplier useful for high-precision, fixed-point DSP, or for emerging floating-point DSP applications. Variable precision means that designers set the DSParchitecture precision to match the algorithm, not the other way around. Also with a64-bit cascade bus and accumulator, designers don't have to forgo precision when thealgorithm implementation requires multiple DSP blocks.

    Figure 25. Floating-Point Modes

    Single-Precision Mantissa Multiplication

    (27x27 Mode)

    Double-Precision Mantissa Multiplication

    (54x54 Mode)

    OR

    +-

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    Fee

    dback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits108 Bits

    25

    27

    25

    X

    27x27

    27 bitsInputRegister

    Coefficient

    Bank

    +/-

    +-

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits108 Bits

    25

    27

    25

    X

    27x27

    27 bitsInputRegister

    CoefficientBank

    +/-

    +-

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits108 Bits

    25

    27

    25

    X

    27x27

    27 bitsInputRegister

    CoefficientBank

    +/-

    +-

    +-

    IntermediateMultiplexer

    64 Bits

    64 Bits

    Feedback

    Multiplexer

    Feed

    back

    Reg

    ister

    OutputMultiplexer

    OutputRegister

    74 Bits108 Bits

    25

    27

    25

    X

    27x27

    27 bitsInputRegister

    CoefficientBank

    +/-

    +

    -

    +

    -

    IntermediateMultiplexer

    64 Bits

    64 Bits

    Feedback

    Multiplexer

    Feedback

    Register

    OutputMultiplexer

    OutputRegister

    74 Bits108 Bits

    25

    27

    25

    X

    27x27

    27 bitsInputRegister

    Coefficient

    Bank

    +/-

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/29/2019 Altera Enabling High-Performance DSP Applications With Arria v or Cyclone v Variable-Precision DSP Blocks

    16/16

    Page 16 Conclusion

    Enabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks May 2011 Altera Corporation

    ConclusionAltera's variable-precision DSP block allows the designer to tailor the precision on a

    block-by-block basis. For symmetric filters, hard pre-adders in the DSP block reducethe required multiplier count by 50%, thus saving resources and power. The 18x19multipliers accommodate full 18+18 addition, including sign bits. Internal coefficient

    banks enable higher multiplier performance and save logic resources.

    The Arria V and Cyclone V DSP block is optimized for FIR filters, and the feedbackregister allows implementation of two independent serial-filter channels per DSP

    block. The independent multipliers allow operands to be applied directly to themultipliers and allow the multiplier outputs to be observed directly on the DSP blockoutput port. Finally, Altera offers the industry's first floating-point function in anFPGA architecture.

    Further Information Arria V FPGA Family Overview

    http://www.altera.com/products/devices/arria-fpgas/arria-v/overview/arrv-overview.html

    Arria V Device Family Advance Information Briefhttp://www.altera.com/literature/hb/arria-v/av_51001.pdf

    Cyclone V FPGA Family Overview

    http://www.altera.com/products/devices/cyclone-v-fpgas/overview/cyv-overview.html

    Cyclone V Device Family Advance Information Briefhttp://www.altera.com/literature/hb/cyclone-v/cyv_51001.pdf

    Acknowledgements Pat Fasang, Senior Member of Technical Staff, DSP Marketing, Altera Corporation

    Figure 26. Competitive Comparison

    Feature XilinxArria V and

    Cyclone V FPGAs

    Native support for 27x27 multiply mode

    Variable-precision multiplier size:

    27x27 or 18x19 (dual)

    Efficient implementation of floating point

    Coefficient register banks within the

    DSP block

    Efficient 2-stage accumulator

    (feedback register)

    Accumulator size 48 bits 64 bits

    Width of cascade bus 48 bits 64 bits

    Pre-adder support for symmetric filters

    Support for systolic FIR filters

    http://www.altera.com/products/devices/arria-fpgas/arria-v/overview/arrv-overview.htmlhttp://www.altera.com/products/devices/arria-fpgas/arria-v/overview/arrv-overview.htmlhttp://www.altera.com/literature/hb/arria-v/av_51001.pdfhttp://www.altera.com/products/devices/cyclone-v-fpgas/overview/cyv-overview.htmlhttp://www.altera.com/products/devices/cyclone-v-fpgas/overview/cyv-overview.htmlhttp://www.altera.com/literature/hb/cyclone-v/cyv_51001.pdfhttp://www.altera.com/products/devices/arria-fpgas/arria-v/overview/arrv-overview.htmlhttp://www.altera.com/products/devices/cyclone-v-fpgas/overview/cyv-overview.htmlhttp://www.altera.com/literature/hb/cyclone-v/cyv_51001.pdfhttp://www.altera.com/literature/hb/arria-v/av_51001.pdf