IEEE floating point multiplier verilog


Rev. B. Last updated 8/7/01

QuickDSP™ Family Data Sheet

QuickDSP: Combining Embedded DSP Blocks, Performance, Density, and Embedded RAM

Features

QMAC Blocks
■ Up to 18 Embedded Computational Units (ECU™)
■ Integrated multiply, add, accumulate functions
■ 8-bit multiplier, 16-bit adder with carry

CLOCK NETWORK
■ 9 global clock networks
■ 1 dedicated, 8 programmable
■ 16 I/O (high drive) networks: 2 per I/O bank
■ 20 Quad-net networks: 5 per quadrant

PROGRAMMABLE I/O
■ High performance enhanced I/O: less than 3 ns Tco
■ Programmable slew rate control
■ Programmable I/O standards
■ LVTTL, LVCMOS, PCI, GTL+, SSTL2, and SSTL3
■ 8 independent I/O banks
■ 3 register configuration: Input, Output, OE

Parameterized IP
■ Free parameterized IP administered with a DSP Wizard
■ Supports multiple and hierarchical IP instantiations

APPLICATIONS
■ Signal processing operators
■ Signal processing functions
■ Networking / communications for VoIP
■ Speech / voice processing
■ Channel coding

Dual Port SRAM
■ 36 blocks of dual-port SRAM
■ 2,304 bit dual port high performance SRAM blocks
■ Total of 82,900 bits
■ RAM / ROM / FIFO Wizard for automatic configuration
■ Configurable and cascadable
■ Array sizes of 2, 4, 9, and 18
■ < 3 ns access times, 300+ MHz FIFO

High Speed Customizable Logic
■ 0.25u, 5 layer metal CMOS process
■ 2.5 V Vcc, 2.5 / 3.3 V drive capable I/O
■ 512 programmable I/O
■ 4,032 super cells
■ 660,000 max system gates
■ Mux-based architecture, non-volatile technology
■ Completely customizable for any digital application

FIGURE 1. QL7180 Block Diagram

Preliminary

TABLE 1. QuickDSP Embedded Standard Product Family Members

                 QL7100     QL7120     QL7160     QL7180
ECU*             10         12         16         18
Max Gates        292,160    373,440    558,464    662,208
Logic Array      40x24      48x32      64x48      72x56
Logic Cells      960        1,536      3,072      4,032
Max Flip-Flops   2,688      4,302      7,488      9,600
Max I/O          256        320        448        512
RAM Modules      20         24         32         36
RAM bits         46,100     55,300     73,700     82,900
Packages
PQFP             208        208
BGA (1.27mm)     516        516        516        516
BGA (1.0mm)      484        484        484, 672   484, 672
FPBGA (0.8mm)    280        280        280        280

*ECU: Embedded Computational Unit

QuickDSP Embedded Computational Unit

Traditional programmable logic architectures do not implement arithmetic functions efficiently or effectively. These functions require high logic cell usage while delivering only moderate performance. By embedding a dynamically reconfigurable computational unit, the QuickDSP family can address various arithmetic functions efficiently and effectively, providing a robust DSP platform. This approach offers greater performance than traditional programmable logic implementations. The ECU block is ideal for complex DSP, filtering, and algorithmic functions. The QuickDSP architecture allows functionality above and beyond that achievable using DSP processors or programmable logic devices. The embedded block is implemented at the transistor level according to the block diagram in Figure 2.

FIGURE 2. ECU Block Diagram

(Figure 2 labels: Logic Cell, Memory, Multiply, Sequencer, Add, Register; buses Abus, Xbus, Ybus, Ibus, Sign, and Rbus.)

Implementing the equivalent of an ECU block as HDL in a programmable logic architecture would require 205 logic cells, with a 10 ns delay in a -4 speed grade. There are 18 ECU blocks in the largest device and 10 in the smallest. The ECU blocks are placed next to the SRAM circuitry for efficient memory/instruction fetch and addressing in DSP algorithm implementations. Eighteen 8-bit MAC functions can be implemented per cycle. Additional multiply-accumulate functions can be implemented in the programmable logic.

TABLE 2. ECU Comparisons

Function       Description   Slowest speed grade   Fastest speed grade
Adder          16 bit        8 ns                  2.5 ns
               32 bit        10 ns                 5.6 ns
               64 bit        12 ns                 6.7 ns
Multiplier     8 x 8         10 ns                 4.3 ns
               16 x 16       12 ns                 6.7 ns
System Clock                 200 MHz               400 MHz

ECU Mode Select

The modes for the ECU block are dynamically re-programmable through the instruction set sequencer.

TABLE 3. ECU Mode Select Criteria

Instruction   Operation
0 0 0         Multiply
0 0 1         Multiply-Add
0 1 0         Accumulate
0 1 1         Add
1 0 0         Multiply (registered)*
1 0 1         Multiply-Add (registered)
1 1 0         Multiply-Accumulate
1 1 1         Add (registered)

*B[15:0] set to zero
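As a rough illustration of the mode table and of the "8-bit multiplier, 16-bit adder with carry" datapath, the sketch below decodes the 3-bit instruction and models one multiply-accumulate step in Python. Only the instruction encodings come from Table 3. The function names, the unsigned treatment of the operands, and the way the carry is returned are assumptions made for this example, not a description of the actual ECU.

```python
# Instruction encodings copied from Table 3; everything else is an assumption.
ECU_MODES = {
    0b000: "Multiply",
    0b001: "Multiply-Add",
    0b010: "Accumulate",
    0b011: "Add",
    0b100: "Multiply (registered)",
    0b101: "Multiply-Add (registered)",
    0b110: "Multiply-Accumulate",
    0b111: "Add (registered)",
}


def decode_mode(instruction):
    """Return the operation name for a 3-bit ECU instruction."""
    return ECU_MODES[instruction & 0b111]


def mac_step(accumulator, x, y):
    """Hypothetical unsigned 8x8 multiply-accumulate step.

    Returns (16-bit sum, carry bit), i.e. a 17-bit result split in two.
    """
    product = (x & 0xFF) * (y & 0xFF)          # 8 x 8 -> up to 16 bits
    total = (accumulator & 0xFFFF) + product   # 16-bit add, may carry out
    return total & 0xFFFF, total >> 16
```

For example, `mac_step(0xFFFF, 255, 255)` overflows the 16-bit sum and reports the overflow in the carry bit rather than silently wrapping.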


INTEGRATED DSP WIZARD

The QuickLogic DSP Wizard will manage QuickLogic's DSP IP portfolio and will enable users to combine multiple instantiations of similar or different IP on a single QuickLogic device. This IP will be available free of charge. All parameterized IP will have an associated GUI that provides an easy, intuitive, step-by-step process for multiple IP implementations.

To protect IP during simulation, QuickLogic will provide compiled simulation models (pre-layout / post-layout) for each IP core for targeted (and popular) simulators. Through a strategic partnership with Amphion (formerly known as ISS), QuickLogic will deliver the compiled models for Silos III (for Verilog) and ModelSim (for VHDL) by default. Compiled models for other simulators, including ModelSim and Synopsys, will be provided as the need arises.

FIGURE 3. DSP Wizard

QuickDSP Software

Step 1: Select IP(s), add, configure

Step 2: Check output files, click ‘Finish’

Step 3: Add user logic, connect ports

Step 4: Repeat (if necessary)


INTEGRATED QUICKFILTER

The QuickFILTER software provides a graphical UI that lets the filter designer choose among different FIR filter types: LP (low pass), HP (high pass), BP (band pass), and BS (band stop). The filter coefficients are then saved as a QuickLogic-compatible .ROM file (hex format). The following features are supported for designing FIR filters:

■ Specify the sampling frequency (0-150 MHz), or design the filter to a normalized sampling frequency.
■ Specify the bit width of the filter coefficients as Real, or as 8, 9, 10, 11, 12, 13, 14, 15, 16, 24, or 32 bits wide.
■ Graphical-only specification of filter demands, such as cutoff frequencies and stop band attenuation.
■ Automatic calculation or manual specification of filter order.
■ Save filter coefficients in a QuickLogic ROM-compatible hex file (“.rom”).
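The flow described above (design a low-pass FIR, quantize the coefficients to a chosen bit width, emit hex words) can be sketched in Python as follows. This is only an illustration: QuickFILTER's design method and the exact layout of its .rom file are not given here, so the Hamming-windowed sinc design, the function names, and the hex formatting are all assumptions.

```python
import math


def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Hamming-windowed sinc low-pass taps, scaled to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz            # normalized cutoff (cycles/sample)
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def quantize(taps, bits):
    """Round to signed integers with (bits - 1) fractional bits, saturating."""
    scale = 1 << (bits - 1)
    return [max(-scale, min(scale - 1, round(t * scale))) for t in taps]


def to_hex_words(values, bits):
    """Two's-complement hex strings; the real .rom layout is an assumption."""
    digits = (bits + 3) // 4
    mask = (1 << bits) - 1
    return [format(v & mask, "0{}X".format(digits)) for v in values]
```

A call such as `to_hex_words(quantize(lowpass_fir(9, 1000, 8000), 8), 8)` yields one two-digit hex word per tap, which could then be written one per line to a file.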

FIGURE 4. QuickFilter Control Panel - Designer

FIGURE 5. QuickFilter Control Panel - Change Options



Filter coefficient/FFT Analysis options

After the sampling frequency, coefficient bit width, and other parameters have been entered, QuickFILTER provides several analysis features, as listed below.

Filter/FFT analysis features:
■ Estimate S/N ratio of filter
■ Quantize filter coefficients

Viewing features:
■ Display/Plot Amplitude Spectrum
■ Display/Plot Phase relationship
■ Display/Plot Group delay
■ Display/Plot Time response
■ Zoom In, Out, All, and Select
■ View filter coefficients
■ View filter information

Analysis options:
■ View in dB
■ Log X-axis
■ Curve plot
■ FFT options (64, 128, 256, 512, and 1024)

FIGURE 6. Filter/FFT Analysis


SIGNAL PROCESSING (DSP FUNCTION) CORES

Modifiable Coefficient, Parameterizable 8-bit & 12-bit FIR filters: High or low performance 2-tap and 8-tap adaptable and cascadable FIR filters with modifiable coefficients. The data word lengths are 8-bit and 12-bit for both data and coefficient to the multiplier.

BiQuad IIR Filter: High performance second order IIR filter with two's complement input and output. 8-bit input/output word length and 8-bit coefficient word length. Overflow detection and saturation with on-the-fly coefficients allowing various coefficient-loading schemes to be incorporated.

64-point FFT / IFFT: The Amphion Semiconductor high performance 64-point FFT performs forward or inverse Fast Fourier Transforms on complex data containing 64 points. The 64-point FFT/IFFT is based on the radix-4 decimation in frequency (DIF) algorithm. It performs the computation concurrently in three highly pipelined cascaded stages.

256-point FFT / IFFT: The Amphion Semiconductor high performance 256-point FFT performs forward Fast Fourier Transforms on complex data containing 256 points. The 256-point FFT core computes the FFT using the radix-4 decimation in frequency (DIF) algorithm. It performs the computation concurrently in four highly pipelined cascaded stages. It is capable of processing a continuous data stream.

1024-point FFT / IFFT: The Amphion Semiconductor high performance 1024-point FFT performs forward or inverse Fast Fourier Transforms on complex data containing 1024 points. The 1024-point FFT core is based on a radix-4 / radix-16 algorithm and performs the 1024-point FFT / IFFT computation in four computation passes.
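To make the radix-4 decimation in frequency (DIF) structure concrete, here is a small software sketch in Python. It is a plain recursive reference model, not the pipelined Amphion core; the function names are made up for this example. Each recursion level corresponds to one radix-4 stage, which is why a 64-point transform needs three stages and a 256-point transform needs four.

```python
import cmath


def fft_radix4_dif(x):
    """Forward FFT by radix-4 decimation in frequency (length must be 4**m)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4:
        raise ValueError("length must be a power of 4")
    q = n // 4
    branches = [[0j] * q for _ in range(4)]
    for i in range(q):
        a, b, c, d = x[i], x[i + q], x[i + 2 * q], x[i + 3 * q]
        w = cmath.exp(-2j * cmath.pi * i / n)      # twiddle factor W_n^i
        branches[0][i] = a + b + c + d
        branches[1][i] = (a - 1j * b - c + 1j * d) * w
        branches[2][i] = (a - b + c - d) * w ** 2
        branches[3][i] = (a + 1j * b - c - 1j * d) * w ** 3
    out = [0j] * n
    for r in range(4):
        sub = fft_radix4_dif(branches[r])          # next stage, quarter length
        for k in range(q):
            out[4 * k + r] = sub[k]                # outputs interleave by 4
    return out


def stage_count(n):
    """Number of radix-4 stages for an n-point transform (n a power of 4)."""
    stages = 0
    while n > 1:
        n //= 4
        stages += 1
    return stages
```

An inverse transform can be obtained from the same routine by conjugating the input and output and dividing by the length.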

IP Portfolio (as of 2Q01)

Performance and utilization, 8-bit and 12-bit FIR filters

Filter          Performance   Logic Cells   ECUs
8-tap 8-bit     84            193           8
8-tap 12-bit    49            1,482         8
8-tap 8-bit     67            218           2
8-tap 12-bit    45            536           2
8-tap 8-bit     68            190           1
8-tap 12-bit    48            298           1
10-tap 8-bit    77            281           10
10-tap 12-bit   45            1,866         10
10-tap 8-bit    76            285           10
10-tap 12-bit   48            1,868         10

Performance and utilization, other cores

Core                    Performance   Logic Cells   ECUs
BiQuad IIR Filter       24 MHz        115           18
64-point FFT / IFFT     62 MHz        2,024         18
256-point FFT / IFFT    43 MHz        2,947         18
1024-point FFT / IFFT   42 MHz        1,633         16


SIGNAL PROCESSING (DSP OPERATORS) CORES

Floating Point Operators: The inputs are IEEE 754 single precision words, and the output is a 32-bit sign-magnitude word.
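As an illustration of what a single precision multiplier has to do, the Python sketch below multiplies two IEEE 754 single precision bit patterns using only integer operations: XOR the signs, add the exponents, multiply the 24-bit significands, normalize, and round to nearest even. It assumes standard IEEE 754 biased exponents and handles normal numbers only (zero and subnormal inputs are flushed to zero, NaN and infinity inputs are not handled); the function names are hypothetical, and this is not the core's actual implementation or output format.

```python
import struct


def f32_bits(value):
    """Bit pattern of a Python float rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bits_f32(bits):
    """Python float for a single precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp32_multiply(a_bits, b_bits):
    """Multiply two normal single precision values given as bit patterns."""
    sign = ((a_bits ^ b_bits) >> 31) & 1
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    if exp_a == 0 or exp_b == 0:
        return sign << 31                      # assumption: flush to signed zero
    man_a = (a_bits & 0x7FFFFF) | 0x800000     # restore the hidden leading 1
    man_b = (b_bits & 0x7FFFFF) | 0x800000
    product = man_a * man_b                    # 48-bit significand product
    exponent = exp_a + exp_b - 127             # remove one copy of the bias
    shift = 23
    if product & (1 << 47):                    # product in [2, 4): renormalize
        shift = 24
        exponent += 1
    mantissa = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and mantissa & 1):
        mantissa += 1                          # round to nearest, ties to even
        if mantissa == 1 << 24:                # rounding carried out of 24 bits
            mantissa >>= 1
            exponent += 1
    if exponent >= 255:
        return (sign << 31) | 0x7F800000       # overflow: signed infinity
    if exponent <= 0:
        return sign << 31                      # underflow: flush to zero
    return (sign << 31) | (exponent << 23) | (mantissa & 0x7FFFFF)
```

The integer steps map fairly directly onto hardware blocks: an XOR gate, an exponent adder, a 24 x 24 multiplier, a one-bit normalizing shifter, and a rounding incrementer.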

Converters: The inputs are IEEE 754 single precision words, and the output is a 32-bit sign-magnitude word.

Fixed Point Operators: High performance fixed-point unsigned integer divider and fractional square root operator with modified radix-2 array.
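For orientation, a textbook radix-2 restoring divider for unsigned integers is sketched below in Python. It produces one quotient bit per iteration, which is the basic idea behind radix-2 array dividers; the "modified radix-2 array" used by the core is not described here, so this plain restoring scheme and the function name are assumptions for illustration only.

```python
def restoring_divide(dividend, divisor, width=16):
    """Unsigned radix-2 restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    for bit in range(width - 1, -1, -1):
        # Shift in the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:       # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```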

SPEECH / VOICE PROCESSING CORES

Multichannel ADPCM: Supports 32 duplex channels and up to 64 simplex channels, each of which is independently selectable for encoding or decoding, and is fully compliant with ITU G.726, G.726a, G.727 and G.727a standards. Both burst and interleaved encoding / decoding with on-the-fly selection of initialization and configuration. Duplex channel organization (half the channels for encode and half for decode) and flexible channel organization (encode and decode in any combination).

PCM Codec: µ-Law / A-Law PCM codec supporting µ-Law or A-Law operation, with an even bit inversion (EBI) option for A-Law. It conforms fully to the CCITT G.711 standard.
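The µ-Law half of such a codec can be illustrated with the widely used 16-bit software formulation of G.711 companding, sketched below in Python. The function names are made up for this example, the input is assumed to be a signed 16-bit linear sample, and A-Law with even bit inversion is not shown.

```python
BIAS = 0x84    # 132, added to the magnitude before the segment is found
CLIP = 32635   # largest magnitude that still fits after the bias is added


def ulaw_encode(sample):
    """Compress a signed 16-bit linear sample to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1     # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F  # 4 bits within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_decode(code):
    """Expand an 8-bit mu-law code back to a linear sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude
```

In this formulation all code bits are inverted on the wire, so silence (a sample of 0) encodes as 0xFF.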

NETWORKING / COMMUNICATIONS CORES

HDLC: High performance module for the bit oriented packet transmission mode. It is suitable for Frame-Relay, X.25, ISDN B-Channel (64 KBit/s) and D-Channel (16 KBit/s). The core fulfills the specification according to ITU Q.921, X.25 Level 2 recommendation.

Floating point word format: Sign | 8-bit Exp (2’s complement) | 23-bit mantissa

Performance and utilization, Floating Point Operators

Operator           Performance   Logic Cells   ECUs
Subtractor/Adder   34 MHz        465           0
Multiplier         34 MHz        836           0
Divider            34 MHz        1,337         18

Performance and utilization, Converters

Operator             Performance   Logic Cells   ECUs
Signed to Floating   46 MHz        127           0
Signed to Floating   46 MHz        177           0

Performance and utilization, Fixed Point Operators

Operator      Performance   Logic Cells   ECUs
Multiplier    100 MHz       215           14
Divider       37 MHz        1,023         0
Square Root   47 MHz        528           0

Performance and utilization, speech / voice and networking cores

Core                 Performance   Logic Cells   ECUs
Multichannel ADPCM   17 MHz        1,496         18
PCM Codec            39 MHz        73            3
HDLC                 150 MHz       18            0


CHANNEL CODING CORES

Reed-Solomon Decoder: The Amphion Reed-Solomon decoder cores provide compact high performance solutions for a wide range of applications. n = 204, k=188, 8 bit symbols, configurable solution for high data rate Reed Solomon decoding. Supports a range of standards including European Telecommunication Standards ETS 300-421 and ETS 300-429. Single implementation supports any valid block length and can process both burst and continuous data. Supports high speed applications (up to 400Mbps), symbol wide input and output, clocked by single symbol rate clock, and simple core interface allows easy integration into larger systems.

Reed-Solomon Encoder: The Amphion Reed-Solomon encoder cores provide compact high performance solutions for a wide range of applications. n = 204, k=188, 8 bit symbols, configurable solution for high data rate Reed Solomon encoding. Supports a range of standards including European Telecommunication standards ETS 300-421 and ETS 300-429. Single implementation supports any valid block length and can process both burst and continuous data. Symbol wide input and output, clocked by single symbol rate clock, low latency implementation - 2 symbol clock cycles, and simple core interface allows easy integration into larger systems.

Convolutional Encoder: High performance implementation suitable for a range of Forward Error Correction applications. It may be used in conjunction with other FEC related cores available from Amphion to rapidly construct complete FEC solutions. This core supports Viterbi Mode Rates up to 7/8, Block and Continuous Mode operation, Viterbi Mode Complies with INTELSAT IESS-308 and INTELSAT IESS-309.

Viterbi Decoder: Viterbi Decoder with k=7, Mother rate=1/2, Generator polynomials: G1=171(octal) and G2=133(octal). Support for external depunctured code rates; capable of supporting externally depunctured rates of up to 7/8. 4 bit I and Q input (1 bit hard plus 3 bits soft) with provision for the collection of performance statistics - programmable threshold to flag 'out-of-sync' condition and readable register providing a BER estimate. A “force-through-zero” capability: that is, a mechanism for forcing all survivor paths through the zero state at arbitrary times. Traceback length of 170 states, linear branch metrics, even-bit-burst processing only (Bursts must contain an even number of input symbol-pairs). Error correcting performance (coding gain) ›› 5.0dB at 1E-5 BER.
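The code that the Viterbi decoder above expects (constraint length k=7, mother rate 1/2, generators 171 and 133 octal) can be produced by a simple shift-register encoder, sketched here in Python. The tap ordering (newest bit in the most significant register position) and the output order (G1 bit first, then G2) are common conventions but are assumptions here, as is the function name; puncturing to higher rates is not shown.

```python
def conv_encode(bits, polys=(0o171, 0o133), k=7):
    """Rate 1/2 convolutional encoder; returns interleaved G1, G2 output bits."""
    register = 0
    mask = (1 << k) - 1
    out = []
    for bit in bits:
        # Shift the new bit into the top of the k-bit register.
        register = ((register >> 1) | ((bit & 1) << (k - 1))) & mask
        for poly in polys:
            # Each output is the parity of the tapped register bits.
            out.append(bin(register & poly).count("1") & 1)
    return out
```

Feeding a single 1 followed by six 0s returns the impulse response, which simply reads out the two generator polynomials bit by bit.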

Performance and utilization, channel coding cores

Core                    Performance   Logic Cells   ECUs
Reed-Solomon Decoder    47 MHz        1,427         0
Reed-Solomon Encoder    51 MHz        245           0
Convolutional Encoder   100 MHz       282           0
Viterbi Decoder         36 MHz        3,714         0


QuickWorks Design Software

The turnkey QuickWorks package provides the most complete ESP and FPGA software solution, from design entry through logic synthesis and place and route to simulation. The packages also provide a solution for designers who use third-party tools from Cadence, Mentor, OrCAD, Synopsys, Viewlogic, Veribest, and others for design entry, synthesis, or simulation.

Process Data

QuickDSP is fabricated on a 0.25u, 5 layer metal CMOS process. The core supply voltage (Vcc) is 2.5 V, and the I/O is 3.3 V tolerant with the addition of a 3.3 V VCCIO supply. QuickDSP is available in commercial, industrial, and military temperature grades.

Programmable Logic Architectural Overview

The QuickDSP features an enhanced Supercell with an additional D flip-flop register and associated control logic. This architectural approach addresses today's highly register-intensive designs. In addition to the 2.6 billion MAC operations per second available from the ECUs, further multiply-accumulate functions can be implemented in the programmable logic for an additional 2 billion MACs/s when clocked at 100 MHz.

FIGURE 7. Performance Standards

Function       Description   Slowest speed grade   Fastest speed grade
Multiplexer    16:1          5 ns                  2.8 ns
Parity Tree    24            6 ns                  3.4 ns
               36            6 ns                  3.4 ns
Counter        16 bit        250 MHz               450 MHz
               32 bit        250 MHz               450 MHz
FIFO           128 x 32      155 MHz               280 MHz
               256 x 16      155 MHz               280 MHz
               128 x 64      155 MHz               280 MHz
Clock to out                 4.5 ns                2.5 ns
System clock                 200 MHz               400 MHz


The QuickDSP logic Supercell structure, shown in Figure 8, is similar to the 0.35 µm QuickLogic logic cell with the addition of a second register. Both registers share CLK, SET, and RESET inputs. The second register has a two-to-one multiplexer controlling its input. The register can be loaded from the NZ output or directly from a dedicated input. NOTE: The input “PP” is not an “input” in the classical sense. It can only be tied high or low using default links, and it selects which path, “NZ” or “PS”, is used as the input to the register. All other inputs can be connected not only to “tiehi” and “tielo” but to multiple routing channels as well.

The complete logic cell consists of two 6-input AND gates, four two-input AND gates, seven two-to-one multiplexers, and two D flip-flops with asynchronous SET and RESET controls. The cell has a fan-in of 30 (including register control lines) and fits a wide range of functions with up to 17 simultaneous inputs. It has 6 outputs, 4 combinatorial and 2 registered. The high logic capacity and fan-in of the logic cell accommodate many user functions with a single level of logic delay, while other architectures require two or more levels of delay.

FIGURE 8. QuickDSP Supercell

(Figure 8 signal labels: QS, A1-A6, OS, OP, B1, B2, C1, C2, MS, D1, D2, E1, E2, NP, NS, F1-F6, PS, PP, MP, QC, QR; outputs AZ, OZ, QZ, NZ, FZ, Q2Z.)

RAM Modules

The QuickDSP family includes multiple dual-port 2,304-bit RAM modules for implementing RAM, ROM, and FIFO functions. Each module is user-configurable into four different block organizations. Modules can also be cascaded horizontally to increase their effective width or vertically to increase their effective depth, as shown in Figure 9. The RAM can also be configured as a modified Harvard architecture, similar to those found in DSPs.

FIGURE 9. 2,304 bit QuickRAM module

(Figure 9 signal labels: MODE[1:0], WA[9:0], WD[17:0], WE, WCLK, ASYNCRD, RA[9:0], RD[17:0], RE, RCLK.)

The number of RAM modules varies from 20 to 36 blocks within the QuickDSP family, for a total of 46.1K to 82.9K bits of RAM. Using two “mode” pins, designers can configure each module into 128 x 18 (Mode 0), 256 x 9 (Mode 1), 512 x 4 (Mode 2), or 1024 x 2 (Mode 3) blocks. The blocks are also easily cascadable to increase their effective width and/or depth. See Figure 10.

FIGURE 10. Cascaded RAM Modules

(Figure 10 labels: two cascaded 2,304-bit RAM modules with WADDR, WDATA, RADDR, and RDATA connections.)


The RAM modules are dual-port, with completely independent READ and WRITE ports and separate READ and WRITE clocks. The READ ports support asynchronous and synchronous operation, while the WRITE ports support synchronous operation. Each port has 18 data lines and 10 address lines, allowing word lengths of up to 18 bits and address spaces of up to 1024 words. Depending on the mode selected, however, some higher order data or address lines may not be used.

The Write Enable (WE) line acts as a clock enable for synchronous write operation. The Read Enable (RE) acts as a clock enable for synchronous READ operation (ASYNCRD input low), or as a flow-through enable for asynchronous READ operation (ASYNCRD input high).

Designers can cascade multiple RAM modules to increase the depth or width allowed in single modules by connecting corresponding address lines together and dividing the words between modules.

A similar technique can be used to create depths greater than 512 words. In this case address signals higher than the ninth bit are encoded onto the write enable (WE) input for WRITE operations. The READ data outputs are multiplexed together using encoded higher READ address bits for the multiplexer SELECT signals.
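The width and depth cascading described above can be worked through with a short Python sketch. The mode table is taken from the RAM module description (128 x 18, 256 x 9, 512 x 4, 1024 x 2); the helper names and the simple "fewest modules" selection rule are assumptions for illustration, not the behavior of the RAM / ROM / FIFO Wizard.

```python
import math

# (depth, width) for each MODE[1:0] value, as listed in the text.
RAM_MODES = {0: (128, 18), 1: (256, 9), 2: (512, 4), 3: (1024, 2)}


def modules_needed(depth, width):
    """Return (module_count, mode) for the mode needing the fewest modules."""
    best = None
    for mode, (mod_depth, mod_width) in RAM_MODES.items():
        count = math.ceil(depth / mod_depth) * math.ceil(width / mod_width)
        if best is None or count < best[0]:
            best = (count, mode)
    return best


def split_address(address, module_depth):
    """Split a flat address into (module_index, local_address)."""
    return address // module_depth, address % module_depth


def write_enables(address, module_depth, module_count, we):
    """Decode the high address bits onto one WE line per cascaded module."""
    selected, _ = split_address(address, module_depth)
    return [bool(we) and index == selected for index in range(module_count)]
```

On the read side, the same module index returned by `split_address` would drive the SELECT inputs of the multiplexer that merges the READ data outputs.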

The RAM blocks can be loaded with data generated internally (typically for RAM or FIFO functions) or with data from an external PROM (typically for ROM functions).

Multiple Accessing of Memories

The extremely fast RAM can be used in designs that require multiple memory accesses. When multiple blocks are cascaded together, the RAM achieves 280 MHz performance for the fastest speed grade and 155 MHz performance for the lowest speed grade devices. Write-through of data is also possible with the QuickLogic RAM.


I/O Cell Structure

QuickDSP features a variety of distinct I/O pins to maximize performance, functionality, and flexibility, with bi-directional I/O pins and input-only pins. All input and I/O pins are 2.5 V and 3.3 V tolerant and comply with the specific I/O standard selected. The outputs swing from Vss to VCCIO (0 V to 3.3 V ± 10%). The VCCIO pins must be tied to a 3.3 V supply to provide 3.3 V compliance. If 3.3 V compliance is not required, then these pins must be tied to the 2.5 V supply. Table 4 summarizes the I/O specifications that will be supported.

As designs become more complex and requirements more stringent, varying I/O standards are developing for specific applications. I/O standards for processors, memories, and various bus applications have become commonplace and a requirement for many systems. In addition, I/O timing has become a greater issue, with specific requirements for setup, hold, clock-to-out, and switching times.

The QuickDSP family has addressed these changing system requirements. The QuickDSP family includes a completely new I/O cell, which consists of programmable I/Os as well as a new cell structure consisting of 3 registers: input, output, and output enable. QuickDSP will offer banks of programmable I/O that address many of the new bus standards that are popular today. In addition, the input register addresses the setup time; the output register addresses the clock-to-out time; and the OE register addresses the switching time from high impedance to a given value.

TABLE 4. I/O Standards and Applications

I/O Standard   Reference Voltage (V)   Input Swing (V)   Output Voltage (V)   Application
LVTTL          n/a                     0 - 3.3           3.3                  General Purpose
LVCMOS2        n/a                     0 - 2.5           2.5                  General Purpose
PCI            n/a                     0 - 3.3           3.3                  PCI Bus Applications
GTL+           1                       0.8 - 1.2         n/a                  High Speed Bus
SSTL3          1.5                     1.0 - 1.43        3.3                  Memory Bus
SSTL2          1.25                    1.3 - 1.7         2.5                  Memory Bus


FIGURE 11. QuickDSP I/O Cell

The bi-directional I/O pin options can be programmed for input, output, or bi-directional operation. As shown in Figure 11, each bi-directional I/O pin is associated with an I/O cell which features an input/feedback register, an input buffer, an output/feedback register, a three-state output buffer, an output enable register, and two two-to-one multiplexers.

For input functions, I/O pins can provide combinatorial, registered data or both options simultaneously to the logic array. For combinatorial input operation, data is routed from I/O pins through the input buffer to the array logic. For registered input operation, I/O pins drive the D input of input cell registers, allowing data to be captured with fast set-up times without consuming internal logic cell resources.

For output functions, I/O pins can receive combinatorial or registered data from the logic array. For combinatorial output operation, data is routed from the logic array through a multiplexer to the I/O pin. For registered output operation, the array logic drives the D input of the output cell register which in turn drives the I/O pin through a multiplexer. The multiplexer allows either a combinatorial or a registered signal to be driven to the I/O pin.

The three-state output buffer controls the flow of data from the array logic to the I/O pin and allows the I/O pin to act as an input and/or output. The buffer's output enable can be individually controlled by a logic cell array or any pin (through the regular routing resources), or bank-controlled through one of the global networks. The signal can also be either combinatorial or registered, following the same flow as the output cell. For combinatorial control operation, data is routed from the logic array through a multiplexer to the three-state control. For registered control operation, the array logic drives the D input of the OE cell register, which in turn drives the three-state control through a multiplexer. The multiplexer allows either a combinatorial or a registered signal to be driven to the three-state control. For output functions, I/O pins can be individually configured for active HIGH, active LOW, or open-drain inverting operation. In the active HIGH and active LOW modes, the pins of all devices are fully 3.3 V compliant.

When I/O pins are unused, the OE controls can be permanently disabled, allowing the output cell register to be used for registered feedback into the logic array. I/O cell registers are controlled by clock, clock enable, and reset signals, which can come from the regular routing resources, from one of the global networks, or from two input pins per bank of I/O's. The CLK and RESET signals share a common line, while the clock enables for each register can be independently controlled. Additionally the output and enable registers will increase a device's register count. The addition of an output register will also decrease the Tco. Since the output register does not need to drive the routing, a TSINV plus one inversion can be cut from the output path and the length of the path is also reduced.

Extra registers add more inputs and outputs to the I/O structure. Extra routing resources are added to connect the I/O structure to the other parts of the device.

I/O interface support is programmable on a per-bank basis. There are 8 I/O banks per chip. Users cannot mix 2.5 V I/O with 3.3 V I/O in the same I/O bank. Figure 12 illustrates the I/O bank configurations.

Each I/O bank is independent of the other I/O banks, and each I/O bank has its own VCCIO and VREF supplies. A mixture of different I/O standards can be used on the device; however, there is a limitation as to which I/O standards can be supported within a given bank. Differential I/O can be shared with non-differential I/O. There can only be one VREF and one VCCIO per bank.
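The one-VREF, one-VCCIO rule can be expressed as a small check over the Table 4 values, sketched below in Python. It assumes that the "Output Voltage" column is the VCCIO a standard needs and that "n/a" places no constraint; the real device may impose further restrictions that this sketch does not capture, and the names are made up for the example.

```python
# (VREF, output voltage) per standard from Table 4; None stands for "n/a".
IO_STANDARDS = {
    "LVTTL": (None, 3.3),
    "LVCMOS2": (None, 2.5),
    "PCI": (None, 3.3),
    "GTL+": (1.0, None),
    "SSTL3": (1.5, 3.3),
    "SSTL2": (1.25, 2.5),
}


def can_share_bank(standards):
    """True if the standards need at most one VREF and one VCCIO value."""
    vrefs = {IO_STANDARDS[name][0] for name in standards} - {None}
    vccios = {IO_STANDARDS[name][1] for name in standards} - {None}
    return len(vrefs) <= 1 and len(vccios) <= 1
```

Under these assumptions, LVTTL, PCI, and SSTL3 could share a bank (one 3.3 V VCCIO, one 1.5 V VREF), while LVTTL and LVCMOS2 could not.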

(Figure 11 labels: PAD, IN, OUT, OE, and Vref; three registers, each with D, Q, E, and R pins.)


FIGURE 12. Multiple I/O Banks

(Figure 12 labels: I/O Banks 1 through 8, each with its own supply pins, VCCIO 1 through VCCIO 8 and VREF 1 through VREF 8.)

Programmable Slew Rate

Each I/O has programmable slew rate capability. The rate can be programmed to one of two settings, fast or slow. The slower rate can be used to reduce ground bounce noise. The slow slew rate is 1 V/ns under typical conditions. The fast slew rate is 2.8 V/ns.

Condition: 2.5V, 25C

TABLE 5. Programmable Slew Rate

VCCIO = 3.3V   Fast Slew   Slow Slew
Rising Edge    2.8 V/ns    1.0 V/ns
Falling Edge   2.86 V/ns   1.0 V/ns

VCCIO = 2.5V   Fast Slew   Slow Slew
Rising Edge    1.7 V/ns    0.6 V/ns
Falling Edge   1.9 V/ns    0.6 V/ns

Programmable Weak Pull-Down

A programmable weak pull-down resistor is available on each I/O. The I/O weak pull-down eliminates the need for an external pull-down resistor for used I/O. The specified pull-down current is a maximum of 150 uA under worst-case conditions: -148 uA at 3.6 V, -55C, and -69 uA at 2.5 V, 25C.

FIGURE 13. I/O Weak Pull-Down


Clock Networks

Global Clocks

There are 8 global clock networks in the QuickDSP device family. Global clocks can drive logic cell, I/O, ECU block, and RAM registers in the device. Five global clocks will have access to a Quad-Net (local clock network) connection with a programmable connection to the register inputs.

FIGURE 14. Global clock methodology

Quad-Net Network

There are 5 Quad-Net local clock networks in each quadrant, for a total of 20 in a device. Each Quad-Net is local to a quadrant. Quad-Net is multiplexed with the clock buffer before driving the column clock buffers.

Dedicated Clock

There is one dedicated clock in the QuickDSP device family. It connects to the clock input of the SuperCell, I/O, and RAM registers through a hardwired connection and is multiplexed with the programmable clock input. There are four inversions from pad to register inputs, and the dedicated clock takes on the same configuration as the global clock. The dedicated clock provides a fast global network with low skew. Either the dedicated clock or the programmable clock can be selected, as shown in Figure 15. The performance of the dedicated clock is given in Table 6.

FIGURE 15. Dedicated clock circuitry within logic cell

(Figure 15 labels: CLK, Programmable clock, Hard-wired clock.)

Clock Performance

TABLE 6. Dedicated Clock Performance

TT, 25C, 2.5V   Global   Dedicated
Macro (near)    1.51     1.59
I/O (far)       2.06     1.73
Skew            0.55     0.14

I/O Control and Local Hi-Drives

Each bank of I/Os has 2 input-only pins that can be programmed to drive the RST, CLK, and EN inputs of the I/Os in that bank. These input-only pins also double as high-drive inputs to a quadrant. Whether used as an I/O control or as a high drive, these buffers can be driven by the internal logic. The performance is indicated in Table 7.

TABLE 7. I/O Control Network/Local High-Drive

TT, 25C, 2.5V   From Pad   From Array
I/O (slow)      1.00 ns    1.14 ns
I/O (fast)      0.63 ns    0.78 ns
Skew            0.37 ns    0.36 ns


Programmable Logic Routing

Six types of routing resources are provided, as in the QuickRAM devices: short (sometimes called segmented) wires, dual wires, quad wires, express wires, distributed networks, and defaults. Short wires span the length of 1 logic cell, always in the vertical direction. Dual wires run horizontally and span the length of 2 logic cells. Short and dual wires are predominantly used for local connections. They traverse one or two logic cells and then use an interconnect element to continue to the next cell or to change direction.

Quad wires have interconnect elements every fourth logic cell. As a result, these wires are typically used to implement intermediate-length or medium fan-out nets.

Express lines run the length of the programmable logic uninterrupted. Each of these lines has a higher capacitance than a quad, dual, or short wire, but less capacitance than shorter wires connected to run the length of the device. The resistance will also be lower because the express wires don't require the use of “pass” links. Express wires provide higher performance for long routes or high fan-out nets.

Distributed networks are described in the clock/control section. These wires span the programmable logic and are driven by “column clock” buffers. Each dedicated clock network pin buffer is hardwired to a set of column clock buffers. Five global networks (“global buffers”) can be connected through special-purpose routing called “HSCK lines” to either a dedicated pin buffer or any vertical routing wire crossing it.

Global POR (power-on reset)

The QuickDSP family of devices features a global power-on reset. This reset will be hardwired to all registers and will reset the registers upon power-up of the device. The circuitry used to support the global POR is similar to the power-up loading circuitry.

FIGURE 16. Power-On Reset

(Figure 16 waveform labels: VCC, Power-on Reset, and Q, with Q going from undefined (X) to 0.)

Separate power and logic-cell power

To decrease the logic cell area and to eliminate the need for disable transistors in the input stage of the logic cell, a separate power supply for the logic cells has been added to the family. This supply will be grounded during programming and for various test modes.


IEEE Standard 1149.1a

The Pinnacle family of devices supports IEEE standard 1149.1a. The following public instructions are supported: BYPASS, EXTEST, and SAMPLE/PRELOAD. Two additional modes, RAMWT and RAMRD, can be used to load the RAM. The pin functions will be the same as in the QuickRAM family. Additional modes will be added during the design phase to accommodate PREI/O, clock, and various test circuitry.

JTAG BSDL Support
■ BSDL: Boundary Scan Description Language
■ Machine-readable data for test equipment to generate testing vectors and software
■ BSDL files available for all device/package combinations from QuickLogic
■ Extensive industry support available, including ATG (Automatic Test-vector Generation)

8-bit programming

The QuickDSP device family will have 8-bit programming capability. Four extra programming supplies are added to reduce programming time.

Security fuses

There are two security links: one to disable reading the array, the other to disable JTAG.

Flexibility fuse

The flexibility link is actually implemented as two “default” links. If the tie-low link is programmed, RAM power-up loading (from an external EPROM) is enabled, which might affect JTAG. If the tie-hi link is programmed, RAM power-up loading (from an external EPROM) will be disabled. JTAG will work normally, and can also be used to load the RAM.

FIGURE 17. JTAG Block Diagram