Product Brief DNVUPF4A October 2018 · FPGA-based DSP Acceleration Engine · DINI group



DNVUPF4A
Godzilla’s Scholarly, but Evil, Toe Fungus (GSBETF)

FPGA-based DSP Acceleration Engine
Xilinx Quad Virtex UltraScale+ FPGAs | 5G Prototyping

DINI group

Product Brief

Ver. 1.10
October 2018

Features

• Four Xilinx Virtex UltraScale+ FPGAs (B2104):
  - Virtex UltraScale+: VU13P, VU9P, VU7P, VU5P
  - Virtex UltraScale: VU190, VU160, VU125
• 80+ million ASIC gates (ASIC measure) with VU13P
  - 47,616 27x18 multipliers across 4 FPGAs
• Hosted via
  - 4-lane PCIe via iPASS cable, USB2.0
  - 10/100/1000BASE-T Ethernet, or stand alone
• Memory can be added using DINAR2_SODM204 using 3 DNBC expansion connectors
  - DDR4: DNBC3_DDR4
  - DNSODM204_SSRAM1_8V
  - DNSODM204_QUADMIC (four Mictor connectors)
  - DNSODM204_SE (mobile SDRAM)
  - DNSODM204_USB (USB2.0 PHY)
  - DNSODM204_DDR2_FAST
  - DNSODM204_QDRII+
  - DNSODM204_DDR2_2GB
  - DNSODM204_MICTOR_IO (dual Mictor connectors)
• High Speed interfaces on each FPGA:
  - QSFP28 module for 4x10 GbE or single 40/100 GbE
  - 2 ARF6 connectors
• DNTC (DINI Transceiver Connector), 32-lanes each. One per FPGA. Each capable of supporting:
  - 16-lane PCIe (GEN1/GEN2/GEN3)
  - 4x CX4 – Ethernet, XAUI, Infiniband
  - 32x SFP+ modules for 10 GbE
  - 8x QSFP28 modules for 40/100 GbE
  - 32x USB3.0/2.0 (A,AB,B)
  - 32x Serial ATA II (SATA II)
  - 32x channels cabled via SMA
• TMB busses – Preconfigured high speed data movement between field FPGAs and Config FPGA
  - 5 GB/s DMA between FPGAs and Config FPGA
    • A↔B, A↔C, A↔D, B↔C, B↔D, C↔D
    • FPGA[A,B,C,D] ↔ Config FPGA (Marvell uP)
• Main Bus (YMB) for bussed interconnect between all four FPGAs
  - 18 signals, single-ended
• Marvell MV78200 Discovery Innovation Dual CPU (socketed)
  - 1 GHz clock
  - Dual USB2.0 ports (Type B connector)
  - Dual Serial-ATA II connectors for 2 external hard drives (SATA II)
  - Gigabit Ethernet interface
    • 10/100/1000 GbE (RJ45 connector)
  - Sheeva™ CPU Core (ARM v5TE compliant)
    • Out-of-order execution
    • Single and double-precision IEEE compliant floating point
    • 16-bit Thumb instruction set increases code density
    • DSP instructions boost performance for signal processing applications
    • MMU to support virtual memory features
    • Dual Cache: 32 KB for data and instruction, parity protected
    • L2 cache: 512 KB unified L2 cache per CPU (total of 1MB), ECC protected
  - 1 GB external DDR2 SDRAM
    • Organized in a 128M x 64 configuration
    • 400 MHz (800 MHz data rate with DDR)
  - RS232 port for terminal-style observation
  - After configuration, both CPUs dedicated entirely to user application
  - Linux operating system
    • Source and examples provided via GPL license (no charge)
    • ~15 seconds to CPU boot
• Five independent low-skew global clock networks and single fixed clock
  - Five high-resolution, user-programmable synthesizers for G0-G4
    • Silicon Labs Si5326: 2kHz to 945 MHz
  - User configurable via Marvell uP RS232, USB, PCIe, or Ethernet
  - Global clock networks distributed differentially and balanced
• Flexible customization and stacking via 3 daughter card connectors per FPGA
  - DNBC (DINI Bank Connector) expansion connector
    • One bank per connector
  - Daughter cards (1 to 12 connectors <banks>)
  - Added FPGA to FPGA interconnect (inter or intra board)
    • Connector: non-proprietary; readily available; cheap
    • 24 LVDS pairs + 4 single-ended, + clocks
  - 52 single-ended
  - 800MHz on all signals with source synchronous LVDS
  - Signal voltage set by daughter card (+1.2V to +1.8V)
  - Reset
  - Supplied power rails (fused):
    • +12V (24W max), +3.3V (10W max)
  - Pin multiplexing to/from daughter cards using LVDS (up to 10x)
• Noninvasive debug via FPGA register readback: DN_Readbacker
• Fast and Painless FPGA configuration
  - USB, cabled PCIe, Ethernet, JTAG
  - Stand-alone configuration with USB stick
  - Configuration Error reporting
  - Accelerated configuration readback for advanced debug
• RS232 port for embedded FPGA-based SOC uP debug
  - Accessible from all FPGAs via separate 2-signal bus
• Full support for embedded logic analyzers via JTAG interface
  - Vivado Logic Analyzer and other third party solutions
  - ProtoLink™ debug connection to any/all FPGAs with DNBC adapter card: DNBC_ProtoLink
• Status FPGA-controlled LEDs:
  - Enough multicolored LEDs to melt cheese


Description

Overview

The DNVUPF4A is a logic acceleration system that gives ASIC or IP designers a vehicle to accelerate algorithms in FPGAs. The DNVUPF4A is a stand-alone system and can be hosted by a 4-lane PCIe cable (GEN2), USB, or Ethernet. A single DNVUPF4A configured with four Virtex UltraScale+ VU13Ps can emulate up to 80 million gates of logic as measured by a reasonable ASIC gate counting standard. An infinite number of DNVUPF4As can be linked together, seamlessly extending this gate count to 1 billion or more. The gate count estimate does not include embedded memories and multipliers resident in the FPGA fabric. One hundred percent (100%) of the Virtex UltraScale+ FPGA resources are available to the user application. The DNVUPF4A achieves high gate density and allows for fast target clock frequencies by utilizing FPGAs from Xilinx’s 16nm Virtex UltraScale+ family.
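As a rough illustration of the stacking arithmetic, the sketch below estimates how many boards a target gate count would need. It assumes the brief's figure of 80 million gates per board (four VU13Ps); the function name and constant are illustrative only and are not part of any DINI tool.

```python
import math

# Assumption: ~80 million ASIC gates per board with four VU13Ps, per the brief.
GATES_PER_BOARD = 80_000_000


def boards_needed(target_gates: int, gates_per_board: int = GATES_PER_BOARD) -> int:
    """Return the number of stacked boards needed to reach target_gates."""
    if target_gates <= 0:
        raise ValueError("target_gates must be positive")
    return math.ceil(target_gates / gates_per_board)


# A 1-billion-gate design needs 12.5 boards' worth of logic, so 13 boards.
print(boards_needed(1_000_000_000))  # 13
```

This ignores partitioning overhead and inter-board interconnect limits, which in practice tend to push the real number higher.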

Virtex UltraScale+ (speed grades listed slowest to fastest; gate estimates in 1000's)

                                Gate Estimate            Memory
         Speed                  Max          Practical   Multipliers  UltraRAM Blocks  Block RAM   Total     Total
Device   Grades     FFs         (100% util)  (60% util)  (27x18)      (4k x 72 bits)   (18 kbits)  (kbits)   (kbytes)
VU13P    -1,-2,-3   3,456,000   33,178       19,910      12,228       1,280            5,376       465,408   58,176
VU9P     -1,-2,-3   2,364,480   22,699       13,620      6,840        960              4,320       354,240   44,280
VU7P     -1,-2,-3   1,576,320   15,133       9,080       4,560        640              2,880       236,160   29,520
VU5P     -1,-2,-3   1,201,154   11,531       6,920       3,474        470              2,048       172,224   21,528

Virtex UltraScale (speed grades listed slowest to fastest; gate estimates in 1000's)

                                Gate Estimate            Memory
         Speed                  Max          Practical   Multipliers  Blocks      Total     Total
Device   Grades     FFs         (100% util)  (60% util)  (27x18)      (18 kbits)  (kbits)   (kbytes)
VU190    -1,-2,-3   2,148,480   20,625       12,380      1,800        7,560       136,080   17,010
VU160    -1,-2,-3   1,852,800   17,787       10,670      1,560        6,552       117,936   14,742
VU125    -1,-2,-3   1,432,320   13,750       8,250       1,200        5,040       90,720    11,340
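The memory totals in the device tables follow from the block counts. The sketch below recomputes them, taking the block sizes from the table headers (UltraRAM: 4k x 72 bits = 288 kbits; block RAM: 18 kbits). The dictionary and function names are illustrative, and the zero UltraRAM count for the Virtex UltraScale parts simply reflects that their table has no UltraRAM column.

```python
# Block sizes in kbits, taken from the table headers.
URAM_KBITS = 4 * 72   # one UltraRAM block: 4k x 72 bits = 288 kbits
BRAM_KBITS = 18       # one block RAM: 18 kbits

# (UltraRAM blocks, block RAM blocks) per device, copied from the tables.
DEVICES = {
    "VU13P": (1280, 5376),
    "VU9P": (960, 4320),
    "VU7P": (640, 2880),
    "VU5P": (470, 2048),
    "VU190": (0, 7560),
    "VU160": (0, 6552),
    "VU125": (0, 5040),
}


def memory_totals(uram_blocks: int, bram_blocks: int) -> tuple[int, int]:
    """Return (total kbits, total kbytes) of on-chip memory."""
    kbits = uram_blocks * URAM_KBITS + bram_blocks * BRAM_KBITS
    return kbits, kbits // 8


for name, (uram, bram) in DEVICES.items():
    kbits, kbytes = memory_totals(uram, bram)
    print(f"{name}: {kbits:,} kbits, {kbytes:,} kbytes")
```

For the VU13P this gives 465,408 kbits and 58,176 kbytes, matching the table.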


Block Diagram

[Figure: DNVUPF4A block diagram, v1.10 (Godzilla’s Scholarly, but Evil, Toe Fungus). The diagram shows four field FPGAs (A, B, C, D; B2104 package; listed options: Virtex UltraScale+ VU13P, VU7P, VU5P; Virtex UltraScale VU190, VU160, VU125, VU095, VU080; Kintex UltraScale KU115, KU095), each with QSFP28, ARF6, and 32-lane GTY expansion connectors; the YMB and TMB busses and GTY links between FPGAs; DNTMB in/out stacking connections; a Config FPGA (KU085, A1517) with EEPROM and SPI flash; the Marvell MV78200 dual CPU with 128M x 64 DDR2, 128Mb SPI boot flash, 4Gb NAND flash, SATA II, USB, RS232, RTC, JTAG, LCD header, temperature sensor, and a 10/100/1000BASE-T PHY (RJ45); a PEX 8724 PCIe switch with iPASS-cabled PCIe; and Si5326 frequency synthesizers driving global clocks G0-G4 with SMA inputs and feedback from stacking.]


Stacking multiple boards together

An infinite number of DNVUPF4As can be ganged together to increase the resources. More detail is available on this page: ‘Stacking Multiple DNVUFAs boards together’. All functionality is seamlessly maintained, including the high-performance data movement via the TMB (Transceiver Main Bus). Interconnect between FPGAs on a single board and between boards in a stack can be configured on a bank-by-bank basis via cables on the DNBC connectors. Clocks, resets, and configuration are handled seamlessly.

Virtex UltraScale+ FPGA from Xilinx

The DNVUPF4A uses a high I/O-count, 2104-pin flip-chip BGA package. In this package the VU13P has 702 I/Os and 76 GTY channels (25 Gb/s). Most are utilized. Abundant fixed interconnects (either differential or single-ended) are provided between the FPGAs. FPGA to FPGA interconnect is routed as LVDS, but can be used single-ended at a reduced frequency. GTY connections between FPGAs allow for high bandwidth data movement FPGA to FPGA. 100% of the internal resources of the four Virtex UltraScale+ FPGAs are dedicated to the user application.

Introducing the Xilinx Virtex UltraScale+ VU13P. When stuffed with four of these devices, the DNVUPF4A is capable of prototyping >80 million gates of ASIC logic with plenty of resource margin. Most interesting are the 11,904 27x18 multipliers per FPGA, for a total of more than 47k for the board. This large number of multipliers, combined with 58 MB of internal memory per FPGA, enables algorithmic acceleration that cannot be achieved with alternate approaches.
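The board-level figures are simply the per-FPGA figures multiplied by four. A minimal sketch, using the per-FPGA multiplier count quoted in this section and the VU13P memory total from the device table (the constant and function names are illustrative):

```python
FPGAS_PER_BOARD = 4

# Per-FPGA figures for the VU13P as quoted in this brief.
MULTIPLIERS_PER_FPGA = 11_904      # 27x18 multipliers
MEMORY_KBYTES_PER_FPGA = 58_176    # internal memory, from the device table


def board_total(per_fpga: int, fpgas: int = FPGAS_PER_BOARD) -> int:
    """Scale a per-FPGA resource count to a fully stuffed board."""
    return per_fpga * fpgas


print(board_total(MULTIPLIERS_PER_FPGA))    # 47616 multipliers per board
print(board_total(MEMORY_KBYTES_PER_FPGA))  # 232704 kbytes per board
```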

The Marvell MV78200 Discovery™ Dual CPU
A MONSTER for data movement and manipulation

Easy FPGA configuration is a required feature of large FPGA boards. We use a custom socketed CPU card to handle this function. We chose a Marvell MV78200 from the Discovery™ Innovation CPU family. Bluntly stated, this CPU is massive, massive overkill for the mundane task of FPGA configuration. The MV78200 comes with a variety of high-performance interfaces, and all can be utilized to your advantage. Look forward to a higher performance CPU card in the near future.

Dual Sheeva™ CPUs, 1GHz with floating point

First and foremost are dual CPUs. After we are done configuring the FPGAs, we dedicate both CPUs to your application. The CPUs in the MV78200 are Marvell Sheeva™ cores, which are ARM v5TE compliant. The CPUs are clocked at 1GHz and each processor has a single and double precision floating point unit. A fixed 1 GB DDR2 memory is standard and is useful for large amounts of high speed data buffering. The memory is organized as 128M x 64 and clocked at the full frequency allowed: 400MHz (800MHz effective with DDR). This DDR2 bank is shared between the two CPUs. Boot code is resident in an SPI Flash, and application code is downloaded via any port: PCIe, USB, or Ethernet. We ship Linux as the standard operating system. Options exist for VxWorks and other real-time operating systems. Contact the factory for more information.
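The DDR2 figures can be checked with a little arithmetic. The sketch below derives the capacity from the 128M x 64 organization and a theoretical peak bandwidth from the 400 MHz clock. The peak figure is an upper bound that ignores refresh, bus turnaround, and controller overhead, and the function names are illustrative.

```python
def ddr_capacity_bytes(depth_words: int, width_bits: int) -> int:
    """Capacity of a memory organized as depth_words x width_bits."""
    return depth_words * width_bits // 8


def ddr_peak_bandwidth_bytes_per_s(clock_hz: int, width_bits: int) -> int:
    """Theoretical peak: two transfers per clock (DDR) across the full bus width."""
    return 2 * clock_hz * width_bits // 8


capacity = ddr_capacity_bytes(128 * 2**20, 64)          # 128M x 64
peak = ddr_peak_bandwidth_bytes_per_s(400_000_000, 64)  # 400 MHz clock

print(capacity // 2**30, "GiB")   # 1 GiB
print(peak / 1e9, "GB/s peak")    # 6.4 GB/s peak
```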


PCI Express

The Marvell 78200 acts as a two-port high-speed PCI Express switch (2.5 Gb/s). It connects the user FPGA at 4-lane PCI Express speeds to a host computer. The Marvell 78200 has multiple DMA engines to pump data to and from any port. The user interface on the FPGA is a simple-to-use, pipelined A/D bus running at 6.4Gb/s. Drivers for data movement to and from a host machine are provided. A simple example FPGA design and host computer application streaming data at PCI Express x4 bandwidth to the user FPGA is provided.
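To put the link rate in context, the sketch below estimates the usable bandwidth of a 4-lane link. It assumes the 2.5 Gb/s figure is a per-lane line rate and that the link uses 8b/10b encoding, as PCIe Gen1 and Gen2 do; packet and protocol overhead are ignored, and the function name is illustrative.

```python
def pcie_link_bandwidth_gbps(lanes: int,
                             line_rate_gbps: float = 2.5,
                             encoding_efficiency: float = 8 / 10) -> float:
    """Payload bandwidth per direction in Gb/s, before protocol overhead.

    Assumes 8b/10b encoding (PCIe Gen1/Gen2): 8 data bits per 10 line bits.
    """
    return lanes * line_rate_gbps * encoding_efficiency


x4 = pcie_link_bandwidth_gbps(4)
print(f"x4 link: {x4:.1f} Gb/s ({x4 / 8:.1f} GB/s) per direction")
```

Under these assumptions a x4 link carries about 8 Gb/s per direction, which leaves headroom above the 6.4 Gb/s user-side bus quoted in this section.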

Two Serial-ATA Ports (SATA II)

The MV78200 has two Serial-ATA Generation 2 (SATA II) ports, each capable of running at 3.0 Gb/s. SATA is intended for high speed data transfer to/from serial-ATA hard drives. Two SATA connectors are provided, allowing for direct, high-speed interfacing to external hard drives. The MV78200 has specialized enhanced DMA (EDMA) engines for HDD data transfer, with a 512-byte buffer for each channel. Examples of all possible data movement options, with source, are included.
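As a rough guide to what the two ports can deliver, the sketch below converts the 3.0 Gb/s line rate into payload bytes per second. It relies on the fact that SATA uses 8b/10b encoding; real drive throughput will be lower, and the names are illustrative.

```python
SATA2_LINE_RATE_BPS = 3_000_000_000  # 3.0 Gb/s per port
PORTS = 2


def sata_payload_bytes_per_s(line_rate_bps: int) -> int:
    """Payload rate after 8b/10b encoding: 10 line bits carry one data byte."""
    return line_rate_bps // 10


per_port = sata_payload_bytes_per_s(SATA2_LINE_RATE_BPS)
print(per_port // 1_000_000, "MB/s per port")          # 300 MB/s per port
print(PORTS * per_port // 1_000_000, "MB/s combined")  # 600 MB/s combined
```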

GbE - 802.3 Gigabit Ethernet

The MV78200 can be controlled over its built-in Ethernet port. The interface is a standard RJ45 connector. This port can be used to configure FPGAs, set board clocks and other resources, and access the Linux terminal. This terminal can also be used to send data to and from the user FPGA design at gigabit Ethernet speeds.

Bank-Granular Expansion connectors for customization, memory, and stacking

The DNVUPF4A uses a connector standard called DNBC (DINI Bank Connector), which uses Samtec SEAM series connectors. Three of these connectors are attached to each field FPGA (A, B, C, and D), enabling expansion, customization, and stacking. This is a non-proprietary, industry standard connector from Samtec and the mating connector is readily available. We can provide the mating connector to you at our cost. We are not fans of proprietary, hard-to-get, outrageously priced expansion connectors. Of the 52 signals in the bank, 48 are routed as 24 differential pairs and can run at the limit of the Virtex UltraScale FPGA I/Os: 800 MHz. The remaining 4 signals are routed single-ended. Clocks, resets, and cable/daughter card presence detection, along with abundant (fused) power, are included in each connector.
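The signal budget of the expansion connectors can be tallied as follows. This is a small sketch using only the counts given in this brief (24 LVDS pairs plus 4 single-ended signals per bank, three connectors per field FPGA); the class and constant names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnbcBank:
    """Signal counts for one DNBC bank, as stated in the brief."""
    lvds_pairs: int = 24
    single_ended: int = 4

    @property
    def signals(self) -> int:
        # Each LVDS pair uses two signals.
        return 2 * self.lvds_pairs + self.single_ended


CONNECTORS_PER_FPGA = 3
FIELD_FPGAS = ("A", "B", "C", "D")

bank = DnbcBank()
total_banks = CONNECTORS_PER_FPGA * len(FIELD_FPGAS)

print(bank.signals)                # 52 signals per bank
print(total_banks)                 # 12 banks per board
print(total_banks * bank.signals)  # 624 expansion signals per board
```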

Memory

Memory can be added to the DNVUPF4A via the DNBC expansion connector using the DNBC_SODM204 expansion card. DDR4 memory is added with this card: DNBC3_DDR4. For other options, three DNBC connectors can host a single DINAR2_SODM204 expansion card, so as many as four of these cards can be used on a single DNVUPF4A. The DINAR2_SODM204 has a 204-pin SODIMM socket. Off-the-shelf DDR3 SODIMM modules work fine, allowing you to add up to 8GB of low cost memory in each position. In addition, we have compatible SODIMMs in the following variations: flash, SSRAM, QDR II+, mobile SDRAM, mictors, USB2/3 PHYs, and more.


Easy Configuration via PCIe, USB, or Ethernet

Configuration of the FPGAs is under the control of the Marvell CPU. Configuration data can be provided over PCI Express, USB, Ethernet, or on-board non-volatile memory. The configuration files can be copied to the board using a USB memory stick (provided). Configuration occurs automatically after the CPU boots. Sanity checks are performed automatically on the configuration files, streamlining the configuration process in the case of human error. Multiple LEDs provide instant status and operational feedback.

Status LEDs, Debug

As with all of our FPGA-based boards, the DNVUPF4A is loaded with LEDs. The LEDs are stuffed in several different colors (red, green, blue, orange et al.). There are enough LEDs here to melt cheese. Please don’t melt cheese without adult supervision. These LEDs are user controllable from the FPGAs, so they can be used as visual feedback in addition to the gratifying task of creating gooey messes. A JTAG connector provides an interface to the Vivado Integrated Logic Analyzer (ILA) and other third party debug tools such as those from Exostiv. A DNBC daughter card enables a ProtoLink™ interface.


Photos


For technical applications and sales support, call 858.454.3419

7469 Draper Ave.
La Jolla, CA 92037-5026

Phone: 858.454.3419
Fax: 858.454.1728

E-Mail: [email protected]
Web: http://www.dinigroup.com

The DINI Group reserves the right to make changes to the product(s) or information contained herein without notice. No liability is assumed as a result of their use or application. No rights under any patent accompany the sale of any such product(s) or information.
