Softcore Vector Processor

Team ASP

Brandon Harris

Arpith Jacob

Outline
• Motivation
• Smith-Waterman
• Solution
• System Architecture
  • Overview
  • Functional Unit
  • Instruction Controller
  • Processing Element
  • Memory Controller
• ISA
• Results
• Future Research

Motivation
• Smith-Waterman sequence alignment
• Similar problems
  • HMMer, BLAST, RNA secondary structure prediction
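Smith-Waterman fills a dynamic-programming matrix with one cell per (query residue, database residue) pair, and each cell update is the unit behind the MCUPS figures in the results. Below is a minimal scorer that assumes a linear gap penalty and simple match/mismatch scores. The deck does not state its scoring scheme, so `match`, `mismatch`, and `gap` are illustrative defaults.

```python
def smith_waterman(query: str, db: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> tuple[int, int]:
    """Return (best local-alignment score, number of cell updates).

    Linear gap penalty (an assumption); keeps only one previous row of H.
    """
    prev = [0] * (len(db) + 1)  # row H[i-1][*]
    best = 0
    cells = 0
    for q in query:
        curr = [0] * (len(db) + 1)  # row H[i][*]; column 0 stays 0
        for j, d in enumerate(db, start=1):
            diag = prev[j - 1] + (match if q == d else mismatch)
            up = prev[j] + gap          # H[i-1][j] + gap
            left = curr[j - 1] + gap    # H[i][j-1] + gap
            curr[j] = max(0, diag, up, left)  # 0 restarts a local alignment
            best = max(best, curr[j])
            cells += 1                  # one cell update
        prev = curr
    return best, cells


print(smith_waterman("ACGT", "ACGT"))  # (8, 16)
```

A query of length m against a database of length n always costs m*n cell updates, whatever the score.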

Our Solution
• Softcore Vector Processor
  • Massively parallel
  • Software programmable
  • Configurable instantiation
• Why softcore?
  • Optimize for specific applications
  • Adapt to changes in algorithms
  • FPGA technology improves with time

Architectural Overview
• Streaming architecture
• Memory-mapped FIFOs
  • Read-once data
  • Write-once data
  • Provide communication between components

[Figure: data path between Software, DMA, and the Functional Unit]
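Here is a software model of the streaming idea. It assumes that a FIFO read is destructive, so each word is consumed exactly once, and that DMA simply moves words between host buffers and the FIFOs. `StreamFIFO`, `run_stream`, and the `kernel` callback are hypothetical names. The real components are memory-mapped hardware FIFOs.

```python
from collections import deque


class StreamFIFO:
    """Hypothetical stand-in for a memory-mapped FIFO: write once, read once."""

    def __init__(self) -> None:
        self._words = deque()

    def write(self, word: int) -> None:
        self._words.append(word)

    def read(self) -> int:
        # Reading pops the word, so the same datum can never be read twice.
        return self._words.popleft()

    def __len__(self) -> int:
        return len(self._words)


def run_stream(data, kernel):
    """Host -> stream-in FIFO -> functional unit -> stream-out FIFO -> host."""
    stream_in, stream_out = StreamFIFO(), StreamFIFO()
    for word in data:          # host/DMA side writes each input once
        stream_in.write(word)
    while len(stream_in):      # functional unit reads each input once
        stream_out.write(kernel(stream_in.read()))
    return [stream_out.read() for _ in range(len(stream_out))]


print(run_stream([1, 2, 3], lambda w: w * 2))  # [2, 4, 6]
```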

Functional Unit

[Figure: Functional Unit block diagram: Instruction Controller issuing instructions (Instr.) to an array of Processing Elements, a Memory Controller with Shared Local Memory, and Stream In / Stream Out ports]

Instruction Controller
• SIMD instruction broadcast
  • The controller issues addi R5, R1, 10 once, and each Processing Element executes it on its own registers. With R1 = 0, 1, 2 in three PEs, R5 becomes 10, 11, 12.
  • A load such as ld R2, R3(0) is broadcast the same way, so every PE must hold the shared pointer ptr1 in its own R3.
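This toy model of SIMD instruction broadcast uses the slide's example: the controller issues `addi R5, R1, 10` once, and each PE applies it to its private register file. The `PE` class and the `broadcast_addi` helper are hypothetical. The R1 values (0, 1, 2) come from the slide.

```python
NUM_REGS = 6  # R0-R5, matching the register dumps on the slide


class PE:
    """Hypothetical Processing Element with a private register file."""

    def __init__(self, r1_value: int) -> None:
        self.r = [0] * NUM_REGS  # R0 stays 0
        self.r[1] = r1_value


def broadcast_addi(pes, rd: int, rs: int, imm: int) -> None:
    """The controller issues addi rd, rs, imm once; every PE runs it on its own registers."""
    for pe in pes:
        pe.r[rd] = pe.r[rs] + imm


pes = [PE(v) for v in (0, 1, 2)]  # R1 values from the slide
broadcast_addi(pes, rd=5, rs=1, imm=10)
print([pe.r[5] for pe in pes])  # [10, 11, 12]
```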

Instruction Controller
• Instruction register broadcast
  • ldir R2, R0(IR3): ptr1 is stored once, in the controller's instruction register IR3, and broadcast with the instruction, so the PEs no longer need a copy in R3
  • 40% register savings
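This toy model compares the two load styles. All names in it are hypothetical: `memory`, `PTR1`, the IR file size, and the helpers. With `ld`, every PE keeps its own copy of ptr1. With `ldir`, ptr1 lives once in controller register IR3 and travels with the instruction. The model only counts how many PE registers hold ptr1. It does not reproduce the 40% figure, which depends on the whole application.

```python
NUM_REGS = 6          # R0-R5, as in the slide's register dumps
PTR1 = 0x100          # hypothetical address standing in for ptr1
memory = {PTR1: 42}   # hypothetical shared-memory contents


def make_pes(n):
    return [[0] * NUM_REGS for _ in range(n)]  # R0 stays 0 in every PE


def broadcast_ld(pes, rd, rs, imm):
    """ld rd, rs(imm): each PE forms the address from its own register rs."""
    for regs in pes:
        regs[rd] = memory[regs[rs] + imm]


def broadcast_ldir(pes, ir, rd, rs, irn):
    """ldir rd, rs(IRn): the controller sends IRn's value along with the instruction."""
    for regs in pes:
        regs[rd] = memory[regs[rs] + ir[irn]]


def regs_holding_ptr1(pes):
    return sum(regs.count(PTR1) for regs in pes)


# Style 1, ld R2, R3(0): every PE first needs ptr1 in R3.
pes_ld = make_pes(3)
for regs in pes_ld:
    regs[3] = PTR1
broadcast_ld(pes_ld, rd=2, rs=3, imm=0)

# Style 2, ldir R2, R0(IR3): ptr1 is held once, in the controller.
ir = [0] * 16  # hypothetical instruction-register file size
ir[3] = PTR1
pes_ldir = make_pes(3)
broadcast_ldir(pes_ldir, ir, rd=2, rs=0, irn=3)

print(regs_holding_ptr1(pes_ld), regs_holding_ptr1(pes_ldir))  # 3 0
```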

Processing Element

[Figure: Processing Element datapath. Labeled blocks: register file (Ra Addr, Rb Addr, Immediate), Data Select (Ra/Rb Data Left, Ra/Rb Data Right), Pipeline Register, Compare, and write enables (Wr Enable Left, Wr En Right, Mem Wr Enable) with data to the Memory Controller. Example instruction: bmseti R17 EQ 16]
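The sample code's `bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1` suggests one reading of `bmseti`: each PE compares one of its registers with an immediate and sets a mask bit, and that bit gates later register writes so only matching PEs update. This is an assumption, not documented semantics. `bmseti_eq` and `masked_write` are hypothetical helpers.

```python
def bmseti_eq(reg_per_pe, imm):
    """Hypothetical bmseti Rn EQ imm: one mask bit per PE, set where Rn == imm."""
    return [value == imm for value in reg_per_pe]


def masked_write(old_per_pe, new_per_pe, mask):
    """Only PEs whose mask bit is set take the new value."""
    return [new if bit else old for old, new, bit in zip(old_per_pe, new_per_pe, mask)]


PE_NUM_ELEMENTS = 16                    # the 16-PE configuration from the results
pe_ids = list(range(PE_NUM_ELEMENTS))   # stand-in for each PE's PE_ID_REG
mask = bmseti_eq(pe_ids, PE_NUM_ELEMENTS - 1)
r3 = masked_write([0] * PE_NUM_ELEMENTS, [99] * PE_NUM_ELEMENTS, mask)
print(r3)  # only the last PE holds 99
```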

Memory Controller

[Figure: ported memory shared by the IC and PEs 0-3]
• Single-cycle read
• Multiple-cycle write

Instruction Set Architecture
• Custom ISA
  • Two sets of instruction types
    • Instruction Controller
    • Processing Element
  • Optimized for target applications
    • Max, Min, Loop
  • Expandable
    • Core vs. application-specific

Sample Code

Unscheduled (nop padding):

_query_loop:
    subir   %r8, %r3, %ir10
    nop
    nop
    max     %r4, %r4, %r8
    add     %r3, %r19, PE_ZERO_REG
    bmseti  PE_ID_REG EQ PE_NUM_ELEMENTS - 1
    icaddi  %ir15, %ir8, PE_NUM_ELEMENTS - 1
    nop
    nop
    ldir    PE_MEM_REG, PE_ZERO_REG(%ir15)
    nop
    nop
    nop
    nop
    addi    %r3, PE_MEM_REG, 0
    ld      PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
    icaddi  %ir7, %ir7, 1
    icaddi  %ir9, %ir9, 1
    icloop  %ir4, %ir5, _query_loop

Scheduled (same instructions, reordered to remove the nops):

_query_loop:
    icaddi  %ir15, %ir8, PE_NUM_ELEMENTS - 1
    subir   %r8, %r3, %ir10
    add     %r3, %r19, PE_ZERO_REG
    ldir    PE_MEM_REG, PE_ZERO_REG(%ir15)
    max     %r4, %r4, %r8
    bmseti  PE_ID_REG EQ PE_NUM_ELEMENTS - 1
    icaddi  %ir7, %ir7, 1
    icaddi  %ir9, %ir9, 1
    addi    %r3, PE_MEM_REG, 0
    ld      PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
    icloop  %ir4, %ir5, _query_loop
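The two listings contain the same 11 instructions. The first spends 19 issue slots because of its `nop`s, and the second needs none. A small nop-insertion pass reproduces the first listing's padding exactly if we assume two latencies: a result becomes readable 3 slots after an ALU or controller instruction and 5 slots after a load. Those latencies are inferred from the listing, not taken from the design. The model also ignores immediates, the `bmseti` mask, and dependencies that cross loop iterations.

```python
# Latencies inferred from the nop padding in the listing (an assumption):
# loads deliver their result 5 slots later, everything else 3 slots later.
LOAD_LATENCY = 5
ALU_LATENCY = 3
LOADS = {"ld", "ldir"}
NOP = ("nop", None, ())


def insert_nops(program):
    """program: list of (opcode, dest_reg_or_None, source_regs). Returns the padded list."""
    out = []
    ready = {}  # register -> first slot at which its new value can be read
    for op, dest, srcs in program:
        need = max((ready.get(r, 0) for r in srcs), default=0)
        out.extend([NOP] * max(0, need - len(out)))  # stall until sources are ready
        if dest is not None:
            ready[dest] = len(out) + (LOAD_LATENCY if op in LOADS else ALU_LATENCY)
        out.append((op, dest, srcs))
    return out


# First listing with its nops removed (original order; immediates omitted).
ORIGINAL = [
    ("subir", "%r8", ("%r3", "%ir10")),
    ("max", "%r4", ("%r4", "%r8")),
    ("add", "%r3", ("%r19", "PE_ZERO_REG")),
    ("bmseti", None, ("PE_ID_REG",)),
    ("icaddi", "%ir15", ("%ir8",)),
    ("ldir", "PE_MEM_REG", ("PE_ZERO_REG", "%ir15")),
    ("addi", "%r3", ("PE_MEM_REG",)),
    ("ld", "PE_MEM_REG", ("PE_ZERO_REG",)),
    ("icaddi", "%ir7", ("%ir7",)),
    ("icaddi", "%ir9", ("%ir9",)),
    ("icloop", None, ("%ir4", "%ir5")),
]

# Second listing: the same instructions in scheduled order.
SCHEDULED = [ORIGINAL[i] for i in (4, 0, 2, 5, 1, 3, 8, 9, 6, 7, 10)]

print(len(insert_nops(ORIGINAL)), len(insert_nops(SCHEDULED)))  # 19 11
```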

Results
• VHDL implementation
  • Simulated
  • Synthesized
• Smith-Waterman
  • 16-PE version tested
  • Measured in millions of cell updates per second (MCUPS)

Smith-Waterman Speedup

System    Freq      MCUPS   Speedup vs. P4
P4        1.8 GHz      15      1.00
SVP16     150 MHz      52      3.47
SVP32     150 MHz     103      6.87
SVP64     125 MHz     167     11.13
SVP128    120 MHz     302     20.13
SVP128    150 MHz     378     25.20
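MCUPS counts Smith-Waterman matrix cells computed per second. Aligning a query of length m against a database of length n updates m*n cells, so a run taking t seconds scores m*n / (t * 10^6) MCUPS. Each speedup in the table is that system's MCUPS divided by the P4's 15 MCUPS. The sketch below recomputes the speedups from the table. The `mcups` helper and its example inputs are illustrative only.

```python
def mcups(query_len: int, db_len: int, seconds: float) -> float:
    """Millions of cell updates per second for one query/database comparison."""
    return query_len * db_len / (seconds * 1e6)


BASELINE_MCUPS = 15  # P4 at 1.8 GHz, from the table
MEASURED = {
    "SVP16 @ 150 MHz": 52,
    "SVP32 @ 150 MHz": 103,
    "SVP64 @ 125 MHz": 167,
    "SVP128 @ 120 MHz": 302,
    "SVP128 @ 150 MHz": 378,
}
speedups = {name: round(m / BASELINE_MCUPS, 2) for name, m in MEASURED.items()}
print(speedups)

# Hypothetical run: a 1,000-residue query against 1,000,000 residues in 10 s.
print(mcups(1_000, 1_000_000, 10.0))  # 100.0
```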

Comparative Performance

System*        Freq     PEs/Chip  MCUPS/PE  Chips  MCUPS/Chip  Cost ($1000)  MCUPS/$1000
SVP128         150 MHz     128      2.95      1       378           5             75
SVP128         120 MHz     128      2.36      1       302           5             60
SVP64          125 MHz      64      2.61      1       167           5             33
SVP32          150 MHz      32      3.22      1       103           5             20
Kestrel         20 MHz      64      0.78      8        50          25†            16
GeneMatcher2   192 MHz     192      5.21     16      1000          69             14
Fuzion 150     200 MHz    1536      1.63      1      2500           ?              ?

* Reference [1]    † Estimated
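For the single-chip SVP rows, the two derived columns follow directly from MCUPS/Chip. MCUPS/PE is MCUPS/Chip divided by PEs/Chip. MCUPS/$1000 is MCUPS/Chip divided by the cost, and the table's values match truncation rather than rounding. The sketch below applies only to those rows. `SVP_ROWS` and `derived_columns` are illustrative names.

```python
# (system, PEs per chip, MCUPS per chip, cost in $1000); all SVP rows are single-chip
SVP_ROWS = [
    ("SVP128 @ 150 MHz", 128, 378, 5),
    ("SVP128 @ 120 MHz", 128, 302, 5),
    ("SVP64 @ 125 MHz", 64, 167, 5),
    ("SVP32 @ 150 MHz", 32, 103, 5),
]


def derived_columns(rows):
    """Recompute MCUPS/PE (2 decimals) and MCUPS/$1000 for single-chip systems."""
    result = []
    for name, pes, mcups_per_chip, cost_k in rows:
        per_pe = round(mcups_per_chip / pes, 2)
        per_k = mcups_per_chip // cost_k  # truncating division matches the table
        result.append((name, per_pe, per_k))
    return result


for name, per_pe, per_k in derived_columns(SVP_ROWS):
    print(f"{name}: {per_pe} MCUPS/PE, {per_k} MCUPS/$1000")
```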

Performance

PEs    Freq (MHz)   Area   BRAM
 16       150        13%     22
 32       150        22%     38
 64       125        41%     70
128       120        80%    134

• Hardware: Xilinx Virtex-4 VLX200

Future Work
• Software development
  • How can HMMer and other systolic algorithms be implemented?
• ISA expansion
  • What additional instructions are needed?
  • What instructions can be added to optimize?
• Hardware development
  • How can we optimize the hardware to make it faster and smaller?
  • What hardware can we add to enhance performance?
  • How can we take advantage of advances in FPGAs, such as DSP48s?

Acknowledgments
• Special thanks
  • Young Cho
  • Roger Chamberlain
  • Jeremy Buhler
  • Joseph Lancaster

References
[1] Di Blas et al., "The Kestrel Parallel Processor," IEEE Transactions on Parallel and Distributed Systems, January 2005.
[2] A. Jacob et al., "Whole Genome Comparison Using Commodity Workstations," Technical Report, 2003.

Questions?
