Configurable Soft Processor Arrays Using the OpenFire Processor

Craven 1 B212/MAPLD 2005


Stephen Craven, Cameron Patterson, Peter Athanas

Configurable Computing Lab, Virginia Tech


Outline

• Motivation
  • Single Chip Multi-Processors
  • Application-Specific Instruction set Processors

• OpenFire Processor
  • Features and Configurability
  • Performance

• Configurable Array Example: Median Image Filtering
  • Optimizations
  • Performance Comparisons


Motivation: SCMP

• Moving towards Single Chip Multi-Processors (SCMP) because:
  • Underutilized silicon budget
  • Diminishing ROI on instruction-level parallelism
  • Design and verification of complex uniprocessors too costly
  • SCMPs are more energy efficient
  • SCMPs can leverage existing IP
  • SCMPs are by nature easily scalable
  • Fast on-chip inter-processor communication
  • SCMPs are fashionable (Cell, Pentium D, Athlon 64 X2)

• Hard and soft processors are available in Xilinx and Altera FPGAs


Motivation: ASIP

• Application-Specific Instruction set Processors (ASIP) allow:
  • Optimum match of instruction set to application
  • Performance benefits approaching ASICs while retaining programmability
  • Architectural features customized to application
    • Datapath width sizing
    • Memory and cache hierarchy tuning

• Available commercially through Tensilica
  • Complete design flows and generated custom toolsets
  • $$$

• Academic/research use through ASIPMeister
  • Closed source
  • GUI only


Motivation: Configurable Arrays

• Merging SCMP with ASIP combines the benefits of both:
  • Reduced design time by reusing existing IP
  • Programmability of SCMP with the performance improvements of ASIP

• FPGAs are an ideal platform for configurable array research and implementation:
  • Rapid prototyping
  • Mature tool chains
  • Xilinx and Altera offer devices with embedded processing cores (PPC and ARM)


OpenFire

• Configurable 32-bit RISC processor
  • Specialized for processor arrays
  • Instruction set based on Xilinx MicroBlaze

• Uses the MicroBlaze tool chain (mb-gcc, XPS, etc.)
  • Can execute a subset of MicroBlaze code without modification
  • All MicroBlaze instructions supported except division, barrel shifting, and status register and cache related instructions
  • Not burdened by features unused in arrays (interrupts, exceptions, caches, interfaces)

• Open source
  • Released under the MIT license
  • Support utilities provided (C simulator, BRAM loaders, etc.)

• Differs from the previously available MicroBlaze clone aeMB:
  • Works correctly and is extensively documented


Performance

• Cycle accurate with MicroBlaze except for:
  • Multiply has a 5-cycle latency (3 for MicroBlaze)
  • Single-cycle instruction fetches (2 cycles for MicroBlaze)

• 100 MHz on a Xilinx Virtex-II Pro 30, speed grade 6:

    Processor      Area (slices)   Dhrystone (DMIPS)
    OpenFire       641             58.47
    MicroBlaze*    734             58.98

• Performance varies with configuration:
  • A 16-bit datapath implementation reduces area to 402 slices and raises the clock to 106 MHz

* Minimal MicroBlaze implementation (no OPB, division unit, barrel shifter, or cache) at 100 MHz


Extensibility

• Additional instructions, including multicycle operations, can easily be added inside the ALU without affecting the critical path
  • Room for at least 10 new 2-operand instructions in the instruction space

[Figure: OpenFire datapath block diagram; labeled blocks: Register File (32x32), Mult*, Add, Bit Fns, Compare, MSB, ALU, PC, Imm, Data Mem]


Extensibility

• OpenFire datapath customizable from 32 bits downwards
  • Instructions remain a constant 32 bits wide
  • Custom datapath widths limit program size
    • The Program Counter is treated the same as any data word, so it is only as wide as the datapath
    • 8-bit datapath => 64-instruction program
    • 16-bit datapath => 16,384-instruction program

• Planned extensions include:
  • Increasing the number of Fast Simplex Link (FSL) bus I/Os
  • Fast ALU-to-FSL and FSL-to-ALU operations
  • Additional debugging capabilities


Case Study: Image Filtering

• 3x3 median image filter written in C
• Soft processor arrays created:
  • Master node: MicroBlaze with DDR SDRAM
  • Slave nodes: OpenFires connected in a ring network with the master


Array Creation Process

• Automated flow for array creation:
  • Edit DEFINE.V to set processor parameters
  • Create C code for the master MicroBlaze and the slave OpenFires

• C code can be verified with the XMD simulator and a simple OpenFire C simulator

• Makefile-based flow automatically:
  • Creates a ring network of the desired size
  • Compiles programs and initializes BRAMs
  • Runs the EDK tool flow to generate a bitstream

• An FSL debugging bus on the OpenFire provides observability into the processor during operation


Array Results

• Slave processor area reduced 45% by downsizing the datapath to 16 bits
  • Required only slight modifications to the original C code
  • Allows more OpenFires on chip, increasing throughput

• Near-linear speedup with increasing array size

[Figure: Speedup versus Number of OpenFires (1 to 8)]


Future Directions

• Research goal: an automated flow for creating optimized heterogeneous arrays of soft processors
  • Input: parallel HLL description of the application
  • Optimizations: datapath sizing, instruction removal/addition, dual-issue processor cores, ALU-to-network and network-to-ALU operations, microcode controller, full datapath implementations

• Optimization objective: maximize array throughput by
  • Increasing individual node throughput
  • Reducing area to add additional nodes


Conclusion

• Configurable soft processor arrays offer the best of SCMPs and ASIPs:
  • Simplified design
  • Improved performance

• OpenFire processor designed for use in processor arrays
  • Excellent performance/area
  • Highly configurable

• Datapath width adjustment can produce a noticeable performance improvement


References

• OpenFire source code and utilities: http://www.ccm.ece.vt.edu/~scraven/

• James-Roxby, P., Schumacher, P., and Ross, C., “A Single Program Multiple Data Parallel Processing Platform for FPGAs,” FCCM’04
