Configurable Soft Processor Arrays Using the OpenFire Processor
Craven 1 B212/MAPLD 2005
Stephen Craven, Cameron Patterson, Peter Athanas
Configurable Computing Lab, Virginia Tech
Outline
• Motivation
  • Single Chip Multi-Processors
  • Application-Specific Instruction set Processors
• OpenFire Processor
  • Features and Configurability
  • Performance
• Configurable Array Example: Median Image Filtering
  • Optimizations
  • Performance Comparisons
Motivation: SCMP
• Moving towards Single Chip Multi-Processors (SCMP) because:
  • Silicon budget is underutilized
  • ROI on instruction-level parallelism is diminishing
  • Design and verification costs are too high
  • SCMPs are more energy efficient
  • SCMPs can leverage existing IP
  • SCMPs are by nature easily scalable
  • On-chip inter-processor communication is fast
  • SCMP is fashionable (Cell, Pentium D, Athlon 64 X2)
• Xilinx and Altera FPGAs already offer hard and soft processors
Motivation: ASIP
• Application-Specific Instruction set Processors (ASIP) allow:
  • Optimum match of instruction set to application
  • Performance benefits approaching ASICs while retaining programmability
  • Architectural features customized to the application:
    • Datapath width sizing
    • Memory and cache hierarchy tuning
• Available commercially through Tensilica:
  • Complete design flows and generated custom toolsets
  • Expensive ($$$)
• Available for academic/research use through ASIPMeister:
  • Closed source
  • GUI only
Motivation: Configurable Arrays
• Merging SCMP with ASIP combines the benefits of both:
  • Reduced design time by reusing existing IP
  • Programmability of SCMP with the performance improvements of ASIP
• FPGAs are an ideal platform for configurable array research and implementation:
  • Rapid prototyping
  • Mature tool chains
  • Xilinx and Altera offer devices with embedded processing cores (PowerPC and ARM)
OpenFire
• Configurable 32-bit RISC processor
  • Specialized for processor arrays
  • Instructions based on Xilinx MicroBlaze
• Uses the MicroBlaze tool chain (mb-gcc, XPS, etc.)
  • Executes MicroBlaze code without modification, provided the code uses only the supported instructions
  • All MicroBlaze instructions are supported except division, barrel shifting, and status-register and cache-related instructions
  • Not burdened by features unused in arrays (interrupts, exceptions, caches, interfaces)
• Open source
  • Released under the MIT license
  • Support utilities provided (C simulator, BRAM loaders, etc.)
• Differs from the previously available MicroBlaze clone aeMB:
  • Works correctly and is extensively documented
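Because OpenFire omits the MicroBlaze divider and barrel shifter, integer division in code built for it has to happen in software; a compiler targeting a MicroBlaze without the hardware divider typically calls a library routine instead. The sketch below shows the classic restoring shift-and-subtract technique such a routine can use. It is an illustration, not the routine mb-gcc actually links, and the name `udiv32` is hypothetical. It deliberately uses only single-bit shifts, compares, and subtracts, so it needs no barrel shifter either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Restoring shift-and-subtract division of unsigned 32-bit values.
   Hypothetical helper for illustration: only single-bit shifts,
   compares, and subtracts are used (no divider, no barrel shifter). */
static uint32_t udiv32(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    assert(divisor != 0);             /* division by zero is not handled */
    uint32_t quotient = 0;
    uint32_t rem = 0;

    for (int i = 0; i < 32; i++) {
        /* Shift the next dividend bit (MSB first) into the partial remainder.
           rem never exceeds the dividend prefix seen so far, so it cannot overflow. */
        rem = (rem << 1) | ((dividend & 0x80000000u) ? 1u : 0u);
        dividend <<= 1;
        quotient <<= 1;

        /* If the partial remainder reaches the divisor, subtract and set a quotient bit. */
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= 1u;
        }
    }
    if (remainder != NULL)
        *remainder = rem;
    return quotient;
}
```

Signed division can be layered on top by dividing the magnitudes and fixing the signs afterwards.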
Performance
• Cycle accurate with MicroBlaze except for:
  • Multiply has 5-cycle latency (3 cycles for MicroBlaze)
  • Single-cycle instruction fetches (2 cycles for MicroBlaze)
• 100 MHz on a Xilinx Virtex-II Pro 30, speed grade 6:
    Processor    Area         Performance
    OpenFire     641 slices   58.47 DMIPS
    MicroBlaze   734 slices   58.98 DMIPS*
• Performance varies with configuration:
  • A 16-bit datapath implementation reduces area to 402 slices and raises the clock to 106 MHz
* Minimal MicroBlaze implementation (no OPB, division unit, barrel shifter, or cache) at 100 MHz
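One way to read the OpenFire and MicroBlaze figures above is as performance density. Dividing DMIPS by slice count (a simple metric chosen here for illustration; the slides do not define one) gives, at the same 100 MHz clock:

```latex
\begin{aligned}
\text{OpenFire:}   &\quad \frac{58.47\ \text{DMIPS}}{641\ \text{slices}} \approx 0.0912\ \text{DMIPS/slice} \\
\text{MicroBlaze:} &\quad \frac{58.98\ \text{DMIPS}}{734\ \text{slices}} \approx 0.0804\ \text{DMIPS/slice} \\
\text{Ratio:}      &\quad \frac{0.0912}{0.0804} \approx 1.13
\end{aligned}
```

So OpenFire gives up under 1% of the DMIPS of the minimal MicroBlaze while using about 13% fewer slices, roughly 13% more DMIPS per slice.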
Extensibility
• Additional instructions, including multicycle operations, can easily be added inside the ALU without affecting the critical path
• The instruction space has room for at least 10 new 2-operand instructions
[Figure: OpenFire datapath: 32x32 register file, ALU (Add, Mult*, Bit Fns, Compare, MSB), PC, Imm, Data Mem]
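To illustrate what adding a 2-operand instruction means at the instruction-set level, here is a hedged sketch in the style of an instruction-set simulator; it is not code from the OpenFire C simulator. It decodes a register-register word using the MicroBlaze Type A field layout (6-bit opcode, then 5-bit rD, rA, rB) and adds two hypothetical instructions, signed max and min, which a median filter could use for compare-and-swap. The custom opcode values are placeholders that have not been checked against the MicroBlaze opcode map; carry and other status bits are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* 0x00 is the MicroBlaze "add rD, rA, rB" opcode. The two custom values
   are hypothetical placeholders, not checked against the real opcode map. */
enum {
    OP_ADD        = 0x00,
    OP_MAX_CUSTOM = 0x3B,  /* hypothetical */
    OP_MIN_CUSTOM = 0x3F   /* hypothetical */
};

typedef struct {
    uint32_t r[32];        /* general-purpose registers; r0 always reads as zero */
} cpu_state;

/* Assemble a Type A word: opcode in the top 6 bits, then rD, rA, rB (5 bits each). */
static uint32_t encode_type_a(uint32_t opcode, uint32_t rd, uint32_t ra, uint32_t rb)
{
    assert(opcode < 64 && rd < 32 && ra < 32 && rb < 32);
    return (opcode << 26) | (rd << 21) | (ra << 16) | (rb << 11);
}

/* Execute one Type A instruction. Returns 0 on success, -1 if the opcode
   is not modeled by this sketch. */
static int execute_type_a(cpu_state *cpu, uint32_t instr)
{
    uint32_t opcode = instr >> 26;
    uint32_t rd = (instr >> 21) & 0x1Fu;
    uint32_t ra = (instr >> 16) & 0x1Fu;
    uint32_t rb = (instr >> 11) & 0x1Fu;
    int32_t a = (int32_t)cpu->r[ra];
    int32_t b = (int32_t)cpu->r[rb];
    uint32_t result;

    switch (opcode) {
    case OP_ADD:        result = cpu->r[ra] + cpu->r[rb];   break;
    case OP_MAX_CUSTOM: result = (uint32_t)(a > b ? a : b); break; /* new ALU op */
    case OP_MIN_CUSTOM: result = (uint32_t)(a < b ? a : b); break; /* new ALU op */
    default:            return -1;
    }
    if (rd != 0)           /* writes to r0 are discarded */
        cpu->r[rd] = result;
    return 0;
}
```

On the hardware side, the equivalent change is one more result source inside the ALU, which is where the slide says new instructions can be added.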
Extensibility
• OpenFire datapath customizable from 32 bits downwards
  • Instructions are a constant 32 bits wide
  • Custom datapath widths limit program size
  • The Program Counter is treated the same as any data word, so it narrows along with the datapath while each instruction still occupies 4 bytes:
    • 8-bit datapath => 64-instruction program
    • 16-bit datapath => 16,384-instruction program
• Planned extensions include:
  • Increasing the number of Fast Simplex Link (FSL) bus I/Os
  • Fast ALU-to-FSL and FSL-to-ALU operations
  • Additional debugging capabilities
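The program-size limits follow from a w-bit PC addressing 2^w bytes (assuming, as in MicroBlaze, a byte-addressed PC) while every instruction stays 4 bytes wide:

```latex
N_{\text{instr}}(w) = \frac{2^{w}\ \text{bytes}}{4\ \text{bytes/instruction}} = 2^{w-2},
\qquad N_{\text{instr}}(8) = 2^{6} = 64,
\qquad N_{\text{instr}}(16) = 2^{14} = 16{,}384
```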
Case Study: Image Filtering
• 3x3 median image filter written in C
• Soft processor arrays created:
  • Master node – MicroBlaze with DDR SDRAM
  • Slave nodes – OpenFires connected in a ring network with the master
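The slides do not reproduce the filter source, so the following is a hypothetical sketch of a 3x3 median filter in plain C of the kind each slave could run. The function names, the 8-bit row-major image layout, and the policy of copying border pixels unchanged are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Median of nine 8-bit samples via insertion sort on a local copy.
   Simple rather than fast; a fixed compare-exchange network is a
   common faster alternative. */
static uint8_t median9(const uint8_t v[9])
{
    uint8_t s[9];
    for (int i = 0; i < 9; i++) {
        uint8_t x = v[i];
        int j = i - 1;
        while (j >= 0 && s[j] > x) {   /* shift larger samples right */
            s[j + 1] = s[j];
            j--;
        }
        s[j + 1] = x;
    }
    return s[4];                       /* middle of the nine sorted samples */
}

/* 3x3 median filter over a row-major 8-bit grayscale image.
   Border pixels are copied unchanged (one of several possible policies). */
static void median_filter3x3(const uint8_t *in, uint8_t *out,
                             size_t width, size_t height)
{
    assert(in != NULL && out != NULL);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            if (y == 0 || x == 0 || y + 1 == height || x + 1 == width) {
                out[y * width + x] = in[y * width + x];
                continue;
            }
            uint8_t win[9];
            int k = 0;
            for (size_t yy = y - 1; yy <= y + 1; yy++)   /* gather the 3x3 window */
                for (size_t xx = x - 1; xx <= x + 1; xx++)
                    win[k++] = in[yy * width + xx];
            out[y * width + x] = median9(win);
        }
    }
}
```

A single bright outlier surrounded by uniform pixels is replaced by the surrounding value, which is why median filtering is popular for salt-and-pepper noise.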
Array Creation Process
• Automated flow for array creation:
  • Edit DEFINE.V to set processor parameters
  • Create C code for the master MicroBlaze and the slave OpenFires
  • Verify the C code with the XMD simulator and the simple OpenFire C simulator
  • Makefile-based flow automatically:
    • Creates a ring network of the desired size
    • Compiles programs and initializes BRAMs
    • Runs the EDK tool flow to generate a bitstream
• FSL debugging bus on the OpenFire provides observability into the processor during operation
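The Makefile flow fixes the ring size, but the master still has to split each image among the slaves. One hypothetical scheme for a 3x3 filter is to give each slave a contiguous band of rows plus one halo row above and below, since a 3x3 window reads one neighboring row on each side. The `band` type and `partition_rows` function are illustrative assumptions, not part of the OpenFire flow.

```c
#include <assert.h>
#include <stddef.h>

/* Rows assigned to one slave: the output rows it computes and the input
   rows (including halo rows) it must receive from the master. */
typedef struct {
    size_t first_row;   /* first output row computed by this slave */
    size_t num_rows;    /* number of output rows */
    size_t in_first;    /* first input row to receive */
    size_t in_rows;     /* number of input rows to receive */
} band;

/* Split `height` rows as evenly as possible across `nodes` slaves;
   the first (height % nodes) slaves get one extra row. */
static band partition_rows(size_t height, size_t nodes, size_t k)
{
    assert(nodes > 0 && k < nodes && height >= nodes);
    size_t base = height / nodes;
    size_t extra = height % nodes;
    band b;
    b.num_rows = base + (k < extra ? 1 : 0);
    b.first_row = k * base + (k < extra ? k : extra);

    /* One neighboring row on each side, clipped to the image edges. */
    b.in_first = (b.first_row > 0) ? b.first_row - 1 : 0;
    size_t in_end = b.first_row + b.num_rows;   /* one past the last output row */
    if (in_end < height)
        in_end++;                               /* halo row below */
    b.in_rows = in_end - b.in_first;
    return b;
}
```

For a 10-row image on 3 slaves, this assigns output rows 0-3, 4-6, and 7-9, with the halo rows overlapping between neighbors.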
Array Results
• Slave processor area reduced 45% by downsizing the datapath to 16 bits
  • Required only slight modifications to the original C code
  • Allows more OpenFires on chip, increasing throughput
• Near-linear speedup with increasing array size
[Figure: Speedup vs. Number of OpenFires (1 to 8)]
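The slides do not give the underlying timings, but speedup is conventionally measured relative to a single-node run. With T(n) the filtering time using n OpenFires, the standard definitions of speedup and parallel efficiency are:

```latex
S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}
```

Near-linear speedup means S(n) ≈ n, that is, E(n) stays close to 1 as OpenFires are added.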
Future Directions
• Research goal: automated flow for creating optimized heterogeneous arrays of soft processors
  • Input – parallel high-level language (HLL) description of the application
  • Optimizations: datapath sizing, instruction removal/addition, dual-issue processor cores, ALU-to-network and network-to-ALU operations, microcode controller, full datapath implementations
  • Optimization objective: maximize array throughput by
    • Increasing individual node throughput
    • Reducing node area so that more nodes fit
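A first-order way to reason about this objective is to treat array throughput as (number of nodes that fit) × (per-node throughput), which assumes the near-linear scaling reported for the median filter continues. The sketch below is a hypothetical model with invented function names; it ignores BRAM, routing, and clock-frequency limits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: slave nodes that fit in the slices left after the master. */
static size_t nodes_that_fit(size_t total_slices, size_t master_slices,
                             size_t node_slices)
{
    assert(node_slices > 0);
    if (total_slices <= master_slices)
        return 0;
    return (total_slices - master_slices) / node_slices;
}

/* Array throughput under a linear-scaling assumption. */
static double array_throughput(size_t nodes, double node_throughput)
{
    return (double)nodes * node_throughput;
}
```

For example, if 4,000 slices remained after placing the master (an arbitrary figure), 641-slice nodes would give 6 slaves and 402-slice nodes 9, so shrinking each node can raise array throughput even when per-node throughput is unchanged.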
Conclusion
• Configurable soft processor arrays offer the best of SCMPs and ASIPs:
  • Simplified design
  • Improved performance
• OpenFire processor designed for use in processor arrays:
  • Excellent performance/area
  • Highly configurable
• Datapath width adjustment can noticeably improve array performance by freeing area for more nodes
References
• OpenFire source code and utilities: http://www.ccm.ece.vt.edu/~scraven/
• James-Roxby, P., Schumacher, P., and Ross, C., "A Single Program Multiple Data Parallel Processing Platform for FPGAs," FCCM'04.