

Extensible Processors



• Gain performance by:
  Specialized hardware for the whole application (ASIC).
  − Almost no flexibility.
  − High cost.
  Special hardware for customized instructions in a general-purpose (GP) processor: instruction set extension.
  Application-specific instruction set processors (ASIPs):
  − Customized to perform particularly well in a particular application area.
  − Can improve performance for particular problem instances while maintaining the flexibility of the overall system.
  − Motivated by application-specific nature of embedded




• Problems:
  Substantial non-recurring engineering (NRE) costs:
  − Each new ASIP must be verified from both the functionality and timing perspectives.
  − A new mask set must be created to fabricate the chip.
  − Software side: the compiler must be retargeted to each new processor.
  − Any hand-written libraries must be migrated to the new platform.
  Automation of some of these tasks may be possible;
  − however, the majority of this work is still a manual process.
  Difficult to adopt a new ASIP despite the potential




• Advantages:
  System is post-programmable and can tolerate modest changes to the application (with little performance degradation).
  − e.g., changes in a standard.
  Computation-intensive portions of applications from the same domain (e.g., encryption) are often similar in structure.
  − Customized instructions can often be generalized in small ways to make them more useful across a set of applications.
  − Lower cost than an ASIC.




Xtensa Processor

• Xtensa from Tensilica [Gonzalez00]
  A processor core that lets the system designer:
  − select and size features for a given application,
  − define new instructions.
  Designer can use a standard ASIC design flow and tools to synthesize the processor.
  − Xtensa is fully synthesizable.
  The Tensilica processor generator adds the application-specific functionality at the time the hardware is designed.
  − Extensions are implemented in the same logic family as the rest of the processor.
  − Extensions cannot be modified for other applications.


Xtensa Processor

• Designer specifies the characteristics in the TIE (Tensilica Instruction Extension) language and/or menus:
  − Number of physical registers,
  − Instruction cache size,
  − Data cache size,
  − Data RAM size,
  − External bus width,
  − Number of interrupts,
  − Extended instructions (functional units).

• Tools generate synthesizable RTL code for the processor and the software development tools:
  − ANSI C/C++ compiler,
  − Linker,
  − Assembler,
  − Code profiler,
  − Instruction set simulator.


Xtensa Processor

Designer can analyze and identify bottlenecks in application performance.

Can work around the bottlenecks. Can add instructions.


Xtensa Example

• Example: DES algorithm


Xtensa Example

• Characteristics:
  Extensive bit permutations:
  − inefficient in software,
  − efficient in hardware: simple renaming of wires.
  Rotation on 28-bit boundaries:
  − in software: rotation instruction on 32-bit boundaries.
  Table look-ups.

• Added 4 instructions
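
To make the software cost of these operations concrete, here is a minimal Python sketch. It is not Tensilica code, and the function names are illustrative assumptions. It shows a 28-bit rotate built from shifts, an OR and a mask, and a table-driven bit permutation that handles one bit per loop iteration. In hardware, both reduce to wiring.

```python
MASK28 = (1 << 28) - 1  # the DES key halves are 28 bits wide


def rotl28(x, n):
    """Rotate a 28-bit value left by n bits (0 <= n < 28).

    A native 32-bit rotate cannot be used directly, so the rotation is
    assembled from two shifts, an OR, and a mask.
    """
    x &= MASK28
    return ((x << n) | (x >> (28 - n))) & MASK28


def permute(value, table, in_width):
    """Apply a DES-style permutation table to an in_width-bit value.

    table[i] is the 1-based position, counted from the most significant
    bit, of the input bit that becomes the next output bit.
    """
    out = 0
    for pos in table:
        bit = (value >> (in_width - pos)) & 1  # extract one input bit
        out = (out << 1) | bit                 # append it to the output
    return out
```

The permutation loop costs several operations per output bit in software, which is the kind of inefficiency a custom instruction removes.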


Xtensa Speed-Up for DES


Xtensa Speed-Ups

• Speed-ups for some applications


Altera Nios, Xilinx MicroBlaze

• Soft extensible processors
  Can define custom instructions.
  Can configure the processor.
  Use FPGA resources (Altera for Nios, Xilinx for MicroBlaze):
  − Lower performance,
  − Higher power consumption.


Extensible Processors

• Major problems with ASIPs:
  Not flexible.
  − For a new application: new masks, other NRE costs.
  Large human effort required to identify and implement an efficient set of instruction set extensions.

• Major problem with soft processors:
  Low performance.

• Solution:
  A GP processor with a reconfigurable functional unit (FU).


Extensible Processors

• Custom Instruction (CI):
  An instruction in the extended Instruction Set Architecture (ISA).
  Can be implemented in the processor's datapath itself or as a separate co-processor.
  − Usually in the processor datapath.
  A fragment of the program's dataflow graph mapped onto a hardware Custom Functional Unit (CFU).

• Basic block (BB):
  A code fragment with single entry and exit points.
  Load/store instructions cannot be in the BB.
  − It cannot be predicted after how many clock cycles their results become available to the following instructions.
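
As an illustration of the basic block definition above, the following Python sketch splits a linear instruction list into basic blocks using the usual "leader" rule, then checks whether a block is free of loads and stores. The instruction format (dicts with `op`, `label`, `target`) and the opcode names are assumptions made for this example only.

```python
BRANCHES = {"br", "beq", "bne", "jmp", "ret"}  # hypothetical opcode names
MEMORY_OPS = {"load", "store"}                 # hypothetical opcode names


def basic_blocks(instrs):
    """Split a linear instruction list into basic blocks.

    A block starts at the first instruction, at every branch target,
    and at every instruction that follows a branch.
    """
    targets = {i["target"] for i in instrs if i.get("target")}
    leaders = set()
    for idx, ins in enumerate(instrs):
        if idx == 0 or ins.get("label") in targets:
            leaders.add(idx)
        if ins["op"] in BRANCHES and idx + 1 < len(instrs):
            leaders.add(idx + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [instrs[s:e] for s, e in zip(starts, ends)]


def eligible(block):
    """True if the block contains no load/store instructions."""
    return all(ins["op"] not in MEMORY_OPS for ins in block)


program = [
    {"op": "add"},
    {"op": "beq", "target": "L1"},
    {"op": "load"},
    {"op": "mul", "label": "L1"},
    {"op": "ret"},
]
blocks = basic_blocks(program)
```

Here `blocks` holds three blocks; the middle one contains a load and is therefore not eligible under the restriction stated above.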


Custom Instructions Limitations

1. Number of operands:
  Imposed by the base architecture of the core processor.
  − The length of a custom instruction increases with the number of operands.
  − The number of input and output ports of the register file limits the number of input and output operands.
  − The cost and energy consumption of a processor increase significantly with the number of register file ports.

2. Number of custom instructions:
  Imposed by the format of the base ISA.
  − If the base ISA supports 26 instructions with a fixed-length opcode, only 6 more CIs can be added (a 5-bit opcode has 2^5 = 32 encodings).


Custom Instructions Limitations

3. Area:
  Important especially in embedded systems.

4. Control flow:
  Custom instruction identification is typically performed within basic block boundaries.
  − Assumption: the compiler cannot exploit instructions that cross basic block boundaries.
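
Limitations 1 and 2 can be checked mechanically. The sketch below is a minimal illustration under assumed data structures, not a tool from the slides. The dataflow graph is a dict mapping each node to the values it reads, and all names are hypothetical. It counts the register inputs and outputs of a candidate subgraph and computes how many opcodes remain free.

```python
def free_opcodes(opcode_bits, used):
    """Number of unused encodings in a fixed-length opcode field."""
    return (1 << opcode_bits) - used


def io_counts(preds, subgraph, live_out=()):
    """Count register inputs and outputs of a candidate subgraph.

    preds maps each node to the list of values it reads.
    live_out lists nodes whose values are needed after the basic block.
    """
    sub = set(subgraph)
    # Inputs: values read inside the subgraph but produced outside it.
    inputs = {p for n in sub for p in preds.get(n, []) if p not in sub}
    # Outputs: subgraph nodes read by an outside node, or live after the block.
    outputs = {
        n for n in sub
        if n in live_out
        or any(n in preds.get(m, []) for m in preds if m not in sub)
    }
    return len(inputs), len(outputs)


def fits_register_file(preds, subgraph, read_ports, write_ports, live_out=()):
    """True if the subgraph needs no more ports than the register file has."""
    n_in, n_out = io_counts(preds, subgraph, live_out)
    return n_in <= read_ports and n_out <= write_ports


# t1 = a op b; t2 = t1 op c; t3 = t2 op a
dfg = {"t1": ["a", "b"], "t2": ["t1", "c"], "t3": ["t2", "a"]}
```

With this graph, the candidate `{"t1", "t2"}` needs three inputs and one output, so it does not fit a register file with two read ports.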


Instruction Set Extension (ISE)

• Automatic ISA extension generation consists of:
  Custom instruction identification
  − Identifies patterns meeting certain topology requirements.
  Custom instruction selection
  − Selects the most important patterns under resource and other constraints.
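
The two phases can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of any particular tool. Identification is reduced to filtering pre-enumerated patterns by input/output limits, selection is a greedy gain-per-area heuristic (which is not guaranteed to be optimal), and the `Candidate` fields and their numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    n_inputs: int
    n_outputs: int
    cycles_saved: int  # estimated gain over a profiled run (assumed metric)
    area: int          # estimated CFU area, arbitrary units, must be > 0


def identify(candidates, max_in, max_out):
    """Keep the patterns that satisfy the input/output constraints."""
    return [c for c in candidates
            if c.n_inputs <= max_in and c.n_outputs <= max_out]


def select(candidates, area_budget, max_cis):
    """Greedily pick patterns by gain per unit area within the budgets."""
    ranked = sorted(candidates,
                    key=lambda c: c.cycles_saved / c.area, reverse=True)
    chosen, used = [], 0
    for c in ranked:
        if len(chosen) < max_cis and used + c.area <= area_budget:
            chosen.append(c)
            used += c.area
    return chosen


patterns = [
    Candidate("mac", 3, 1, 900, 30),
    Candidate("perm", 1, 1, 500, 5),
    Candidate("wide", 5, 2, 2000, 40),
    Candidate("rot28", 2, 1, 200, 10),
]
legal = identify(patterns, max_in=3, max_out=1)
picked = select(legal, area_budget=40, max_cis=6)
```

In this example `wide` is rejected during identification because it needs too many operands, and `rot28` is dropped during selection because the area budget is exhausted.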


Automatic ISE

• To mimic the choices of an expert designer

• New concept of “Compiler”:
  Retargetable compiler:
  − Maintains a single piece of code for compiling to different machine targets.
  − Reads the underlying machine description, then produces code for it.
  More automation:
  − Tuning the machine’s instruction set.
  − Compiler defines the machine and then produces code for it.
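
The retargetable idea, one code generator driven by a machine description, can be illustrated with a toy Python sketch. The machine description format, the mnemonics (`mul`, `add`, `mac`) and the register names are all assumptions for this example and do not come from any real toolchain.

```python
def compile_mul_add(machine, dst, a, b, c):
    """Emit code for dst = a * b + c on the described machine.

    machine is a dict with:
      'instructions': set of available mnemonics
      'scratch': name of a register reserved for temporaries
    """
    if "mac" in machine["instructions"]:
        # The target has a fused multiply-accumulate custom instruction.
        return [f"mac {dst}, {a}, {b}, {c}"]
    tmp = machine["scratch"]
    # Fall back to base instructions; the scratch register avoids
    # clobbering c when dst and c are the same register.
    return [f"mul {tmp}, {a}, {b}", f"add {dst}, {tmp}, {c}"]


base_machine = {"instructions": {"add", "mul"}, "scratch": "r15"}
extended_machine = {"instructions": {"add", "mul", "mac"}, "scratch": "r15"}

base_code = compile_mul_add(base_machine, "r1", "r2", "r3", "r4")
extended_code = compile_mul_add(extended_machine, "r1", "r2", "r3", "r4")
```

The same compiler function produces two instructions for the base machine and one for the extended machine; only the description it reads changes.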




[Gonzalez00] R. Gonzalez, “Xtensa: a configurable and extensible processor,” IEEE Micro, 2000.