Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation
Tor Aamodt and Paul Chow
University of Toronto
{ aamodt, pc }@eecg.utoronto.ca
3rd ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Nov. 17-18th, 2000, San Jose CA
What is this presentation about?
FOCUS: Signal processing applications developed using high-level language representation and floating-point data types...
WANT: Faster fixed-point software development...
QUESTION: Are there “better” fixed-point DSP instruction-sets in terms of runtime, power, or roundoff-noise performance?
Presentation Outline
Motivation & Background
Focus on… Automatic Conversion to Fixed-Point
Architectural Enhancements
Some Experimental Results
Summary / Future Directions
Motivation
80% of DSPs in use are Fixed-Point. Why?
Because fixed-point hardware is cheaper and uses less power …
… however, it is much harder to develop signal-processing software for.
Background
UTDSP Project: DSP Compiler/Architecture Co-design
• Traditional DSP architectures are hard for compilers to generate efficient code for, e.g. extended-precision accumulators
• First Generation Silicon, Sept. 30, 1999: 108-pin PGA, 0.35 µm CMOS, 63 MHz (Sean Peng’s M.A.Sc.)
• 16-bit Fixed-Point VLIW DSP with novel 2-level instruction fetching architecture (reduced pin-count)
June 2000: Synopsys CoCentric Fixed-Point Designer Tool
• First commercial tool for transforming floating-point ANSI C programs into fixed-point ($20,000 US)
Background: Fixed-Point versus Floating-Point
Fixed-Point: sign bit | Integer Part | Fractional Part (implied binary-point)
32 bit Floating-Point (IEEE): sign bit | 8 bit exponent (excess 127) | 23+1 bit normalized mantissa (explicit binary-point)
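The two layouts can be sketched in C as follows. This is only an illustration: `split_float`, `float_fields` and `q15_to_double` are hypothetical names, and the Q15 word (1 sign bit, 0 integer bits, 15 fractional bits) is just one possible fixed-point format chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a 32-bit IEEE single-precision number. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 stored bits; the leading 1 is implicit */
} float_fields;

float_fields split_float(float f)
{
    uint32_t bits;
    float_fields out;
    assert(sizeof bits == sizeof f);       /* assumes float is 32 bits wide */
    memcpy(&bits, &f, sizeof bits);        /* copy the raw bit pattern */
    out.sign = (unsigned)(bits >> 31);
    out.exponent = (unsigned)((bits >> 23) & 0xFFu);
    out.fraction = bits & 0x7FFFFFu;
    return out;
}

/* Hypothetical Q15 word: the binary point sits right after the sign bit.
   Its position is a convention of the program, not stored in the word. */
double q15_to_double(int16_t word)
{
    return word / 32768.0;
}
```

For instance, -0.75f splits into sign 1, exponent 126 and fraction 0x400000, while the Q15 word 16384 stands for 0.5.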
Background: Using Fixed-Point Arithmetic
Floating-Point: y[n] = y[n-1] + x[n]
Fixed-Point: y[n] = (( •y[n-1] >> 3) + x[n]) << 1
Explicit Scaling Operations: >> 3 and << 1
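A minimal C sketch of the same idea, under assumptions that are not on the slide: a first-order recursion y[n] = alpha*y[n-1] + x[n], with a hypothetical coefficient `alpha` and input `x` stored in Q15 and the state `y` stored in Q3.12. The shift amounts therefore differ from the slide's; the point is only that the fixed-point version needs scaling shifts the floating-point version never mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Floating-point version: no scaling is visible in the source. */
float iir_step_float(float alpha, float y_prev, float x)
{
    return alpha * y_prev + x;
}

/* Fixed-point version (assumed formats: alpha and x in Q15, y in Q3.12).
   Right shifts of negative values are assumed to be arithmetic. */
int16_t iir_step_fixed(int16_t alpha_q15, int16_t y_prev_q3_12, int16_t x_q15)
{
    /* Fractional multiply: Q15 * Q3.12, keeping a Q3.12 result. */
    int32_t prod = ((int32_t)alpha_q15 * y_prev_q3_12) >> 15;
    /* Explicit scaling: move x from Q15 to Q3.12 before adding. */
    int32_t sum = prod + (x_q15 >> 3);
    assert(sum >= INT16_MIN && sum <= INT16_MAX);   /* no overflow expected */
    return (int16_t)sum;
}

/* Run the fixed-point recursion on a constant input, starting from y = 0. */
int16_t iir_run_fixed(int16_t alpha_q15, int16_t x_q15, int steps)
{
    int16_t y = 0;
    for (int n = 0; n < steps; n++)
        y = iir_step_fixed(alpha_q15, y, x_q15);
    return y;
}
```

With alpha = 0.5 and x = 0.5 the floating-point output approaches 1.0, while the fixed-point state settles at 4095, one LSB below the Q3.12 value of 1.0 (4096), because each multiply truncates.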
Automatic Conversion Process
Traditional Optimizing Compiler: Input Program → Parser → Optimizer → Code Generator → Processor
• CONSTRAINT: Input/Output Invariance
• GOAL: Application Speedup
i.e. make code faster, but do not break anything!
Automatic Conversion Process
Traditional Optimizing Compiler: Input Program → Parser → Optimizer → Code Generator → Processor
Floating-Point to Fixed-Point Translator (takes Sample Inputs)
• “RELAX” CONSTRAINTS…
• GOALS: “Good” Input/Output Fidelity (e.g. good signal-to-noise ratio); Fast/Low-Power Operation (10-500× faster than FP emulation)
Floating-Point to Fixed-Point Translation
Floating-point source:
float a, b, x[N];
y = a*x[i] + b*x[i+1];
Fixed-point result:
int a, b, x[N];
y = (a•x[i] >> 2) + b•x[i+1];
1. Type Conversion (float → int)
2. Scaling Operations (>> 2)
3. Fractional Fixed-Point Operations (•)
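The three steps can be sketched in C with 16-bit words. The formats are assumptions made for this example only: `a` and the samples in Q15, `b` in Q2.13, so the first product has to be shifted right by 2 to line up with the second. `fmul` and `scaled_sum` are hypothetical names, with `fmul` standing in for the slide's "•" operator.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional multiply: keep the upper part of the 32-bit product.
   Note: -32768 * -32768 does not fit the result and is not handled here. */
int16_t fmul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* y = (a•x0 >> 2) + b•x1, with a, x0, x1 in Q15 and b, y in Q2.13 (assumed). */
int16_t scaled_sum(int16_t a, int16_t x0, int16_t b, int16_t x1)
{
    int32_t t1 = fmul(a, x0) >> 2;   /* scaling: Q15 product moved to Q2.13 */
    int32_t t2 = fmul(b, x1);        /* already in Q2.13 */
    int32_t y = t1 + t2;
    assert(y >= INT16_MIN && y <= INT16_MAX);
    return (int16_t)y;
}
```

For a = 0.5, x0 = 0.5, b = 2.0 and x1 = 0.5 the call `scaled_sum(16384, 16384, 16384, 16384)` yields 10240, which is 1.25 in Q2.13.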
Floating-Point to Fixed-Point Translator
SUIF Parser*
*SUIF = Stanford University Intermediate Format See: http://suif.stanford.edu
Identifier Assignment
Optimizer
Instrument Code
ProfileSample Inputs
Fixed-PointConversion
Collecting Dynamic Range Information
Consider the ANSI C code:
float a, b, x[N];
y = a*x[i] + b*x[i+1];
Equivalent Expression Tree: y = “+” at the root, with children “*” (a, x[i]) and “*” (b, x[i+1])
ID Assignment:
“0” : y
“1” : tmp_1
“2” : tmp_2
Code Instrumentation:
tmp_1 = a*x[i];
profile(tmp_1,1);
tmp_2 = b*x[i+1];
profile(tmp_2,2);
y = tmp_1 + tmp_2;
profile(y,0);
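One way the `profile` calls could work is sketched below. The slides do not show the body of `profile`, so this version, which simply records the largest magnitude seen for each ID, is an assumption; `max_abs`, `instrumented_sum` and `sample_x` are hypothetical names.

```c
#include <assert.h>

#define NUM_IDS 3

/* Largest magnitude observed so far for each profiled ID. */
double max_abs[NUM_IDS];

void profile(double value, int id)
{
    double mag = value < 0 ? -value : value;
    assert(id >= 0 && id < NUM_IDS);
    if (mag > max_abs[id])
        max_abs[id] = mag;
}

/* Instrumented form of y = a*x[i] + b*x[i+1]. */
float instrumented_sum(float a, float b, const float x[], int i)
{
    float tmp_1, tmp_2, y;
    tmp_1 = a * x[i];
    profile(tmp_1, 1);
    tmp_2 = b * x[i + 1];
    profile(tmp_2, 2);
    y = tmp_1 + tmp_2;
    profile(y, 0);
    return y;
}

/* A tiny stand-in for the sample inputs. */
const float sample_x[2] = { 0.5f, -2.0f };
```

Running `instrumented_sum(3.0f, 1.0f, sample_x, 0)` leaves `max_abs` holding 0.5 for y, 1.5 for tmp_1 and 2.0 for tmp_2, which is the dynamic range information the later scaling step needs.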
Generating Scaling Operations
Signal Scaling: Integer Word Length (IWL) definition: IWL[x] = ⌊log2(max|x|)⌋ + 1
Word layout: Sign bit | Integer Part (IWL bits) | Fractional Part
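A small C sketch of the IWL computation, reading the definition as floor(log2(max|x|)) + 1. The function name `iwl_from_max` is hypothetical, and the loop avoids the math library by repeated halving and doubling.

```c
#include <assert.h>

/* Integer word length for a signal whose largest magnitude is max_abs:
   floor(log2(max_abs)) + 1. The result is negative for small signals. */
int iwl_from_max(double max_abs)
{
    int iwl = 0;
    assert(max_abs > 0.0 && max_abs < 1e300);   /* finite and non-zero */
    while (max_abs >= 1.0) { max_abs /= 2.0; iwl++; }
    while (max_abs < 0.5)  { max_abs *= 2.0; iwl--; }
    return iwl;
}
```

A signal peaking at 1.5 needs one integer bit, one peaking at 0.5 needs none, and one peaking at 0.3 has an IWL of -1, meaning the first fractional bit is never used.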
Generating Scaling Operations
Example: “A op B”
Converted Sub-Expressions:
A: IWL_A measured, IWL_A current
B: IWL_B measured, IWL_B current
Result “A op B”: IWL_(A op B) measured, IWL_(A op B) current = ?
Automatic Conversion Process:
IRP: Using Intermediate Result Profile Data
Previous Algorithms:
• ‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997. (a.k.a. Predecessor to Synopsys CoCentric Fixed-Point Designer Tool)
• A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
Neither uses Intermediate Result Profile data; instead, they combine range information from leaf nodes. Is useful information lost?
IRP: Additive Operations
For example, assume |A| > |B|, and IWL_(A±B) measured ≤ IWL_A measured:
“A ± B” → “(A << nA) ± (B >> [n - nB])”
where: nA = IWL_A current - IWL_A measured
nB = IWL_B current - IWL_B measured
n = IWL_A measured - IWL_B measured
IWL_(A±B) current = IWL_A measured
[Diagram: B is shifted right by n bits (>> n) to line up with A]
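The additive rule can be sketched in C as follows, reading nB as the gap between B's own current and measured IWL. Everything here is illustrative: `shift` and `irp_add` are hypothetical names, values are 16-bit words held in `int32_t`, and a word `v` with integer word length `iwl` is taken to represent v * 2^(iwl - 15).

```c
#include <assert.h>
#include <stdint.h>

/* Shift left by s when s >= 0, right by -s otherwise.
   Right shifts of negative values are assumed to be arithmetic. */
int32_t shift(int32_t v, int s)
{
    assert(s > -31 && s < 31);
    return s >= 0 ? v * ((int32_t)1 << s) : v >> -s;
}

/* A + B for the case where A has the larger measured IWL.
   The result carries a current IWL equal to A's measured IWL. */
int32_t irp_add(int32_t a, int a_cur, int a_meas,
                int32_t b, int b_cur, int b_meas)
{
    int nA = a_cur - a_meas;    /* redundant sign bits in A */
    int nB = b_cur - b_meas;    /* redundant sign bits in B */
    int n  = a_meas - b_meas;   /* alignment distance */
    return shift(a, nA) + shift(b, nB - n);   /* (A << nA) + (B >> [n - nB]) */
}
```

With A = 1.5 stored at current IWL 3 (word 6144, measured IWL 2) and B = 0.25 stored at IWL 0 (word 8192), `irp_add(6144, 3, 2, 8192, 0, 0)` gives 14336, which is 1.75 at IWL 2.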
IRP: Multiplication
“A • B” → “(A << nA) • (B << nB)”
where: nA = IWL_A current - IWL_A measured
nB = IWL_B current - IWL_B measured
IWL_(A•B) current = IWL_A measured + IWL_B measured
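A C sketch of the multiplication rule under the same illustrative conventions (16-bit words held in `int32_t`, a word `v` with integer word length `iwl` representing v * 2^(iwl - 15)). `irp_mul` is a hypothetical name and the fractional multiply is modeled as a 32-bit product shifted right by 15.

```c
#include <assert.h>
#include <stdint.h>

/* (A << nA) • (B << nB): normalize both operands to their measured IWL,
   then take a fractional multiply. The result has a current IWL of
   a_meas + b_meas. */
int32_t irp_mul(int32_t a, int a_cur, int a_meas,
                int32_t b, int b_cur, int b_meas)
{
    int32_t an, bn;
    assert(a_cur >= a_meas && b_cur >= b_meas);
    an = a * ((int32_t)1 << (a_cur - a_meas));
    bn = b * ((int32_t)1 << (b_cur - b_meas));
    /* Both normalized operands must still fit a 16-bit word. */
    assert(an >= INT16_MIN && an <= INT16_MAX);
    assert(bn >= INT16_MIN && bn <= INT16_MAX);
    return (an * bn) >> 15;
}
```

With A = 1.5 stored at current IWL 3 but measured IWL 1 (word 6144) and B = 0.5 at IWL 0 (word 16384), `irp_mul(6144, 3, 1, 16384, 0, 0)` gives 12288, which is 0.75 at IWL 1.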
IRP: Division
“A / B” → “(A >> [ndividend - nA]) / (B << nB)”
where: nA = IWL_A current - IWL_A measured
nB = IWL_B current - IWL_B measured
ndiff = IWL_(A/B) measured - IWL_A measured + IWL_B measured
ndividend = ndiff, if ndiff ≥ 0; 0, otherwise
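A C sketch of the division rule under the same illustrative conventions (16-bit words in `int32_t`, a word `v` with integer word length `iwl` representing v * 2^(iwl - 15)). The fractional divide is assumed here to be (dividend * 2^15) / divisor, which the slides do not spell out; `shift` and `irp_div` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left by s when s >= 0, right by -s otherwise.
   Right shifts of negative values are assumed to be arithmetic. */
int32_t shift(int32_t v, int s)
{
    assert(s > -31 && s < 31);
    return s >= 0 ? v * ((int32_t)1 << s) : v >> -s;
}

/* (A >> [ndividend - nA]) / (B << nB), where q_meas is the measured IWL
   of the quotient. The result has IWL a_meas + ndividend - b_meas. */
int32_t irp_div(int32_t a, int a_cur, int a_meas,
                int32_t b, int b_cur, int b_meas, int q_meas)
{
    int nA = a_cur - a_meas;
    int nB = b_cur - b_meas;
    int ndiff = q_meas - a_meas + b_meas;
    int ndividend = ndiff >= 0 ? ndiff : 0;   /* pre-scale only if needed */
    int32_t num = shift(a, nA - ndividend);
    int32_t den = shift(b, nB);
    assert(den != 0);
    assert(num >= INT16_MIN && num <= INT16_MAX);
    return (num * 32768) / den;               /* assumed fractional divide */
}
```

With A = 1.5 at IWL 1 (word 24576), B = 0.5 at IWL 0 (word 16384) and a quotient whose measured IWL is 2, `irp_div(24576, 1, 1, 16384, 0, 0, 2)` gives 24576, which is 3.0 at IWL 2. Without the pre-shift of the dividend the quotient would not fit in 16 bits.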
IRP-SA: Using ‘Shift Absorption’
Example:
y = (a*x[i] + (b*x[i+1]>>1)) << 1
Question: Is information discarded unnecessarily here?
Consider the following alternative:
y = (a*x[i]<<1) + b*x[i+1]
BUT: Can we really discard most significant bits and get roughly the same answer? YES!
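One reason the answer can be "yes" is that two's complement addition is modular: an intermediate value may overflow the word, and the final sum is still right as long as the true result fits. The C sketch below illustrates this with 16-bit words; `wrap16`, `sum_irp` and `sum_irp_sa` are hypothetical names, and `p1`, `p2` stand for the two products a*x[i] and b*x[i+1].

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low 16 bits and reinterpret them as a two's complement value. */
int32_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;   /* reduction modulo 2^16 */
    return u >= 0x8000u ? (int32_t)u - 65536 : (int32_t)u;
}

/* y = (p1 + (p2 >> 1)) << 1 : the right shift drops the LSB of p2. */
int32_t sum_irp(int16_t p1, int16_t p2)
{
    return wrap16(wrap16(p1 + (p2 >> 1)) * 2);
}

/* y = (p1 << 1) + p2 : the left shift may drop the MSB of p1 instead. */
int32_t sum_irp_sa(int16_t p1, int16_t p2)
{
    /* Precondition for the modular argument: the true result fits. */
    assert(2 * (int32_t)p1 + p2 >= INT16_MIN && 2 * (int32_t)p1 + p2 <= INT16_MAX);
    return wrap16(wrap16(p1 * 2) + p2);
}
```

For p1 = -20000 and p2 = 30001 the exact value of 2*p1 + p2 is -9999. The first form returns -10000 because a low-order bit was thrown away, while the second returns -9999 even though p1 << 1 overflowed on the way.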
Architectural Support
Fractional Multiplication with internal Left Shift (FMLS)
Common occurrence (using IRP-SA): A•B << n
[Diagram: operands A (IWL_A) and B (IWL_B); product A*B (IWL_A + IWL_B); left shift by n]
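A C model of what such an instruction might compute is sketched below. It rests on an assumption about the semantics: the left shift is applied to the full 32-bit product before the result is cut down to 16 bits, so low-order product bits are kept in exchange for high-order ones. `fmls`, `fmul_then_shift` and `wrap16` are hypothetical names, not the UTDSP mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low 16 bits and reinterpret them as a two's complement value. */
int32_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return u >= 0x8000u ? (int32_t)u - 65536 : (int32_t)u;
}

/* Ordinary fractional multiply followed by a separate left shift:
   the n low bits of the result are always zero. */
int32_t fmul_then_shift(int16_t a, int16_t b, int n)
{
    int32_t frac = ((int32_t)a * b) >> 15;
    assert(n >= 0 && n <= 15);
    return wrap16(frac * ((int32_t)1 << n));
}

/* Assumed FMLS behaviour: shift inside the multiplier, so the result is
   taken n bits lower in the 32-bit product. */
int32_t fmls(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    return wrap16(((int32_t)a * b) >> (15 - n));
}
```

For a = b = 1000 and n = 2, `fmul_then_shift` gives 120 while `fmls` gives 122, the extra precision coming from two product bits that the separate shift had already discarded. It also replaces two operations with one.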
Experimental Results
Benchmarks
4th Order Cascaded/Parallel IIR Filter (IIR-C, IIR-P)
(Normalized) Lattice Filter (LAT, NLAT)
128-Point Radix 2 Decimation in Time FFT (FFT-NR, FFT-MW)
Levinson-Durbin Recursion (LEVDUR)
10x10 Matrix-Multiply (MMUL10)
Nonlinear Control (INVPEND)
Trig Function (SIN)
SQNR Enhancement: FMLS and/or IRP-SA
[Bar chart: SQNR enhancement in Equivalent Bits (axis -0.5 to 2) for IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, INVPEND, LEVDUR, MMUL10, SIN; series: IRP-SA, FMLS, IRP-SA w/ FMLS]
What Is The Effect of “Shift Absorption”?
[Histogram: Distribution of Fractional Multiply Output Shifts; x-axis: FMLS Output Shift Distance (3 left, 2 left, 1 left, none, 1 right); y-axis: Relative Frequency (0 to 0.8); series: IRP, IRP-SA]
Experimental Results:
Rotational Inverted Pendulum
U of T System Control Group Non-linear Testbench
Closed-Loop System Response: Rotational Inverted Pendulum 12-bit Controller Comparison
WC: 32.8 dB
IRP-SA: 41.1 dB
IRP-SA w/ FMLS: 48.0 dB
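These dB figures can be put on the "equivalent bits" scale used elsewhere in the talk with a rule of thumb: each extra bit of precision is worth about 6.02 dB of SQNR. The C sketch below assumes that rule; it is not necessarily how the authors computed their equivalent-bit numbers, and `equivalent_bits` is a hypothetical name.

```c
#include <assert.h>

/* Assumption: roughly 6.02 dB of SQNR per bit, i.e. 20*log10(2). */
#define DB_PER_BIT 6.0206

/* SQNR gain over a baseline, expressed as a number of bits. */
double equivalent_bits(double sqnr_db, double baseline_db)
{
    /* Reject NaN inputs (NaN is the only value not equal to itself). */
    assert(sqnr_db == sqnr_db && baseline_db == baseline_db);
    return (sqnr_db - baseline_db) / DB_PER_BIT;
}
```

Under that assumption, moving from WC (32.8 dB) to IRP-SA (41.1 dB) is worth about 1.4 bits, and moving from WC to IRP-SA w/ FMLS (48.0 dB) about 2.5 bits.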
128-Point Radix-2 FFT (Generated by MATLAB Real-Time Workshop)
Speedup?
Rotational Inverted Pendulum: Fractional Multiply Output Shift Relative Frequencies
…Yup!
Speedup* Using FMLS
[Bar chart: Relative Speedup (axis 1 to 1.4) for IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, SIN; series: Limiting, 8-FMUL, 4-FMUL, 2-FMUL]
8-FMUL = { 4 left thru 3 right }
4-FMUL = { 2 left thru 1 right }
2-FMUL = { one left, no shift }
SQNR Enhancement for various Output Shift Sets
[Bar chart: SQNR enhancement in Equivalent Bits (axis 0 to 2) for IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, SIN; series: Limiting, 8-FMUL, 4-FMUL, 2-FMUL]
Summary
• The Fractional Multiply with internal Left Shift (FMLS) operation can improve runtime and signal-to-noise performance.
• Speedups of up to 35%, and SQNR enhancement equivalent to up to 2 bits, maybe even 4 bits (depending on how you choose to measure it).
• Easy VLSI implementation, and easy for the compiler to use.
Future Directions
Higher Level Transformations:
• Automatic Generation of Block-Floating-Point...
• Quantization Error Feedback…
BOTH need a signal-flow-graph representation… therefore we probably need a better DSP language than ANSI C
Variable Precision Arithmetic (How much precision does each operation need?)