FFT Accelerator Project
Rohit Prakash
Anand Silodia
Date: June 7th, 2007
Objectives
• Analysis using random input points
• Percentage improvement (over the previous implementations)
• Cache profiling
Improvements
• Calls to sine/cosine decreased
• Separate arrays for powers and some other terms
  – Divisions decreased
  – Multiplications decreased
• Error from last time corrected (FFTW floating point)
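The first two bullets can be illustrated with a minimal sketch. It assumes the base twiddle factors w[k] are already available, and it stores their squares and cubes in separate arrays so the butterfly does not have to call sine/cosine or recompute w*w and w*w*w on every pass. The type `cplx` and the names `cmul` and `fill_power_tables` are hypothetical; the slides do not show the project's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-precision complex type (not taken from the slides). */
typedef struct { float re, im; } cplx;

/* Complex product a*b. */
static cplx cmul(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im,
               a.re * b.im + a.im * b.re };
    return r;
}

/* Fill w2[k] = w[k]^2 and w3[k] = w[k]^3 for k = 0..n-1.
   The tables are built once, so the butterfly can read w, w*w and
   w*w*w directly instead of recomputing them or calling sin/cos. */
void fill_power_tables(const cplx *w, cplx *w2, cplx *w3, size_t n)
{
    assert(w != NULL && w2 != NULL && w3 != NULL);
    for (size_t k = 0; k < n; k++) {
        w2[k] = cmul(w[k], w[k]);
        w3[k] = cmul(w2[k], w[k]);
    }
}
```

The trade-off is extra memory traffic for the two additional tables in exchange for fewer multiplications inside the butterfly, which is one reason cache profiling appears among the objectives.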
System Configuration
• Intel Pentium 4 (HT), 3.0 GHz
• RAM: 1 GB
• Cache: 1 MB L2
• O.S.: Fedora Core 3
• Compiler: icc
• Flags used: -xW, -O3, -ipo-prec-div, -static
User time vs. FFTW (single precision)
• Radix-4 runs 1.5 times slower than FFTW
• Radix-8 runs 1.6 times slower than FFTW
Cache Organization
Cache Level   Size    Associativity   Line size
L2            1 MB    8-way           64
I1            16 KB   4-way           64
D1            16 KB   4-way           64
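The figures above determine how many sets each cache has, which matters for power-of-two FFT strides. The sketch below assumes the line size is given in bytes and that the set index is taken from the low-order address bits in the usual way; both are assumptions, not statements from the slides. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sets = total size / (associativity * line size). */
size_t cache_sets(size_t size_bytes, size_t ways, size_t line_bytes)
{
    assert(ways > 0 && line_bytes > 0);
    return size_bytes / (ways * line_bytes);
}

/* Set that a byte address maps to, assuming simple modulo indexing. */
size_t cache_set_index(size_t addr, size_t sets, size_t line_bytes)
{
    assert(sets > 0 && line_bytes > 0);
    return (addr / line_bytes) % sets;
}
```

Under these assumptions, L2 has 1 MB / (8 × 64) = 2048 sets and I1 and D1 each have 16 KB / (4 × 64) = 64 sets. In D1, addresses that are 4096 bytes apart fall into the same set, so a butterfly that touches operands at such a stride competes for the four ways of a single set.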
Further Improvements: use SSE instructions
• Vectorize the loop
  T = A[r]
  U = w*A[r+p]
  V = w*w*A[r+2*p]
  W = w*w*w*A[r+3*p]
  ----------------------------------
  Complex temp[4];
  for (i = 1; i < 4; i++) {
      temp[i] = twiddle[i*p]*A[r + i*l];
  }
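One way the loop above could map onto SSE is sketched here. It assumes a split layout (separate real and imaginary arrays of four floats), which the slides do not specify, and it leaves the gathering of the four operands into contiguous buffers to the caller. Lane 0 can carry the multiplier 1+0i so that the first operand passes through unchanged, which mirrors the loop starting at i = 1. The function name `cmul4` is hypothetical, and a scalar fallback is included for compilers that do not define `__SSE__`.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Lane-wise complex product of four values in split real/imag layout:
   (ore[i] + j*oim[i]) = (are[i] + j*aim[i]) * (bre[i] + j*bim[i]). */
void cmul4(const float *are, const float *aim,
           const float *bre, const float *bim,
           float *ore, float *oim)
{
    assert(are && aim && bre && bim && ore && oim);
#ifdef __SSE__
    /* Unaligned loads, so the caller need not align the buffers. */
    __m128 ar = _mm_loadu_ps(are), ai = _mm_loadu_ps(aim);
    __m128 br = _mm_loadu_ps(bre), bi = _mm_loadu_ps(bim);
    /* real = ar*br - ai*bi, imag = ar*bi + ai*br, four lanes at once */
    _mm_storeu_ps(ore, _mm_sub_ps(_mm_mul_ps(ar, br), _mm_mul_ps(ai, bi)));
    _mm_storeu_ps(oim, _mm_add_ps(_mm_mul_ps(ar, bi), _mm_mul_ps(ai, br)));
#else
    /* Scalar fallback with the same result. */
    for (size_t i = 0; i < 4; i++) {
        float re = are[i] * bre[i] - aim[i] * bim[i];
        float im = are[i] * bim[i] + aim[i] * bre[i];
        ore[i] = re;
        oim[i] = im;
    }
#endif
}
```

Whether this is faster than the scalar loop depends on the cost of rearranging interleaved complex data into the split layout, which would need to be measured on the target machine.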