For a lot of details about FFTW checkout my interview RCE 73 of the FFTW authors.
FFTW is probably the most popular DFT library ever. What many don't realize is that there are many ways to calculate an FFT depending on the number of samples. FFTW addresses this by creating a plan and using dynamic code generation to try to quickly calculate an optimal method.
What is not realized is that the number of samples should be kept multiples of small primes, 2,3,5 etc. Calculating an FFT where the number of samples is a single large prime is the worse case.
Eg:
Plan for 999983 samples used: 99522862.000000 add's, 53927851.000000 multiplies, 42864846.000000 fused Multiply-Add
real 0m0.873s
Plan for 999984 samples used: 41417486.000000 add's, 28948213.000000 multiplies, 6618732.000000 fused Multiply-Add
real 0m0.312s
So in this sample code going from 999983, a prime number, to 999984, who's prime factors are 2 x 2 x 2 x 2 x 3 x 83 x 251, resulted in a performance improvement of about 3x.
Performance isn't always simple.
No comments:
Post a Comment