Although it is widely known that FFT's provide a fast way to implement long convolutions or correlations (a correlation is just a convolution with the impulse response reversed and conjugated), using them requires longer and more complex code. So the question arises when is it worth it? Just how long does a convolution have to be before using FFT's provides significant savings? For some reason, this issue in not addressed in most DSP/FFT literature, despite its importance to a DSP system designer. To get a feel for the answer to this we shall consider the implementation of a complex 1D FIR filter on an 'infinite' 1D complex data stream by two methods. Direct convolution and Radix 2 FFT convolution using 50% overlap between FFT's.
We shall assume that the filter contains 2p+1 taps (its impulse response is 2p+1 samples long). This length has been chosen to ensure that when convolved with 2p signal points, the resulting sequence has length (2p+1)+(2p)-1=2p+1, which fits a Radix 2 FFT comfortably.
In each case, we shall quantify the algorithm in terms of the number of complex additions NA(p) and the number of complex multiplies NM(p) required to generate 2p output points.
Direct convolution can be performed point by point or in blocks. In either case each output point will require 2p+1 complex multiplies and 2p complex additions. Therefore:
This scheme works as follows:
The following optimisations are assumed:
We may note that the saving presented optimisation 2 above effectively cancels out the cost of step 5, so we are left with two 2p+1 point FFT's (or IFFT's) and 2p+1 complex multiplies. We have derived the number of butterflies required by a 2p Radix 2 FFT:
The number of 'trivial' butterflies (which require no multiplication) is (p>2 !!):
Each (non trivial) butterfly requires 2 complex additions and 1 complex multiply.So, for a single 2p+1 point FFT (p>1 !!):
So for the entire algorithm (two 2p+1 point FFT's (or IFFT's) and 2p+1 complex multiplies):
To make the comparison simpler, we shall assume that real multiplies and adds take the same time (as they often do), and combine the figures for NA(p) and NM(p) as NOP(p) operations on real numbers as follows:
So for direct convolution we have:
For FFT convolution we have:
The ratio of these two is what we really want..
This tells us (roughly) how much slower direct convolution (or correlation) is compared to FFT convolution. Let's tabulate a few values:
| p | FIR Taps (=2p+1) | (Direct/FFT) Ratio |
| 3 | 9 | 1.2 |
| 4 | 17 | 1.7 |
| 5 | 33 | 2.7 |
| 6 | 65 | 4.5 |
| 7 | 129 | 7.6 |
| 8 | 257 | 13.2 |
| 9 | 513 | 23.3 |
| 10 | 1025 | 41.8 |
For large p, a good approximation is:
So it would appear that even for filters as short as 17 taps FFT based convolution looks attractive (in terms of the number of arithmetic operations). However, a few notes of caution are appropriate:
As usual, if you really want to know which method is fastest for your particular application on your particular processor, the best advice is to try them both.
©1999 - Engineering Productivity Tools Ltd.