Re: Is this the optimal FIR filter on all platforms?
From: Arthur J. O'Dwyer (ajo_at_nospam.andrew.cmu.edu)
Date: 10/25/03
- Next message: Richard Heathfield: "Re: wait until \n appears in string"
- Previous message: CBFalconer: "Re: void pointers and more"
- In reply to: Johan Bergman: "Is this the optimal FIR filter on all platforms?"
- Next in thread: Johan Bergman: "Re: Is this the optimal FIR filter on all platforms?"
- Reply: Johan Bergman: "Re: Is this the optimal FIR filter on all platforms?"
- Reply: Johan Bergman: "Re: Is this the optimal FIR filter on all platforms?"
- Reply: Johan Bergman: "Re: Is this the optimal FIR filter on all platforms?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sat, 25 Oct 2003 02:24:48 -0400 (EDT)
[Warning: long style critique ahead]
On Sat, 25 Oct 2003, Johan Bergman wrote:
>
> Maybe someone can help me to optimize this C/C++ implementation of a FIR
> filter (although I realize that I should probably consider an FFT approach
> instead.)
What in the world is an FIR filter? (Finite impulse-response, yes,
yes, I know, I looked it up. Point is, practically nobody in this
newsgroup will know, either, and they'll *all* have to look it up
if they really want to help you. You're making it harder for people
to help you when you don't define your terms.)
> My problem is that this implementation performs very well in Cygwin on my
> laptop, but less good in Linux and even worse in SunOS. I have tried to
> calculate the number of mega-cycles that the example program consumes on the
> different platforms (execution time in seconds multiplied by CPU clock
> frequency in MHz):
>
> Cygwin: 448 Mcycles (216 Mcycles without *tmp_output_ptr++ = sum;)
> Linux: 1090 Mcycles (151 Mcycles without *tmp_output_ptr++ = sum;)
> SunOS: 2800 Mcycles (103 Mcycles without *tmp_output_ptr++ = sum;)
> Can someone help me to speed up the program on the Linux and SunOS
> platforms?
Well, first let's clean up your code and see what it looks like
then. Clean code is always easier to operate on. :-)
> #include <malloc.h>
Non-standard header. 'malloc' is declared in <stdlib.h>.
#include <stdlib.h>
> int main()
[Some regulars strongly believe in 'int main(void)'. I
don't have a strong preference either way, but it's wise
to consider why you picked the style you did, even on
little things like this.]
> {
> const int nrof_lags = 10000;
> const int nrof_taps = 10000;
> const int * const coeff_ptr =
Okay, yeah, 'const' is great. But really, this is ridiculous.
You don't think that all these consts are somehow making your
program more efficient, do you? (They're not hurting, but
over-constifying is Bad mostly because it's Ugly, and Ugly code
is hard to make Right.)
> (const int * const) malloc(nrof_taps*sizeof(int));
malloc(nrof_taps * sizeof *coeff_ptr);
No chance for error using this idiom.
Also note that casting a malloc() result to 'const' anything
is just silly, because free() takes a non-const pointer
argument. How are you going to free() this memory, hmm?
> const int * const input_ptr =
> (const int * const) malloc((nrof_taps-1+nrof_lags)*sizeof(int));
> int const * output_ptr = (int const *) malloc(nrof_lags*sizeof(int));
> const int * tmp_coeff_ptr;
> const int * tmp_input_ptr;
> int * tmp_output_ptr;
> int sum;
> int lag, tap;
Way too many similarly-named variables, in my opinion.
Again, purely a stylistic issue, but again, clean code is
nice code. Note that in the cleaned-up version, I've pulled
in their scope to where they're actually used. This may
actually help with the efficiency of the program, since some
optimizers look for "windows" in which objects can be moved
into registers and suchlike. The wider their scopes are, the
harder it is for the optimizer to see them.
> tmp_output_ptr = (int *) output_ptr;
Casting away constness is Bad. Especially when you've just
finished jumping through hoops to make it const in the first
place. Geez.
> for (lag=0; lag<nrof_lags; lag++)
> {
> tmp_coeff_ptr = (const int *) coeff_ptr;
> tmp_input_ptr = (const int *) input_ptr + lag;
STOP PERFORMING RANDOM CASTS! For Pete's sake!
Get a hold of yourself here!
> sum = 0;
> for (tap=0; tap<nrof_taps; tap++)
> {
> sum += *tmp_coeff_ptr++ * *tmp_input_ptr++;
This line (and the one below) are just begging to be
simplified. Let's expand it so we can see the order
of operations.
> }
> *tmp_output_ptr++ = sum;
> }
/* always */ return 0; /* from main */
> }
Okay, let's put it all together: no casts, clean names,
clean mallocs, and see what we get!
You see how all those tmp_*_ptr objects just melt away
once we start to see the outline of the program clearly.
In fact, we uncover a subtle bug in the memory allocation,
and fix it ('input' was being malloc'ed one element short)!
Finally, we strive for idiomatic identifiers: using 'i' and
'j' for the nested loop controls, and losing the silly
'_ptr' suffixes on the main data arrays.
#include <stdlib.h>
int main(void)
{
int nrof_lags = 10000; /* Could both of these be #defines? */
int nrof_taps = 10000; /* That would help a little. */
int *coeffs = malloc(nrof_taps * sizeof *coeffs);
int *input = malloc((nrof_taps+nrof_lags) * sizeof *input);
int *output = malloc(nrof_lags * sizeof *output);
int i, j;
for (i=0; i < nrof_lags; ++i)
{
int sum = 0;
for (j=0; j < nrof_taps; ++j)
sum += coeffs[j] * input[i+j];
output[i] = sum;
}
free(coeffs);
free(input);
free(output);
return 0;
}
There -- isn't that a heck of a lot simpler and easier to
read? I bet it performs comparably fast on all your machines.
If not, you might try changing 'int' to 'unsigned int' (in
fact, you might do that anyway, to guard against overflow
conditions); reversing the order of either or both loops;
if many of the 'coeffs[j]' are zero, try pre-processing
'coeffs' a little bit; little tweaks like that.
I think most of your performance differences are due to
different cache layouts on the different machines, but I
could easily be wrong.
How's the code do now?
HTH,
-Arthur
- Next message: Richard Heathfield: "Re: wait until \n appears in string"
- Previous message: CBFalconer: "Re: void pointers and more"
- In reply to: Johan Bergman: "Is this the optimal FIR filter on all platforms?"
- Next in thread: Johan Bergman: "Re: Is this the optimal FIR filter on all platforms?"
- Reply: Johan Bergman: "Re: Is this the optimal FIR filter on all platforms?"
- Reply: Johan Bergman: "Re: Is this the optimal FIR filter on all platforms?"
- Reply: Johan Bergman: "Re: Is this the optimal FIR filter on all platforms?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|
|