Re: inline power function replacement




websnarf@xxxxxxxxx wrote:
chris.fairles@xxxxxxxxx wrote:
Just want an opinion. I have an algorithm that needs to run as fast as
possible, in fact. Its already too slow. I've done as much algorithmic
changes as I can to reduce the amount of code, so now I'm turning to
micro-optimizations.

I'm not sure this is the right newsgroup for such a question. Not a
lot of people in this group can fathom what exactly you are saying
here.


I see your point. I suppose the heart of this question is "what machine
code does this C code get translated into" which obviously is done by
the compiler (gcc 4.1.1 in my case).

Actually, this is more of an archetecture specific question because
really the question is "what should I do to optimize this function for
amd64 processors" heh. But then again, you did give some good insight
depsite the wrong newsgroup! So thanks!

Chris

One function that gets run a bazillion times is the pow() function from
math.h. However I've realized probably 90% of the time, the exponent
will be 0. 99.9% of the time, the exponent will lie between -3 and 3.
So I read that switch's can sometimes be optimized into jump tables if
the case expressions are nearly contigous integers. So I coded up this
little diddy:

static inline double
mult_power(double x, int y)
{
switch ((y))
{
case 0 : return 1.0;
case -3 : return 1.0 / ((x)*(x)*(x));
case -2 : return 1.0 / ((x)*(x));
case -1 : return 1.0 / (x);
case 1 : return (x);
case 2 : return (x)*(x);
case 3 : return (x)*(x)*(x);
default : return pow((x), (y));
}
}

I actually havent tested this vs sticking the 0 in the middle (which I
will be doing shortly) but is this a sound replacement?

Well, there are numerical considerations (i.e., your results will not
be identical to just calling pow().) But your idea is very well
motivated. There really should be a powint(double x, int y) function
in C (that does even more.) The problem of doing this as quickly as
possible for all integers is actually an interesting one.

Oh yeah, and the order of the cases should not matter if the compiler
has transformed this to a jump table. It only matters in the cases
where it doesn't do that, in which case you should just put the most
common cases as early as possible.

Or is there perhaps a more efficient solution?

Indeed. If y is mostly zero, then you should do this:

#define fastPowInt(x,y) ((0 == (y)) ? 1.0 : mult_power ((x), (y)))

The reason is that the single conditional on the top can leverage a
modern microprocessor's branch prediction capabilities, and thus cost
the absolutely least overhead for the vast majority of cases. A
switch() operation indeed does use jmp tables in many compilers, but
unless you are almost always switching to the same case, this has
roughly the same overhead as an indirect function call (calling through
a function pointer.) So, the surrounding conditional as shown above
should improve performance further still.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

.



Relevant Pages

  • Re: inline power function replacement
    ... switch ) ... and the order of the cases should not matter if the compiler ... the absolutely least overhead for the vast majority of cases. ... roughly the same overhead as an indirect function call (calling through ...
    (comp.lang.c)
  • Re: Optimize for Size
    ... This is because you have to include the "overhead" of the VCL. ... A compiler ... switch that would optimize for size meight reduce the size of the ...
    (borland.public.delphi.non-technical)
  • Re: inline
    ... Calling a function has a amount of overhead to it: ... this is usually one of the optimizations a compiler ...
    (comp.lang.c)
  • Re: compiling forth (unefficient)?
    ... tin gherdanarra writes Re: ... > I understand that forth has some overhead when ... > calling a word, even if it is directly threaded. ... > Is there a forth compiler that looks at the code, ...
    (comp.lang.forth)
  • Re: DPROD issues
    ... a switch like that typically ... makes a compiler nonstandard in that mode. ... treatment of specific intrinsics is one ... subroutine sub1a ...
    (comp.lang.fortran)