Re: inline power function replacement
- From: "chris.fairles@xxxxxxxxx" <chris.fairles@xxxxxxxxx>
- Date: 27 Jul 2006 11:51:06 -0700
websnarf@xxxxxxxxx wrote:
chris.fairles@xxxxxxxxx wrote:
Just want an opinion. I have an algorithm that needs to run as fast as
possible, in fact. Its already too slow. I've done as much algorithmic
changes as I can to reduce the amount of code, so now I'm turning to
micro-optimizations.
I'm not sure this is the right newsgroup for such a question. Not a
lot of people in this group can fathom what exactly you are saying
here.
I see your point. I suppose the heart of this question is "what machine
code does this C code get translated into" which obviously is done by
the compiler (gcc 4.1.1 in my case).
Actually, this is more of an archetecture specific question because
really the question is "what should I do to optimize this function for
amd64 processors" heh. But then again, you did give some good insight
depsite the wrong newsgroup! So thanks!
Chris
One function that gets run a bazillion times is the pow() function from
math.h. However I've realized probably 90% of the time, the exponent
will be 0. 99.9% of the time, the exponent will lie between -3 and 3.
So I read that switch's can sometimes be optimized into jump tables if
the case expressions are nearly contigous integers. So I coded up this
little diddy:
static inline double
mult_power(double x, int y)
{
switch ((y))
{
case 0 : return 1.0;
case -3 : return 1.0 / ((x)*(x)*(x));
case -2 : return 1.0 / ((x)*(x));
case -1 : return 1.0 / (x);
case 1 : return (x);
case 2 : return (x)*(x);
case 3 : return (x)*(x)*(x);
default : return pow((x), (y));
}
}
I actually havent tested this vs sticking the 0 in the middle (which I
will be doing shortly) but is this a sound replacement?
Well, there are numerical considerations (i.e., your results will not
be identical to just calling pow().) But your idea is very well
motivated. There really should be a powint(double x, int y) function
in C (that does even more.) The problem of doing this as quickly as
possible for all integers is actually an interesting one.
Oh yeah, and the order of the cases should not matter if the compiler
has transformed this to a jump table. It only matters in the cases
where it doesn't do that, in which case you should just put the most
common cases as early as possible.
Or is there perhaps a more efficient solution?
Indeed. If y is mostly zero, then you should do this:
#define fastPowInt(x,y) ((0 == (y)) ? 1.0 : mult_power ((x), (y)))
The reason is that the single conditional on the top can leverage a
modern microprocessor's branch prediction capabilities, and thus cost
the absolutely least overhead for the vast majority of cases. A
switch() operation indeed does use jmp tables in many compilers, but
unless you are almost always switching to the same case, this has
roughly the same overhead as an indirect function call (calling through
a function pointer.) So, the surrounding conditional as shown above
should improve performance further still.
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
.
- References:
- inline power function replacement
- From: chris.fairles@xxxxxxxxx
- Re: inline power function replacement
- From: websnarf
- inline power function replacement
- Prev by Date: A couple questions
- Next by Date: Re: Is this a good habit?
- Previous by thread: Re: inline power function replacement
- Next by thread: Re: inline power function replacement
- Index(es):
Relevant Pages
|