Re: Tips on optimizing these functions



Andrea Taverna wrote:

I've done some benchmarks with copying and initialisation. Compared to a
specific-nested-loop solution, the functions take up to twice the time.
However, turning on some optimization flags, specifically '-O3' with gcc,
the gap between the recursive and the specific solution reduces to 20%.

So, have you got any advice about optimizing this code?
Other suggestions are welcome as well.

typedef unsigned char byte;

// Copies one row of the matrix. The row is assumed to store the element
// values themselves, not pointers to them.
void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned int n)
{
    unsigned short length;
    byte *d1, *d2;

    d1 = (byte*)dest;
    d2 = (byte*)src;

    // copy byte by byte: n elements of elem_size bytes each
    while (n > 0)
    {
        for (length = 0; length < elem_size; length++)
        {
            (*d1) = (*d2);
            d1++;
            d2++;
        }
        n--;
    }
}

This is so dependent on the platform that we could justifiably argue you
should choose one, and go to a forum associated with that platform.
Do any of the compilers you use take advantage of restrict?
If elem_size frequently matches the size of a stdint type (uint16_t,
uint32_t, uint64_t and so on), you will need to switch on elem_size so
that the inner loop is removed for those cases.
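
A minimal sketch of such a switch, assuming C99 and that the exact-width types from <stdint.h> are available. The name copy_row_switch, the const qualifier and the restrict qualifiers are illustrative additions, not part of the original code. Each special case uses a memcpy() of constant size, which optimizing compilers commonly reduce to a single load and store, and which avoids any alignment assumption about the row.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of _copy_row: dispatches on elem_size so that the
   common element sizes are copied without the inner byte loop. */
void copy_row_switch(void *restrict dest, const void *restrict src,
                     unsigned short elem_size, unsigned int n)
{
    unsigned char *restrict d = dest;
    const unsigned char *restrict s = src;
    size_t i;
    unsigned short k;

    switch (elem_size) {
    case sizeof(uint16_t):
        for (i = 0; i < n; i++)
            memcpy(d + i * sizeof(uint16_t), s + i * sizeof(uint16_t),
                   sizeof(uint16_t));
        break;
    case sizeof(uint32_t):
        for (i = 0; i < n; i++)
            memcpy(d + i * sizeof(uint32_t), s + i * sizeof(uint32_t),
                   sizeof(uint32_t));
        break;
    case sizeof(uint64_t):
        for (i = 0; i < n; i++)
            memcpy(d + i * sizeof(uint64_t), s + i * sizeof(uint64_t),
                   sizeof(uint64_t));
        break;
    default:
        /* Generic fallback: the original byte-by-byte nested loop. */
        for (i = 0; i < n; i++)
            for (k = 0; k < elem_size; k++)
                *d++ = *s++;
        break;
    }
}
```

Whether this beats the plain loop depends on the compiler and flags, so it is worth benchmarking both on the target platform.
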
Some compilers automatically substitute a run-time library copy function
which invokes all the usual memcpy() optimizations (align destination,
move groups of bytes per instruction).
If you called memcpy() directly in your code, that would work well with
certain compilers, not so well with others (possibly depending on command
line options and which run-time library you choose). If you are somehow
prohibited from using restrict, calling memcpy() makes the same assertion:
that the source and destination do not overlap.
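
The two alternatives might look like the following sketch, assuming C99 and rows that never overlap. The names copy_row_restrict and copy_row_memcpy are hypothetical, and the const qualifier on src is an addition to the original signature. Because a row is n contiguous elements of elem_size bytes, both versions treat it as a single run of elem_size * n bytes.

```c
#include <stddef.h>
#include <string.h>

/* restrict tells the compiler that dest and src do not alias, which
   leaves it free to vectorize the loop or replace it with a library call. */
void copy_row_restrict(void *restrict dest, const void *restrict src,
                       unsigned short elem_size, unsigned int n)
{
    unsigned char *restrict d = dest;
    const unsigned char *restrict s = src;
    size_t total = (size_t)elem_size * n;
    size_t i;

    for (i = 0; i < total; i++)
        d[i] = s[i];
}

/* memcpy() has undefined behaviour for overlapping regions, so calling
   it makes the same no-overlap promise without using restrict. */
void copy_row_memcpy(void *dest, const void *src,
                     unsigned short elem_size, unsigned int n)
{
    memcpy(dest, src, (size_t)elem_size * n);
}
```

If rows could ever overlap, neither version is valid and memmove() would be the call to use instead.
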


