[OT] Re: memcpy() vs. for() performance

From: Case (no_at_no.no)
Date: 07/01/04

  • Next message: Jeremy Yallop: "Re: 'hello world' OS"
    Date: Thu, 01 Jul 2004 15:13:32 +0200
    
    

    Dan Pop wrote:
    > In <40e3d4dc$0$65124$e4fe514c@news.xs4all.nl> Case <no@no.no> writes:
    >
    >
    >>Dan Pop wrote:
    >>
    >>>In <40e28ca4$0$93324$e4fe514c@news.xs4all.nl> Case <no@no.no> writes:
    >>>
    >>>>Any (general) ideas about when (depending on SIZE) to use
    >>>>memcpy(), and when to use for()?
    >>>
    >>>ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
    >>>evidence that your memcpy() is very poorly implemented.
    >>>
    >>>A well implemented memcpy() can use many tricks to accelerate its
    >>>operation.
    >>
    >>I did some tests myself, and found out that this is only true
    >>when the block size is fixed/known. Neither GCC nor Sun-CC
    >>'inlines/optimizes' the memcpy() when size is a variable.
    >>Unfortunately, at many places in my code, the size is variable.
    >>Although my understanding of this issue has increased, I must admit
    >>this was a flaw in my initial question: an oversimplification.
    >>
    >>I'd be interested to hear comments/insights about this variable
    >>case.
    >
    >
    > It would be *very* helpful if you didn't mix up things. Inlining is one
    > thing and providing a highly optimised library version of memcpy is a
    > completely different one.

    I know the difference. What the compiler does looks like (in my eyes)
    a form of inlining (the function call is replaced). But at the same
    time the code that is inserted is highly optimized for the particular
    block size; it's not just inserting a standard piece of memcpy code.
    That's why I wrote 'inline/optimize', and quoted the expression to
    mark it as not to be taken too literally, because it's a combination.
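
    A minimal sketch of the two cases being discussed, for illustration
    only. SIZE and the function names are hypothetical and not from the
    thread. Whether a given compiler actually expands the fixed-size call
    in place depends on the implementation and its optimization settings,
    so inspect the generated assembly rather than assume it.

```c
#include <stddef.h>
#include <string.h>

#define SIZE 16  /* hypothetical block size, known at compile time */

/* Size is a compile-time constant: a compiler may replace the call
   with a few moves tailored to exactly SIZE bytes. */
void copy_fixed(void *dst, const void *src)
{
    memcpy(dst, src, SIZE);
}

/* Size is only known at run time: expect an ordinary call to the
   library version of memcpy(). */
void copy_variable(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hand-written byte loop, the for() alternative from the question. */
void copy_loop(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

    All three functions produce the same result; only the generated code
    differs, which is why the comparison has to be made per compiler.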

    >
    > When the size is unknown at compile time (or too large), the compiler
    > cannot (or won't) inline the memcpy call; it will call the library
    > version.

    If I had to choose between the two, I would call it optimization.
    I'm surprised that you seem to prefer the term inlining. Why?

    > But the library version can still be much faster than the code generated
    > by the compiler from a for loop. Especially when dealing with arrays of
    > characters.

    Agreed. And for simplicity, I'd rather use one way all the time
    than choose between a couple of alternatives depending on context
    (either at coding time or even at run time). Otherwise this will
    easily fall within the famous 97%.

    >
    > If you want ultimate answers, benchmark the two versions yourself.
    > Keep in mind that they cannot be extrapolated to other implementations.

    Yep, one more good reason to always use memcpy(). However, how did
    that saying go again ... "Never say always!" :-)
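
    The benchmarking advice above could be sketched roughly as follows.
    This is an assumption-laden illustration, not a rigorous harness:
    time_copy() and loop_copy() are hypothetical names, clock() measures
    CPU time at coarse resolution, and the numbers apply only to the
    compiler, library and machine they were measured on.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Plain for() loop copy, the alternative being measured. Note that an
   optimizing compiler may recognise this pattern and turn it into a
   memcpy() call itself, so check the generated code. */
static void loop_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Return the CPU seconds spent on `reps` copies of n bytes.
   use_memcpy selects which variant is timed. */
double time_copy(unsigned char *dst, const unsigned char *src,
                 size_t n, long reps, int use_memcpy)
{
    volatile unsigned char sink = 0;  /* discourages removing the copies */
    clock_t start = clock();
    long r;

    for (r = 0; r < reps; r++) {
        if (use_memcpy)
            memcpy(dst, src, n);
        else
            loop_copy(dst, src, n);
        if (n > 0)
            sink ^= dst[(size_t)r % n];  /* read back one copied byte */
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

    Calling time_copy() once with use_memcpy set to 1 and once with 0,
    for each block size of interest and with the size passed in as a
    run-time value, gives the two figures to compare.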

    Thanks,

    Case

