[OT] Re: memcpy() vs. for() performance
From: Case (no_at_no.no)
Date: 07/01/04
- Previous message: Dan Pop: "Re: Swapping Bull***"
- In reply to: Dan Pop: "Re: memcpy() vs. for() performance"
- Next in thread: Dan Pop: "Re: [OT] Re: memcpy() vs. for() performance"
- Reply: Dan Pop: "Re: [OT] Re: memcpy() vs. for() performance"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 01 Jul 2004 15:13:32 +0200
Dan Pop wrote:
> In <40e3d4dc$0$65124$e4fe514c@news.xs4all.nl> Case <no@no.no> writes:
>
>
>>Dan Pop wrote:
>>
>>>In <40e28ca4$0$93324$e4fe514c@news.xs4all.nl> Case <no@no.no> writes:
>>>
>>>>Any (general) ideas about when (depending on SIZE) to use
>>>>memcpy(), and when to use for()?
>>>
>>>ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
>>>evidence that your memcpy() is very poorly implemented.
>>>
>>>A well implemented memcpy() can use many tricks to accelerate its
>>>operation.
>>
>>I did some tests myself, and found out that this is only true
>>when the block size is fixed/known. Neither GCC nor Sun CC 'inlines/optimizes'
>>memcpy() when the size is a variable. Unfortunately, at many
>>places in my code, the size is variable. Although my understanding
>>of this issue has increased, I must admit this was a flaw in my
>>initial question: an oversimplification.
>>
>>I'd be interested to hear comments/insights about this variable
>>case.
>
>
> It would be *very* helpful if you didn't mix up things. Inlining is one
> thing and providing a highly optimised library version of memcpy is a
> completely different one.
I know the difference. What the compiler does looks like (in my eyes)
a form of inlining (the function call is replaced). But at the same
time the code that is inserted is highly optimized for the particular
block size; it's not just inserting a standard piece of memcpy code.
That's why I wrote 'inline/optimize', and quoted the expression to
mark it as not to be taken too literally, because it's a combination.
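To make the distinction concrete, here is a minimal sketch of the cases being discussed. The names copy_fixed, copy_variable, copy_loop, check_copies and the value of FIXED_SIZE are made up for illustration; whether a particular compiler really expands the fixed-size call in place depends on the compiler and the optimization options used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { FIXED_SIZE = 16 };  /* hypothetical block size, for illustration only */

/* Size is a compile-time constant: a compiler may replace this call with
   a few load/store instructions tailored to exactly FIXED_SIZE bytes. */
void copy_fixed(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, FIXED_SIZE);
}

/* Size is only known at run time: the compiler typically emits an
   ordinary call to the library memcpy(). */
void copy_variable(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hand-written byte loop, for comparison with the two calls above. */
void copy_loop(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Sanity check: all three variants must produce the same bytes. */
void check_copies(void)
{
    unsigned char src[FIXED_SIZE], a[FIXED_SIZE], b[FIXED_SIZE], c[FIXED_SIZE];
    size_t i;

    for (i = 0; i < FIXED_SIZE; i++)
        src[i] = (unsigned char)(i + 1);

    copy_fixed(a, src);
    copy_variable(b, src, sizeof b);
    copy_loop(c, src, sizeof c);

    assert(memcmp(a, src, FIXED_SIZE) == 0);
    assert(memcmp(b, src, FIXED_SIZE) == 0);
    assert(memcmp(c, src, FIXED_SIZE) == 0);
}
```

Compiling this to assembly (for example with gcc -O2 -S) and comparing copy_fixed with copy_variable is one way to see what a given compiler does in each case.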
>
> When the size is unknown at compile time (or too large), the compiler
> cannot/won't inline the memcpy call; it will call the library version.
If I had to make a choice between the two, I would call it
optimization. I'm surprised that you seem to prefer the
term inlining. Why?
> But the library version can still be much faster than the code generated
> by the compiler from a for loop. Especially when dealing with arrays of
> characters.
Agreed. And for simplicity I'd rather use one method all the time
than choose between a couple of alternatives depending on context
(either at coding time or even at run time). Otherwise this will
easily fall within the famous 97%.
>
> If you want ultimate answers, benchmark the two versions yourself.
> Keep in mind that they cannot be extrapolated to other implementations.
Yep, one more good reason to always use memcpy(). However, how did
the saying go again ... "Never say always!" :-)
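For anyone who wants to try the benchmark suggested above, here is a rough sketch using only standard C. The names loop_copy, time_memcpy, time_loop and sink are made up for illustration, and the volatile sink is just one assumed way of discouraging the optimizer from discarding the repeated copies; it is worth checking the generated code to confirm that both loops survive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Written after every copy so the compiler cannot treat the copies as dead. */
static volatile unsigned char sink;

/* Hand-written byte loop to compare against memcpy(). */
void loop_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copies n bytes `reps` times with memcpy(); returns CPU seconds used. */
double time_memcpy(unsigned char *dst, const unsigned char *src,
                   size_t n, long reps)
{
    clock_t start, stop;
    long r;

    assert(n > 0);              /* dst[n - 1] is read below */
    start = clock();
    for (r = 0; r < reps; r++) {
        memcpy(dst, src, n);
        sink = dst[n - 1];
    }
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Same measurement for the hand-written loop. */
double time_loop(unsigned char *dst, const unsigned char *src,
                 size_t n, long reps)
{
    clock_t start, stop;
    long r;

    assert(n > 0);
    start = clock();
    for (r = 0; r < reps; r++) {
        loop_copy(dst, src, n);
        sink = dst[n - 1];
    }
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling both functions with the same buffers, a range of values for n, and a reps count large enough to give measurable times shows where (if anywhere) the loop wins on one particular compiler, library, and machine. As noted above, the numbers cannot be extrapolated to other implementations.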
Thanks,
Case