Re: Trying to write strncpy() in ASM

From: Homosapien (spamtrap_at_crayne.org)
Date: 01/05/05


Date: Wed, 5 Jan 2005 19:32:45 +0000 (UTC)

Terje Mathisen wrote:
> Your current code is so slow as to give no advantage at all compared to
> a plain C version. :-(

Except for size. :-) But even that wasn't the point. I'm a beginner in
assembly and I was getting in some practice.

> In fact, you could probably write a C version that was significantly
> faster...

Undoubtedly!

> The first key idea is that you should not make a separate call to
> strlen, just do the length determination while copying.

Ok.
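
For illustration, a minimal C sketch of that single-pass idea: the terminator is detected while copying, so no separate strlen call is needed. The name my_strncpy is hypothetical, and this is only one plain way to write it, not the code from this thread.

```c
#include <stddef.h>

/* Hypothetical name. Same contract as the standard strncpy():
   write exactly n bytes, zero-padding if src is shorter than n. */
char *my_strncpy(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    /* Copy until n bytes are written or the terminator has been copied. */
    while (i < n) {
        char c = src[i];
        dst[i++] = c;
        if (c == '\0')
            break;
    }

    /* Pad whatever is left with zeros, as strncpy() requires. */
    while (i < n)
        dst[i++] = '\0';

    return dst;
}
```

The source is read only once, and never past its terminator.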

> Next, try to handle more than one character per iteration, remember that
> unaligned loads are more or less the same speed as aligned loads, except
> when straddling cache line boundaries.

So a multi-byte load that crosses a cache line boundary needs more than
one bus cycle (or something like that), which makes it slower than an
aligned load? By how much?

> jl do_tail /* Signed compare in case of a negative count! */

So jb must be for unsigned comparisons? Hmm... I never gave that any thought.
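
A small C model of the difference, as a sketch: jl branches on a signed "less than", jb on an unsigned "below". The function names are made up for the illustration.

```c
#include <stdint.h>

/* Models `cmp a,b` followed by `jl`: taken when a < b as signed values. */
int taken_jl(int32_t a, int32_t b)
{
    return a < b;
}

/* Models `cmp a,b` followed by `jb`: taken when a < b as unsigned values. */
int taken_jb(int32_t a, int32_t b)
{
    return (uint32_t)a < (uint32_t)b;
}
```

With a count of -1 compared against 4, taken_jl() reports the branch as taken, while taken_jb() sees 0FFFFFFFFh, a huge unsigned number, and does not. That is why the signed jump guards against a negative count.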

> /* Subtract 4 to make sure we have full blocks of 4: */
> sub ecx,4

So what you're doing is making sure you copy 4 bytes at a time.
Wouldn't all these tests and special cases slow the code down? Are you
sure this would be better than the simplest and most obvious code?
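
To make the shape of that loop concrete, here is a rough C sketch of "full blocks of 4, then a tail". It is not the quoted assembly: copy_blocks4 is a hypothetical helper, it does no zero padding, and it ASSUMES that all count bytes of the source are readable, even past the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies from src to dst until count bytes are
   written or a terminator has been copied; returns the bytes written.
   ASSUMPTION: all `count` bytes of src are readable, even past '\0'. */
size_t copy_blocks4(char *dst, const char *src, long count)
{
    size_t done = 0;

    count -= 4;                       /* like `sub ecx,4` */
    while (count >= 0) {              /* signed test, like `jl do_tail` */
        char block[4];
        memcpy(block, src + done, 4); /* one (possibly unaligned) 4-byte load */
        for (int i = 0; i < 4; i++) {
            dst[done++] = block[i];
            if (block[i] == '\0')
                return done;          /* terminator copied, stop */
        }
        count -= 4;
    }

    count += 4;                       /* 0..3 bytes remain, or count was negative */
    while (count-- > 0) {             /* tail: one byte at a time */
        dst[done] = src[done];
        if (src[done++] == '\0')
            break;
    }
    return done;
}
```

The extra tests run once per block of four bytes instead of once per byte, so for longer strings the bookkeeping is spread over more data. Whether it actually wins on a given CPU would have to be measured.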

> The missing code should be obvious, except for one key idea: It is
> possible to test all the bytes in a register for being zero at the same
> time, this is probably faster than the four explicit test/branch
> operations I did in my main loop.

Interesting!
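
One well-known way to do such a test on a 32-bit value, sketched in C. This is a commonly cited bit trick and may not be the exact method meant above; has_zero_byte is a made-up name.

```c
#include <stdint.h>

/* Returns nonzero if any of the four bytes in v is zero.
   Subtracting 1 from each byte borrows into bit 7 only for bytes that
   were 0 (or >= 81h); masking with ~v drops the bytes that already had
   bit 7 set, so only true zero bytes leave a marker in 80808080h. */
int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}
```

For example, has_zero_byte(0x41424344) is 0, while has_zero_byte(0x41004344) is nonzero. In assembly this comes to a handful of ALU instructions and a single branch per dword.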

> If you can use MMX/SSE for the inner operation, then you can copy 8 or
> 16 bytes at once, but the check for any zero byte becomes a bit harder
> in that it is easy to test, but hard to move the result of such a test
> into an integer register or the flags.

Is it worth the bother then?

> One possibility is to mask all the copied values with another register
> which starts out containing 0FFh in all bytes, but after a zero byte is
> found, it will mask away any following bytes. The main problem with this
> approach is that it will trap if you read bytes beyond both the end of
> the source string and the end of allocated memory.

So I have already discovered :-)

> ***********************************************************************
> NOTICE: This e-mail transmission, and any documents, files or previous
> e-mail messages attached to it, may contain confidential or privileged
> information. If you are not the intended recipient, or a person
> responsible for delivering it to the intended recipient, you are
> hereby notified that any disclosure, copying, distribution or use of
> any of the information contained in or attached to this message is
> STRICTLY PROHIBITED. If you have received this transmission in error,
> please immediately notify the sender and delete the e-mail and attached
> documents. Thank you.
> ***********************************************************************

NOTICE: This email is the official version of the voices inside my
head. ;-)

Why a notice like this when you are intentionally sending the message to
USENET where many unintended recipients will see it? ;-)

Ah... that's "business" for ya!


