Re: Trying to write strncpy() in ASM
From: Homosapien (spamtrap_at_crayne.org)
Date: 01/05/05
- Next message: Clegg : "Accessing 2MB in real mode..."
- Previous message: wolfgang kern: "Re: Trying to write strncpy() in ASM"
- In reply to: Terje Mathisen : "Re: Trying to write strncpy() in ASM"
- Next in thread: Matt: "Re: Trying to write strncpy() in ASM"
- Reply: Matt: "Re: Trying to write strncpy() in ASM"
Date: Wed, 5 Jan 2005 19:32:45 +0000 (UTC)
Terje Mathisen wrote:
> Your current code is so slow as to give no advantage at all compared to
> a plain C version. :-(
Except for size. :-) But even that wasn't the point. I'm a beginner in
assembly and I was getting in some practice.
> In fact, you could probably write a C version that was significantly
> faster...
Undoubtedly!
> The first key idea is that you should not make a separate call to
> strlen, just do the length determination while copying.
Ok.
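As a rough illustration of that first idea, here is a minimal single-pass sketch in C rather than assembly. The name strncpy_onepass is hypothetical; the behaviour is meant to match the standard strncpy(), including the zero padding. The terminator is found during the copy itself, so no strlen call is needed.

```c
#include <stddef.h>

/* Hypothetical name; intended to follow the contract of standard strncpy(). */
char *strncpy_onepass(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    /* Copy until the terminator has been copied or n bytes have been written. */
    while (i < n && (dst[i] = src[i]) != '\0')
        i++;

    /* strncpy() pads the remainder of the destination with zero bytes. */
    while (i < n)
        dst[i++] = '\0';

    return dst;
}
```

Note that, like strncpy(), this leaves the destination unterminated when the source is n characters or longer.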
> Next, try to handle more than one character per iteration, remember that
> unaligned loads are more or less the same speed as aligned loads, except
> when straddling cache line boundaries.
So a multi-byte load that crosses a cache line boundary takes more than
one bus cycle (or something like that), which makes it slower than an
aligned load? By how much?
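The size of the penalty depends on the CPU, so no figure is given here. What can be sketched is which loads are affected at all. The helper below is hypothetical and assumes 64-byte cache lines (some older processors use 32); it only reports whether a load of a given size would span two lines.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumption: 64-byte lines; some older CPUs use 32 */

/* Returns 1 if a load of `size` bytes starting at p spans two cache lines. */
int straddles_line(const void *p, size_t size)
{
    uintptr_t off = (uintptr_t)p % CACHE_LINE;  /* offset within the line */
    return off + size > CACHE_LINE;
}
```

With these assumptions, only 3 of every 64 possible starting offsets make a 4-byte load straddle a line, which is why unaligned loads are cheap on average.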
> jl do_tail /* Signed compare in case of a negative count! */
So jb must be for unsigned comparisons? Hmm... I never gave that any thought.
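The difference can be shown in C. After a compare, jl branches on signed "less than" while jb branches on unsigned "below", so the same bit pattern can give opposite answers. The function names are hypothetical, and the cast assumes the usual two's complement conversion.

```c
#include <stdint.h>

/* What `cmp ecx,4` followed by `jl` decides: signed "less than". */
int less_signed(uint32_t count)
{
    return (int32_t)count < 4;   /* 0xFFFFFFFF is treated as -1 */
}

/* What `cmp ecx,4` followed by `jb` decides: unsigned "below". */
int below_unsigned(uint32_t count)
{
    return count < 4u;           /* 0xFFFFFFFF is treated as 4294967295 */
}
```

A count of -1 is therefore sent to the tail code by jl, but would look like a huge count to jb.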
> /* Subtract 4 to make sure we have full blocks of 4: */
> sub ecx,4
So what you're doing is making sure you copy 4 bytes at a time.
Wouldn't all these tests and special cases slow the code down? Are you
sure this would be better than the simplest and most obvious code?
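To make the structure concrete, here is a hedged C sketch of the "full blocks of 4, then a tail" layout. It is not the quoted assembly: the name strncpy_by4 is hypothetical, and it assumes the source buffer is readable in whole 4-byte blocks, since the block load can read up to 3 bytes past the terminator. The extra tests run once per 4 bytes or once per call, not once per byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name. ASSUMPTION: src is readable in whole 4-byte blocks,
   because the block load may read up to 3 bytes past the terminator. */
char *strncpy_by4(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    /* Main loop: only entered while a full block of 4 bytes is still wanted. */
    while (n - i >= 4) {
        unsigned char blk[4];
        memcpy(blk, src + i, 4);           /* one (possibly unaligned) load */
        if (!blk[0] || !blk[1] || !blk[2] || !blk[3])
            break;                         /* terminator is inside this block */
        memcpy(dst + i, blk, 4);           /* one 4-byte store */
        i += 4;
    }

    /* Tail: a partial block, or the block that contains the terminator. */
    while (i < n && (dst[i] = src[i]) != '\0')
        i++;

    /* Zero padding required by strncpy(). */
    while (i < n)
        dst[i++] = '\0';

    return dst;
}
```

Whether this beats the simple byte loop depends on string lengths and the CPU, so it is worth timing both.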
> The missing code should be obvious, except for one key idea: It is
> possible to test all the bytes in a register for being zero at the same
> time, this is probably faster than the four explicit test/branch
> operations I did in my main loop.
Interesting!
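One well-known way to do such a test, sketched in C, is the bit trick below. Whether it is the exact method the quoted post had in mind is an assumption, and the function name is hypothetical. Subtracting 1 from every byte sets the top bit of any byte that was zero; the `& ~v` term discards bytes whose top bit was already set.

```c
#include <stdint.h>

/* Returns nonzero if any of the four bytes in v is zero. */
int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}
```

This replaces four test/branch pairs with a subtract, two ANDs, a NOT, and a single branch.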
> If you can use MMX/SSE for the inner operation, then you can copy 8 or
> 16 bytes at once, but the check for any zero byte becomes a bit harder
> in that it is easy to test, but hard to move the result of such a test
> into an integer register or the flags.
Is it worth the bother then?
> One possibility is to mask all the copied values with another register
> which starts out containing 0FFh in all bytes, but after a zero byte is
> found, it will mask away any following bytes. The main problem with this
> approach is that it will trap if you read bytes beyond both the end of
> the source string and the end of allocated memory.
So I have already discovered :-)
> ***********************************************************************
> NOTICE: This e-mail transmission, and any documents, files or previous
> e-mail messages attached to it, may contain confidential or privileged
> information. If you are not the intended recipient, or a person
> responsible for delivering it to the intended recipient, you are
> hereby notified that any disclosure, copying, distribution or use of
> any of the information contained in or attached to this message is
> STRICTLY PROHIBITED. If you have received this transmission in error,
> please immediately notify the sender and delete the e-mail and attached
> documents. Thank you.
> ***********************************************************************
NOTICE: This email is the official version of the voices inside my
head. ;-)
Why a notice like this when you are intentionally sending the message to
USENET where many unintended recipients will see it? ;-)
Ah... that's "business" for ya!