Re: Trying to write strncpy() in ASM
From: Terje Mathisen (spamtrap_at_crayne.org)
Date: 01/05/05
- Previous message: Clax86 : "Having trouble posting?"
- In reply to: Homosapien : "Trying to write strncpy() in ASM"
- Next in thread: Homosapien : "Re: Trying to write strncpy() in ASM"
- Reply: Homosapien : "Re: Trying to write strncpy() in ASM"
- Reply: chapman Billy : "Re: Trying to write strncpy() in ASM"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Wed, 5 Jan 2005 08:40:20 +0000 (UTC)
Homosapien wrote:
> Hi, I'm trying to write a strncpy in assembly language to get a bit of
> asm exercise, but I don't understand why my implementation doesn't work.
>
[snip]
> The Dinkumware Standard C library has a better implementation than most
> C libraries out there, and the Dinkumware manual says:
>
> of strncpy(s1, s2, n):
> The function copies the string s2, not including its terminating null
> character, to successive elements of the array of char whose first
> element has the address s1. It copies no more than n characters from s2.
> The function then stores zero or more null characters in the next
> elements to be altered in s1 until it stores a total of n characters. It
> returns s1.
Your current code is so slow as to give no advantage at all compared to
a plain C version. :-(
In fact, you could probably write a C version that was significantly
faster...
The first key idea is that you should not make a separate call to
strlen, just do the length determination while copying.
Next, try to handle more than one character per iteration, remember that
unaligned loads are more or less the same speed as aligned loads, except
when straddling cache line boundaries.
char *strncpy(char *dst, const char *src, int n)
{
__asm {
mov edi,[dst]
mov esi,[src]
mov ecx,[n]
/* If n is short, do it one char/iteration: */
cmp ecx,8
jl do_tail /* Signed compare in case of a negative count! */
/* Subtract 4 to make sure we have full blocks of 4: */
sub ecx,4
/* Check destination alignment: */
test edi,3
jz align_4_ok
align_loop:
mov al,[esi]
inc esi
mov [edi],al
inc edi
dec ecx
test al,al
jz zero_align
test edi,3
jnz align_loop
align_4_ok:
/* From this point on we can copy four chars/iteration: */
copy_4_loop:
mov eax,[esi]
add esi,4
/* Did we hit a zero during this block? */
test al,al
jz zero_0
test ah,ah
jz zero_1
test eax,0ff0000h
jz zero_2
test eax,0ff000000h
jz zero_3
mov [edi],eax
add edi,4
sub ecx,4
jae copy_4_loop
add ecx,4
do_tail:
...
The different zero alignment cases are relatively trivial:
zero_0:
zero_1:
and eax,0ffh
zero_2:
zero_3:
and eax,0ffffffh
mov [edi],eax
add edi,4
xor eax,eax
mov edx,ecx
sub ecx,4
jb tail_zero
/* store full 32-bit blocks of zeroes */
shr ecx,2
inc ecx
rep stosd
tail_zero:
mov ecx,edx
and ecx,3
rep stosb
The missing code should be obvious, except for one key idea: It is
possible to test all the bytes in a register for being zero at the same
time, this is probably faster than the four explicit test/branch
operations I did in my main loop.
If you can use MMX/SSE for the inner operation, then you can copy 8 or
16 bytes at once, but the check for any zero byte becomes a bit harder
in that it is easy to test, but hard to move the result of such a test
into an integer register or the flags.
One possibility is to mask all the copied values with another register
which starts out containing 0FFh in all bytes, but after a zero byte is
found, it will mask away any following bytes. The main problem with this
approach is that it will trap if you read bytes beyond both the end of
the source string and the end of allocated memory.
Terje
-- - <Terje.Mathisen@hda.hydro.com> "almost all programming can be viewed as an exercise in caching"
***********************************************************************
NOTICE: This e-mail transmission, and any documents, files or previous
e-mail messages attached to it, may contain confidential or privileged
information. If you are not the intended recipient, or a person
responsible for delivering it to the intended recipient, you are
hereby notified that any disclosure, copying, distribution or use of
any of the information contained in or attached to this message is
STRICTLY PROHIBITED. If you have received this transmission in error,
please immediately notify the sender and delete the e-mail and attached
documents. Thank you.
***********************************************************************
- Previous message: Clax86 : "Having trouble posting?"
- In reply to: Homosapien : "Trying to write strncpy() in ASM"
- Next in thread: Homosapien : "Re: Trying to write strncpy() in ASM"
- Reply: Homosapien : "Re: Trying to write strncpy() in ASM"
- Reply: chapman Billy : "Re: Trying to write strncpy() in ASM"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|