Re: Trying to write strncpy() in ASM

From: Terje Mathisen (spamtrap_at_crayne.org)
Date: 01/05/05

  • Next message: wolfgang kern: "Re: Trying to write strncpy() in ASM"
    Date: Wed, 5 Jan 2005 08:40:20 +0000 (UTC)
    
    
    

    Homosapien wrote:

    > Hi, I'm trying to write a strncpy in assembly language to get a bit of
    > asm exercise, but I don't understand why my implementation doesn't work.
    >
    [snip]
    > The Dinkumware Standard C library has a better implementation than most
    > C libraries out there, and the Dinkumware manual says:
    >
    > of strncpy(s1, s2, n):
    > The function copies the string s2, not including its terminating null
    > character, to successive elements of the array of char whose first
    > element has the address s1. It copies no more than n characters from s2.
    > The function then stores zero or more null characters in the next
    > elements to be altered in s1 until it stores a total of n characters. It
    > returns s1.

    Your current code is so slow as to give no advantage at all compared to
    a plain C version. :-(

    In fact, you could probably write a C version that was significantly
    faster...

    The first key idea is that you should not make a separate call to
    strlen, just do the length determination while copying.

    Next, try to handle more than one character per iteration, remember that
    unaligned loads are more or less the same speed as aligned loads, except
    when straddling cache line boundaries.

    char *strncpy(char *dst, const char *src, int n)
    {
       __asm {
         mov edi,[dst]
         mov esi,[src]
         mov ecx,[n]

    /* If n is short, do it one char/iteration: */
        cmp ecx,8
          jl do_tail /* Signed compare in case of a negative count! */

    /* Subtract 4 to make sure we have full blocks of 4: */
        sub ecx,4

    /* Check destination alignment: */
        test edi,3
          jz align_4_ok

    align_loop:
        mov al,[esi]
        inc esi
        mov [edi],al
        inc edi
        dec ecx
        test al,al
         jz zero_align
        test edi,3
         jnz align_loop

    align_4_ok:
    /* From this point on we can copy four chars/iteration: */

    copy_4_loop:
        mov eax,[esi]
        add esi,4
    /* Did we hit a zero during this block? */

        test al,al
         jz zero_0
        test ah,ah
         jz zero_1
        test eax,0ff0000h
         jz zero_2
        test eax,0ff000000h
         jz zero_3

        mov [edi],eax
        add edi,4

        sub ecx,4
         jae copy_4_loop

        add ecx,4
    do_tail:
        ...

    The different zero alignment cases are relatively trivial:

    zero_0:
    zero_1:
       and eax,0ffh
    zero_2:
    zero_3:
       and eax,0ffffffh

       mov [edi],eax
       add edi,4
       xor eax,eax
       mov edx,ecx
       sub ecx,4
        jb tail_zero

    /* store full 32-bit blocks of zeroes */
       shr ecx,2
       inc ecx
       rep stosd

    tail_zero:
       mov ecx,edx
       and ecx,3
       rep stosb

    The missing code should be obvious, except for one key idea: It is
    possible to test all the bytes in a register for being zero at the same
    time, this is probably faster than the four explicit test/branch
    operations I did in my main loop.

    If you can use MMX/SSE for the inner operation, then you can copy 8 or
    16 bytes at once, but the check for any zero byte becomes a bit harder
    in that it is easy to test, but hard to move the result of such a test
    into an integer register or the flags.

    One possibility is to mask all the copied values with another register
    which starts out containing 0FFh in all bytes, but after a zero byte is
    found, it will mask away any following bytes. The main problem with this
    approach is that it will trap if you read bytes beyond both the end of
    the source string and the end of allocated memory.

    Terje

    -- 
    - <Terje.Mathisen@hda.hydro.com>
    "almost all programming can be viewed as an exercise in caching"
    
    

    ***********************************************************************
    NOTICE: This e-mail transmission, and any documents, files or previous
    e-mail messages attached to it, may contain confidential or privileged
    information. If you are not the intended recipient, or a person
    responsible for delivering it to the intended recipient, you are
    hereby notified that any disclosure, copying, distribution or use of
    any of the information contained in or attached to this message is
    STRICTLY PROHIBITED. If you have received this transmission in error,
    please immediately notify the sender and delete the e-mail and attached
    documents. Thank you.
    ***********************************************************************


  • Next message: wolfgang kern: "Re: Trying to write strncpy() in ASM"

    Relevant Pages

    • Re: Admired designs / designs to study
      ... instructions to try to pack together a whole word ... resort to using one of the string or bit move ... Get the next character from the ... if the number isn't zero yet. ...
      (comp.arch)
    • Re: newbie prolog question, passing functions
      ... version 2.1 of the License, or any later version. ... See the GNU ... uses http://www.freetds.org/ libraries ... Area code seven eight zero, Exchange four six six, Local zero one zero nine ...
      (comp.lang.prolog)
    • Re: Word XP - Strikethrough Font & More oops
      ... There are fonts with this type of character, but they're not that easy to ... after the zero. ... Word MVP FAQ site: http://word.mvps.org ... >>> Mary Sauer MS MVP ...
      (microsoft.public.word.printingfonts)
    • Re: Free Space Monster (episode 2)
      ... an unprintable character ... The hex number 30 can be represented as the ASCII *character* 0 ... If instead I had just posted "zero", everyone here, including you, could ... Leave ASCII character representations out of it, ...
      (comp.sys.mac.apps)
    • Re: Ping Mark Conrad--Hex zero revisited
      ... set of integers with a single member (which has a value of zero). ... A pointer with a value of null is not pointing ... assigned in ASCII to the control character with a numeric value of zero. ...
      (comp.sys.mac.system)