MOVZX has stall register

From: Bryan Parkoff (spamtrap_at_crayne.org)
Date: 08/25/04

    Date: Wed, 25 Aug 2004 04:13:06 +0000 (UTC)
    
    

    I want to explain why MOVZX can lead to a partial register stall. I
    have raised this before, but the example below may make the
    difference clearer.

    Example 1:
    MOV BL, 041H
    MOVZX EAX, BL
    ADD AL, 02H
    MOV ECX, EAX

        I suspect that MOVZX does not mark the upper bits as cleared the
    way XOR EAX, EAX does. It behaves like MOV EAX, 0H followed by
    MOV AL, BL, so the processor treats EAX as a genuine 32-bit register.
    The full 32-bit register is written before the 8-bit or 16-bit part
    is modified, and reading the full register afterwards causes the
    stall. The only alternative is to do the arithmetic in the 32-bit
    register and use an AND instruction to mask the result down to 8
    bits.
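
    The 32-bit-plus-AND alternative described above can be checked for
    value equivalence with a small C model. This is only a sketch: the
    function names are hypothetical, and the model says nothing about
    timing or stalls. It only illustrates that adding in the full 32-bit
    register and masking with 0FFH gives the same result as the 8-bit
    ADD AL when the upper bits start out zero.

```c
#include <stdint.h>

/* Hypothetical model of Example 1:
   MOVZX EAX, BL / ADD AL, imm8 / MOV ECX, EAX.
   The 8-bit ADD updates only the low byte of EAX. */
uint32_t example1_partial(uint8_t bl, uint8_t imm)
{
    uint32_t eax = bl;                     /* MOVZX EAX, BL              */
    uint8_t al = (uint8_t)(eax + imm);     /* ADD AL, imm8 (8-bit wrap)  */
    eax = (eax & 0xFFFFFF00u) | al;        /* upper 24 bits untouched    */
    return eax;                            /* MOV ECX, EAX               */
}

/* Hypothetical model of the 32-bit alternative:
   MOVZX EAX, BL / ADD EAX, imm / AND EAX, 0FFH / MOV ECX, EAX.
   No 8-bit partial register is written after the MOVZX. */
uint32_t example1_masked(uint8_t bl, uint8_t imm)
{
    uint32_t eax = bl;                     /* MOVZX EAX, BL              */
    eax += imm;                            /* ADD EAX, imm               */
    eax &= 0xFFu;                          /* AND EAX, 0FFH              */
    return eax;                            /* MOV ECX, EAX               */
}
```

    With the values from Example 1 (BL = 041H, immediate 02H) both
    functions return 043H, and they also agree when the 8-bit addition
    wraps around (for instance BL = 0FFH).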
        MOVZX takes 3 cycles register-to-register and 6 cycles
    memory-to-register on the 386 and 486, which is very slow. I am glad
    that this improved from the Pentium Pro through the Pentium 4: MOVZX
    no longer takes 3 to 6 cycles. It is now a single uop!

    Example 2:
    XOR EAX, EAX
    MOV AL, 0FCH
    ADD AL, 02H
    ADD AX, 0200H
    SHL AX, 01H
    MOV ECX, EAX

    Example 3:
    XOR EAX, EAX
    MOV AL, 0FCH
    ADD AL, 02H
    ADD EAX, 0200H
    SHL EAX, 01H
    MOV ECX, EAX

        The Pentium III does not stall on Example 3, but it does on
    Example 2. I am surprised that the Pentium 4 does not stall on either
    Example 2 or Example 3, which seems very strange. The rule, as I
    understand it, is that the 8-bit register may be modified before the
    32-bit register is modified, provided the upper bits were first
    cleared with XOR EAX, EAX.
        You are absolutely correct that 8-bit and 32-bit registers should
    not be mixed, but the 32-bit register is tied to its 8-bit part. Do
    you recommend finishing all modifications to the 8-bit register
    first, then moving it to another 32-bit register, and only then
    modifying that second 32-bit register? Look at the revised Example 3
    below.

    Example 3 revised:
    XOR EAX, EAX
    MOV AL, 0FCH
    ADD AL, 02H
    MOV ECX, EAX
    ADD ECX, 0200H
    SHL ECX, 01H
    MOV EDX, ECX
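
    As a sanity check that Example 2, Example 3 and the revised Example 3
    compute the same value, here is a small C model of the three
    sequences. It is a sketch under assumptions: the helper and function
    names are hypothetical, and it models only register contents, not
    uops or stalls.

```c
#include <stdint.h>

/* Hypothetical helpers: write the low 8 or 16 bits of a 32-bit register
   value and leave the remaining upper bits unchanged. */
static uint32_t write_al(uint32_t eax, uint32_t v)
{
    return (eax & 0xFFFFFF00u) | (v & 0xFFu);
}

static uint32_t write_ax(uint32_t eax, uint32_t v)
{
    return (eax & 0xFFFF0000u) | (v & 0xFFFFu);
}

/* Example 2: 8-bit, then 16-bit operations, then a 32-bit read. */
uint32_t example2_ecx(void)
{
    uint32_t eax = 0;                                /* XOR EAX, EAX  */
    eax = write_al(eax, 0xFCu);                      /* MOV AL, 0FCH  */
    eax = write_al(eax, (eax & 0xFFu) + 0x02u);      /* ADD AL, 02H   */
    eax = write_ax(eax, (eax & 0xFFFFu) + 0x0200u);  /* ADD AX, 0200H */
    eax = write_ax(eax, (eax & 0xFFFFu) << 1);       /* SHL AX, 01H   */
    return eax;                                      /* MOV ECX, EAX  */
}

/* Example 3: 8-bit operations, then 32-bit operations on EAX itself. */
uint32_t example3_ecx(void)
{
    uint32_t eax = 0;                                /* XOR EAX, EAX   */
    eax = write_al(eax, 0xFCu);                      /* MOV AL, 0FCH   */
    eax = write_al(eax, (eax & 0xFFu) + 0x02u);      /* ADD AL, 02H    */
    eax += 0x0200u;                                  /* ADD EAX, 0200H */
    eax <<= 1;                                       /* SHL EAX, 01H   */
    return eax;                                      /* MOV ECX, EAX   */
}

/* Revised Example 3: copy to ECX first, then do the 32-bit work there. */
uint32_t example3_revised_edx(void)
{
    uint32_t eax = 0;                                /* XOR EAX, EAX   */
    eax = write_al(eax, 0xFCu);                      /* MOV AL, 0FCH   */
    eax = write_al(eax, (eax & 0xFFu) + 0x02u);      /* ADD AL, 02H    */
    uint32_t ecx = eax;                              /* MOV ECX, EAX   */
    ecx += 0x0200u;                                  /* ADD ECX, 0200H */
    ecx <<= 1;                                       /* SHL ECX, 01H   */
    return ecx;                                      /* MOV EDX, ECX   */
}
```

    With the constants used in the examples, all three functions return
    05FCH, so the variants differ only in which partial registers they
    write and read, not in the result.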

        Do you think this is safer than the original Example 3? It is
    very painful on the Pentium 4 to avoid XOR EAX, EAX, because it does
    not WANT us to use 8-bit and 16-bit registers. It encourages us to
    always use 32-bit registers, as with MOVZX. It is true that 8-bit and
    16-bit instructions take two uops while 32-bit instructions take one
    uop.
        Why does Intel recommend that x86 assembly programmers use the
    MOVZX instruction when working with 8-bit and 16-bit registers? Intel
    also tells them to always use an AND instruction to mask 8-bit and
    16-bit registers. That only saves one or two uops.
        Are you upset that you cannot use 8-bit and 16-bit instructions
    without degrading performance?
        Do you recommend making all variables 32 bits wide and loading
    them with MOV instead of MOVZX? Should we always use an AND
    instruction to mask down to 8 or 16 bits, or use MOVZX EAX, AL and
    MOVZX EAX, AX in place of AND?
        It is as if the Pentium 4 rejects 8-bit and 16-bit instructions.
    Intel encourages us to stop using them and always use 32-bit
    instructions. What do you think?

    -- 
    Bryan Parkoff
    
