Re: 8-Bit Register on Pentium 4

From: Bryan Parkoff (spamtrap_at_crayne.org)
Date: 03/20/05

  • Next message: Chewy509: "Re: 8-Bit Register on Pentium 4"
    Date: Sun, 20 Mar 2005 05:59:56 +0000 (UTC)
    
    

    >>> movzx ebx, word [Low_Byte] ; Load our 16bit value
    >>> movzx eax, byte [_Offset] ; Load our 8bit value
    >>> mov dl, 07fh
    >>> add ebx, eax ;add them together (ebx has result)
    >>> cmp dl, al ;see if _Offset is larger than 07fh
    >>> setc cl ; carry is set if it is larger
    >>> sub bh, cl ; sub the carry flag
    >>> mov [Low_Byte], bx ; mov into memory
    > Why didn't you just say that you wanted to add a signed 8bit value to an
    > unsigned 16bit value?

        You wrote the code above; I wrote the code below, as we discussed
    earlier.

      MOV AL, BYTE PTR [Low_Byte]
      MOV AH, BYTE PTR [High_Byte]
      MOV CL, BYTE PTR [_Offset]
      ADD AL, CL
      ADC AH, 0H
      MOV CH, 07FH
      CMP CH, CL
      SBB CH, CH
      ADD AH, CH
      MOV BYTE PTR [Low_Byte], AL
      MOV BYTE PTR [High_Byte], AH

        It works, but I worry that other processors may not have Intel's CBW,
    MOVZX, or MOVSX instructions. My code above is the only option on
    processors that lack MOVZX and MOVSX, which is why the comparison of 07FH
    against [_Offset] is needed there. Such processors need more instructions,
    which may take more bytes, whereas Intel already provides these
    instructions and so needs fewer bytes. I believe the Intel version, at 2 or
    4 instructions, may be faster than the 9 or more instructions that other
    processors would take.
        I am not sure whether IA-64 has these instructions; my understanding is
    that it dropped many instructions that were considered unnecessary.
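
A minimal C sketch of the arithmetic may make the role of the 07FH comparison clearer. It assumes the byte-wise sequence is meant to add a signed 8-bit offset to an unsigned 16-bit value with wraparound; the function names are hypothetical, and the code only models the values, not the speed of either version.

```c
#include <stdint.h>

/* Byte-wise version: mirrors the ADD / ADC / CMP / SBB / ADD sequence. */
uint16_t add_offset_bytewise(uint8_t low, uint8_t high, uint8_t offset)
{
    unsigned sum = (unsigned)low + offset;        /* ADD AL, CL */
    uint8_t al = (uint8_t)sum;
    uint8_t carry = (uint8_t)(sum >> 8);          /* carry out of the low byte */
    uint8_t ah = (uint8_t)(high + carry);         /* ADC AH, 0H */
    uint8_t ch = (offset > 0x7F) ? 0xFF : 0x00;   /* CMP CH, CL / SBB CH, CH */
    ah = (uint8_t)(ah + ch);                      /* ADD AH, CH */
    return (uint16_t)((ah << 8) | al);
}

/* Sign-extending version: mirrors MOVSX followed by a 16-bit ADD. */
uint16_t add_offset_signext(uint16_t value, uint8_t offset)
{
    /* Sign extension written out by hand so it is portable C. */
    int extended = (offset & 0x80) ? (int)offset - 256 : (int)offset;
    return (uint16_t)(value + extended);          /* wraps modulo 65536 */
}

/* Counts the inputs for which the two versions disagree. */
unsigned long count_mismatches(void)
{
    unsigned long mismatches = 0;
    for (unsigned v = 0; v <= 0xFFFF; v++) {
        for (unsigned o = 0; o <= 0xFF; o++) {
            uint16_t a = add_offset_bytewise((uint8_t)(v & 0xFF),
                                             (uint8_t)(v >> 8), (uint8_t)o);
            uint16_t b = add_offset_signext((uint16_t)v, (uint8_t)o);
            if (a != b)
                mismatches++;
        }
    }
    return mismatches;
}
```

In this model the comparison against 07FH produces 0FFH exactly when the offset is negative, so adding it to the high byte is the same as adding the upper byte of the sign-extended offset.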

    First Example:
    >> MOVZX EBX, WORD PTR [Low_Byte2]
    >> MOVSX EAX, BYTE PTR [_Offset]
    >> ADD EBX, EAX
    >> MOV WORD PTR [Low_Byte2], BX

    Second Example:
    >> MOVSX EAX, BYTE PTR [_Offset]
    >> ADD WORD PTR [Low_Byte2], AX

        You said earlier that the second example can use a 16-bit or 32-bit
    operand and that it modifies the variable in memory rather than in a
    register. That seems fine. Please give your opinion: do you think the
    second example is better than the first when the variable is modified only
    once, since it uses ADD mem, reg? If the variable is modified twice or
    more, the first example would be the better choice, because it uses
    ADD reg, reg, and repeated modifications to a register are faster than
    repeated modifications to memory.
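
The two access patterns can be sketched in C as follows. This is only an illustration of the load-once versus read-modify-write shapes: the function names are hypothetical, `volatile` is used here solely to force a memory access on every step, and a real compiler is free to generate different instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend an 8-bit offset, as MOVSX does. */
static int sign_extend8(uint8_t offset)
{
    return (offset & 0x80) ? (int)offset - 256 : (int)offset;
}

/* First-example pattern: load once, add in a register, store once. */
void add_offsets_in_register(uint16_t *value, const uint8_t *offsets, size_t count)
{
    uint16_t reg = *value;                                  /* MOVZX EBX, WORD PTR [...] */
    for (size_t i = 0; i < count; i++)
        reg = (uint16_t)(reg + sign_extend8(offsets[i]));   /* ADD EBX, EAX */
    *value = reg;                                           /* MOV WORD PTR [...], BX */
}

/* Second-example pattern: every addition reads and writes the variable. */
void add_offsets_in_memory(volatile uint16_t *value, const uint8_t *offsets, size_t count)
{
    for (size_t i = 0; i < count; i++)
        *value = (uint16_t)(*value + sign_extend8(offsets[i])); /* ADD WORD PTR [...], AX */
}
```

Both functions leave the same value in memory; they differ only in how many times the variable is read and written along the way.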

    >> Thank you very much for the information. I think that you should take
    >> a look at your code that you wrote above. Didn't you think that EBX is
    >> still partial register stall? It is because you modified BH while BX is
    >> already present. Other way would be better to my code below.
    >
    > A partial register stall occurs on a false dependency. The above is a true
    > dependency, thus a stall (if one was to occur) is necessary.
        I will have to check the Pentium 4 Optimization manual, because it says
    to avoid the AH, BH, CH, and DH registers because they are slow. If you
    want to use an 8-bit register, you have to use AL, BL, CL, or DL, not AH,
    BH, CH, or DH. I am trying to understand what "false dependency" means.
    Look at the code below.

    MOV AX, 02001H
    ADD AL, 010H
    ADD AH, 01H
    MOV WORD PTR [TEMP_DATA], AX

        Is this considered a false dependency? Could it still cause a partial
    register stall, or does the Pentium 4 not care?

    MOV AL, 01H
    MOV AH, 020H
    ADD AL, 010H
    ADD AH, 01H
    MOV BYTE PTR [TEMP_LOW], AL
    MOV BYTE PTR [TEMP_HIGH], AH

        Is this considered a true dependency because AL and AH are both part
    of the same EAX register? Do you recommend replacing AH with CL so that it
    would be a false dependency?
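
As a toy model of the two sequences above, the C sketch below treats AL and AH as views of bits 0-7 and 8-15 of a single 32-bit register. The `Regs` type and the helper names are hypothetical, and the model describes architectural values only; it says nothing about how the Pentium 4 schedules these instructions or whether a stall occurs.

```c
#include <stdint.h>

/* Toy model: one 32-bit register, with AL and AH as byte-sized views of it. */
typedef struct { uint32_t eax; } Regs;

uint8_t get_al(const Regs *r) { return (uint8_t)(r->eax & 0xFF); }
uint8_t get_ah(const Regs *r) { return (uint8_t)((r->eax >> 8) & 0xFF); }

/* A partial write keeps the other bits, so the new EAX depends on the old one. */
void set_al(Regs *r, uint8_t v)  { r->eax = (r->eax & 0xFFFFFF00u) | v; }
void set_ah(Regs *r, uint8_t v)  { r->eax = (r->eax & 0xFFFF00FFu) | ((uint32_t)v << 8); }
void set_ax(Regs *r, uint16_t v) { r->eax = (r->eax & 0xFFFF0000u) | v; }

/* MOV AX, 02001H / ADD AL, 010H / ADD AH, 01H; returns AX. */
uint16_t sequence_word_load(void)
{
    Regs r = { 0 };
    set_ax(&r, 0x2001);
    set_al(&r, (uint8_t)(get_al(&r) + 0x10));
    set_ah(&r, (uint8_t)(get_ah(&r) + 0x01));
    return (uint16_t)(r.eax & 0xFFFF);
}

/* MOV AL, 01H / MOV AH, 020H / ADD AL, 010H / ADD AH, 01H; returns AH:AL. */
uint16_t sequence_byte_loads(void)
{
    Regs r = { 0 };
    set_al(&r, 0x01);
    set_ah(&r, 0x20);
    set_al(&r, (uint8_t)(get_al(&r) + 0x10));
    set_ah(&r, (uint8_t)(get_ah(&r) + 0x01));
    return (uint16_t)((get_ah(&r) << 8) | get_al(&r));
}
```

Both sequences end with AL = 011H and AH = 021H, so they store the same bytes; any difference between them is a matter of timing, which this model does not capture.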

    >> I break 16-Bit into two 8-Bit. Like this below.
    >>
    >> MOV EAX, 02001H
    >> ADD .....Do something
    >> MOV BYTE PTR [Low_Byte], AL
    >> MOV BYTE PTR [High_Byte], AH
    >>
    >> Notice "MOV BYTE PTR [High_Byte], AH"? Do you think that it is ok to
    >> use AH, BH, CH, or DH register to move data back to variable memory
    >> because it should always be avoided?
    >
    > Do you have the exact page in the optimisation manual that says to avoid
    > ah .. dh? If using ah .. dh produces the cleanest code, then why not?
    >
    > You should know about partial register usage, but don't let it get in the
    > way of clean and easy to read code. (Partial register usage on the P4 has
    > NO penalty in itself, however you avoid using partial registers to avoid
    > false dependencies within the code stream, which affects how the OOE
    > engine works, which can lead to unnecessary stalls).
    >
    >> You claim that if it modifies AH, BH, CH, or DH register, it would be
    >> required to move it to AL and then modify before move it back to AH. It
    >> seems that AH remains unmodified so it can be stored back to variable
    >> memory. Please advise.
    >
    > When did I say that, or anything to that effect?
        Andreas Kaiser said the following:

    Because there are not xH operations internally. Data has to be shifted
    into the right place, calculated, then shifted back again. Needless to
    say this is not exactly fast.

        I am not sure I understand what he meant.
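
One possible reading of that remark, sketched in C: if the hardware can only operate on the low byte position, an ADD to AH has to be carried out in three steps. The function name is hypothetical, and this is an interpretation of the quote rather than a description of the actual Pentium 4 internals.

```c
#include <stdint.h>

/* ADD AH, operand expressed with full-width operations only. */
uint32_t add_to_high_byte(uint32_t eax, uint8_t operand)
{
    uint32_t field = (eax >> 8) & 0xFFu;        /* shift AH down into the low byte */
    field = (field + operand) & 0xFFu;          /* compute as an ordinary 8-bit add */
    return (eax & 0xFFFF00FFu) | (field << 8);  /* shift back and merge into bits 8-15 */
}
```

Under this reading, the extra shift before and after the addition is the work that an operation on AL would not need.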

