Re: 8-Bit Register on Pentium 4
From: Bryan Parkoff (spamtrap_at_crayne.org)
Date: 03/20/05
Date: Sun, 20 Mar 2005 05:59:56 +0000 (UTC)
>>> movzx ebx, word [Low_Byte] ; Load our 16bit value
>>> movzx eax, byte [_Offset] ; Load our 8bit value
>>> mov dl, 07fh
>>> add ebx, eax ;add them together (ebx has result)
>>> cmp dl, al ;see if _Offset is larger than 07fh
>>> setc cl ; cl = 1 if it is (carry set)
>>> sub bh, cl ; sub the carry flag
>>> mov [Low_Byte], bx ; mov into memory
> Why didn't you just say that you wanted to add a signed 8bit value to an
> unsigned 16bit value?
You wrote your code above; I wrote my code below, as we discussed
earlier.
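MOV AL, BYTE PTR [Low_Byte]    ; low byte of the 16-bit value
MOV AH, BYTE PTR [High_Byte]   ; high byte of the 16-bit value
MOV CL, BYTE PTR [_Offset]     ; 8-bit offset, treated as signed
ADD AL, CL                     ; add the offset to the low byte
ADC AH, 0H                     ; propagate the carry into the high byte
MOV CH, 07FH
CMP CH, CL                     ; CF = 1 if _Offset > 07FH, i.e. negative
SBB CH, CH                     ; CH = 0FFH if CF was set, else 00H
ADD AH, CH                     ; subtract 1 from the high byte if negative
MOV BYTE PTR [Low_Byte], AL
MOV BYTE PTR [High_Byte], AH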
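A quick way to convince yourself that the byte-wise sequence adds a signed 8-bit value to an unsigned 16-bit value is to model it and compare it with a direct signed addition. The sketch below is a Python model, not assembly. It assumes Low_Byte and High_Byte hold the low and high halves of one 16-bit value, and the function names are hypothetical. Each line mirrors one instruction, and the loop checks every low byte and every offset against a handful of high bytes.

```python
def add_signed_offset_bytewise(low, high, offset):
    """Model of the byte-wise sequence; low, high, offset are unsigned bytes (0..255).

    Returns the new (low, high) bytes that would be stored back to memory.
    """
    total = low + offset
    al = total & 0xFF                  # ADD AL, CL
    carry = total >> 8                 # carry out of AL: 0 or 1
    ah = (high + carry) & 0xFF         # ADC AH, 0H
    ch = 0x7F                          # MOV CH, 07FH
    cf = 1 if offset > ch else 0       # CMP CH, CL: borrow (CF=1) when CL > 07FH
    ch = (-cf) & 0xFF                  # SBB CH, CH: 0FFH if CF was set, else 00H
    ah = (ah + ch) & 0xFF              # ADD AH, CH: adding 0FFH subtracts 1 mod 256
    return al, ah


def signed8(b):
    """Interpret an unsigned byte as a two's-complement value in -128..127."""
    return b - 0x100 if b & 0x80 else b


def reference(value16, offset):
    """Unsigned 16-bit value plus signed 8-bit offset, wrapped to 16 bits."""
    return (value16 + signed8(offset)) & 0xFFFF


# Check every low byte and every offset against a few representative high bytes.
for high in (0x00, 0x01, 0x7F, 0x80, 0xFE, 0xFF):
    for low in range(256):
        for offset in range(256):
            al, ah = add_signed_offset_bytewise(low, high, offset)
            assert (ah << 8) | al == reference((high << 8) | low, offset)

print(add_signed_offset_bytewise(0x01, 0x20, 0xFF))  # (0, 32): 2001H + (-1) = 2000H
```

The CMP/SBB pair plays the role that MOVSX plays on Intel. It turns the sign of the offset into a 00H/0FFH byte, and adding 0FFH to the high byte subtracts 1 from it, which is the same as subtracting 100H from the whole 16-bit value.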
It works, but I worry that other processors may not have Intel
instructions such as CBW, MOVZX, and MOVSX. My code above is the only
option for processors that have neither MOVZX nor MOVSX, which is why the
comparison of _Offset against 07FH is needed on them. Other processors may
therefore need more instructions, and hence more bytes, whereas Intel already
has all the needed instructions, so its code is shorter. I believe that Intel
can do the job in 2 or 4 instructions, which may be faster than other
processors that need 9 instructions or more.
I am not sure whether IA-64 has these instructions, but I understand that
it removed many instructions that were not needed.
First Example:
>> MOVZX EBX, WORD PTR [Low_Byte2]
>> MOVSX EAX, BYTE PTR [_Offset]
>> ADD EBX, EAX
>> MOV WORD PTR [Low_Byte2], BX
Second Example:
>> MOVSX EAX, BYTE PTR [_Offset]
>> ADD WORD PTR [Low_Byte2], AX
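Both examples store the same 16-bit result, because only the low 16 bits of the sum are kept and addition modulo 65536 does not care whether the sign-extended offset was 32 bits wide or truncated to AX. The Python model below checks this. The helper names are hypothetical. MOVZX and MOVSX are modeled as zero- and sign-extension to a 32-bit bit pattern, and Low_Byte2 is assumed to hold a 16-bit value.

```python
def movzx16(word):
    """MOVZX r32, m16: zero-extend a 16-bit value to 32 bits."""
    return word & 0xFFFF


def movsx8(byte):
    """MOVSX r32, m8: sign-extend an 8-bit value, returned as a 32-bit pattern."""
    byte &= 0xFF
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte


def first_example(mem16, offset):
    ebx = movzx16(mem16)                  # MOVZX EBX, WORD PTR [Low_Byte2]
    eax = movsx8(offset)                  # MOVSX EAX, BYTE PTR [_Offset]
    ebx = (ebx + eax) & 0xFFFFFFFF        # ADD EBX, EAX (32-bit wraparound)
    return ebx & 0xFFFF                   # MOV WORD PTR [Low_Byte2], BX


def second_example(mem16, offset):
    eax = movsx8(offset)                  # MOVSX EAX, BYTE PTR [_Offset]
    ax = eax & 0xFFFF                     # only AX is used by the 16-bit ADD
    return (mem16 + ax) & 0xFFFF          # ADD WORD PTR [Low_Byte2], AX


for value in range(0, 0x10000, 251):
    for offset in range(256):
        assert first_example(value, offset) == second_example(value, offset)

print(hex(first_example(0x2001, 0xFF)), hex(second_example(0x2001, 0xFF)))  # 0x2000 0x2000
```

The model only shows that the results agree. It says nothing about timing, so it does not settle whether ADD mem, reg or the register-based sequence is faster on a given processor.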
You said earlier that the second example can use either 16-bit or 32-bit
operands, and that it modifies the variable in memory rather than in a
register. That seems fine. What is your opinion? Is the second example
better than the first because the variable is modified only once, with
ADD mem, reg? If the variable has to be modified twice or more, I think the
first example would be the better option, because it uses ADD reg, reg
rather than ADD mem, reg, and operating on a register is faster when a
value is modified repeatedly.
>> Thank you very much for the information. I think you should take
>> a look at the code you wrote above. Don't you think EBX still causes
>> a partial register stall? That is because you modified BH while BX is
>> already in use. The other way, shown in my code below, would be better.
>
> A partial register stall occurs on a false dependency. The above is a true
> dependency, thus a stall (if one was to occur) is necessary.
I will have to check the Pentium 4 optimization manual; it says to
avoid the AH, BH, CH, and DH registers because they are slow. If you want to
use 8-bit registers, you have to use AL, BL, CL, and DL, not AH,
BH, CH, and DH. I am trying to understand what "false dependency"
means. Look at the following.
MOV AX, 02001H
ADD AL, 010H
ADD AH, 01H
MOV WORD PTR [TEMP_DATA], AX
Is this considered a false dependency? Or might it cause a partial register
stall that the Pentium 4 does not care about?
MOV AL, 01H
MOV AH, 020H
ADD AL, 010H
ADD AH, 01H
MOV BYTE PTR [TEMP_LOW], AL
MOV BYTE PTR [TEMP_HIGH], AH
Is this considered a true dependency because AL and AH are both part of
the same EAX register? Would you recommend replacing AH with CL
so that it would be a false dependency?
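One fact behind the dependency question can be shown without any timing details. In x86, writing AL, AH, or AX leaves the remaining bits of EAX unchanged, so each such write produces a merged value built from the new byte and the old EAX. The Python model below replays both fragments that way. The starting upper bits (DEAD0000H) are a hypothetical leftover from earlier code, and the helper names are made up for illustration. The model shows data flow only. It does not show how the Pentium 4 renames or schedules these writes, and it does not decide whether a stall occurs.

```python
def set_al(eax, value):
    """Write AL: replace bits 0-7, keep the other 24 bits of the old EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


def set_ah(eax, value):
    """Write AH: replace bits 8-15, keep the other 24 bits of the old EAX."""
    return (eax & 0xFFFF00FF) | ((value & 0xFF) << 8)


def set_ax(eax, value):
    """Write AX: replace bits 0-15, keep bits 16-31 of the old EAX."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


def al(eax):
    return eax & 0xFF


def ah(eax):
    return (eax >> 8) & 0xFF


# Arbitrary upper bits left over from earlier code (hypothetical).
START = 0xDEAD0000

# First fragment: MOV AX, 02001H / ADD AL, 010H / ADD AH, 01H / store AX
eax = set_ax(START, 0x2001)
eax = set_al(eax, al(eax) + 0x10)    # the result merges with the old EAX
eax = set_ah(eax, ah(eax) + 0x01)    # and so does this one
temp_data = eax & 0xFFFF

# Second fragment: MOV AL, 01H / MOV AH, 020H / ADD AL, 010H / ADD AH, 01H / store AL, AH
eax2 = set_al(START, 0x01)           # even a plain MOV AL carries the old upper bits forward
eax2 = set_ah(eax2, 0x20)
eax2 = set_al(eax2, al(eax2) + 0x10)
eax2 = set_ah(eax2, ah(eax2) + 0x01)
temp_low, temp_high = al(eax2), ah(eax2)

print(hex(temp_data), hex(temp_low), hex(temp_high), hex(eax))  # 0x2111 0x11 0x21 0xdead2111
```

Both fragments store the same bytes. The difference being discussed in the thread lies in how the hardware tracks these merged writes, not in the values computed.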
>> I break 16-Bit into two 8-Bit. Like this below.
>>
>> MOV EAX, 02001H
>> ADD .....Do something
>> MOV BYTE PTR [Low_Byte], AL
>> MOV BYTE PTR [High_Byte], AH
>>
>> Notice "MOV BYTE PTR [High_Byte], AH". Do you think it is OK to
>> use the AH, BH, CH, or DH register to move data back to a memory variable,
>> given that these registers should always be avoided?
>
> Do you have the exact page in the optimisation manual that says to avoid
> ah .. dh? If using ah .. dh produces the cleanest code, then why not?
>
> You should know about partial register usage, but don't let it get in the
> way of clean and easy to read code. (Partial register usage on the P4 has
> NO penalty in itself, however you avoid using partial registers to avoid
> false dependencies within the code stream, which affects how the OOE
> engine works, which can lead to unnecessary stalls).
>
>> You claim that if code modifies AH, BH, CH, or DH, the value must first
>> be moved to AL, modified there, and then moved back to AH. It seems that
>> if AH remains unmodified, it can be stored straight back to the memory
>> variable. Please advise.
>
> When did I say that, or anything to that effect?
Andreas Kaiser wrote:
Because there are no xH operations internally. Data has to be shifted
into the right place, calculated, then shifted back again. Needless to
say this is not exactly fast.
I am not sure I understand what he meant.
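A literal reading of the quoted description might look like the sketch below, written in Python. It models ADD AH, imm8 as three steps: shift the high byte down, do an 8-bit add in the low position, then shift the result back and merge it into EAX. This only illustrates the mechanism as Andreas Kaiser describes it. It is not documented Pentium 4 internals, flags are not modeled, and the function name is hypothetical. The isolation step matters because a plain 32-bit add of 100H would let a carry out of AH spill into bit 16.

```python
def add_ah_via_shifts(eax, imm8):
    """Model of ADD AH, imm8 done without a native high-byte operation.

    Follows the quoted description: shift the byte into place, calculate,
    shift it back. Flags are not modeled.
    """
    tmp = (eax >> 8) & 0xFF                  # shift AH down into the low byte position
    tmp = (tmp + imm8) & 0xFF                # 8-bit add, wrapping modulo 256
    return (eax & 0xFFFF00FF) | (tmp << 8)   # shift back up and merge into EAX


print(hex(add_ah_via_shifts(0x12342001, 0x01)))   # 0x12342101
print(hex(add_ah_via_shifts(0x0000FF00, 0x01)))   # 0x0: AH wraps, bit 16 is untouched
print(hex((0x0000FF00 + (0x01 << 8)) & 0xFFFFFFFF))  # 0x10000: a full-width add would carry
```

Under this simplified model, a plain read such as MOV BYTE PTR [High_Byte], AH would need only the first (extraction) step. Whether the real hardware behaves that way is not established here.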