MOVZX has stall register
From: Bryan Parkoff (spamtrap_at_crayne.org)
Date: 08/25/04
- Previous message: Bryan Parkoff: "JMP Table Exceeds 64K Limitation"
- Next in thread: KVP: "Re: MOVZX has stall register"
- Reply: KVP: "Re: MOVZX has stall register"
- Reply: wolfgang kern: "Re: MOVZX has stall register"
Date: Wed, 25 Aug 2004 04:13:06 +0000 (UTC)
I want to ask why MOVZX would cause a partial register stall. I have
raised this before, but the examples below may make the difference
clearer.
Example 1:
MOV BL, 041H
MOVZX EAX, BL
ADD AL, 02H
MOV ECX, EAX
I suspect that MOVZX does not mark the upper bits as cleared, because
it is not XOR EAX, EAX. It seems to behave like MOV EAX, 0H followed by
MOV AL, BL rather than like a single MOVZX EAX, BL. That makes the
processor treat the register as a true 32-bit register: the full 32-bit
register is written before the 8-bit or 16-bit part is modified, and
that causes a partial register stall. The only option is to work on the
32-bit register and use an AND instruction to mask the result down to
8 bits.
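
To check the values only (not the timing), here is a minimal C sketch
of Example 1. It assumes nothing beyond the architectural rule that a
write to AL leaves bits 8..31 of EAX unchanged. The helper names
(write_low8, movzx8, xor_then_mov_low8, example1) are hypothetical and
are not taken from any library. It cannot show a stall; it only shows
that MOVZX EAX, BL and XOR EAX, EAX / MOV AL, BL leave the same bits in
EAX.

```c
#include <stdint.h>

/* Hypothetical helper: a write to AL replaces bits 0..7 of EAX and
   leaves bits 8..31 untouched. */
uint32_t write_low8(uint32_t reg32, uint8_t value) {
    return (reg32 & 0xFFFFFF00u) | value;
}

/* MOVZX r32, r8: zero-extend an 8-bit source into 32 bits. */
uint32_t movzx8(uint8_t src) {
    return (uint32_t)src;
}

/* XOR EAX, EAX followed by MOV AL, src. */
uint32_t xor_then_mov_low8(uint32_t old_eax, uint8_t src) {
    uint32_t eax = old_eax ^ old_eax;   /* XOR EAX, EAX -> 0 */
    return write_low8(eax, src);        /* MOV AL, src       */
}

/* Example 1 from the post; returns the value that ends up in ECX. */
uint32_t example1(void) {
    uint8_t bl = 0x41;                                    /* MOV BL, 041H  */
    uint32_t eax = movzx8(bl);                            /* MOVZX EAX, BL */
    eax = write_low8(eax, (uint8_t)((eax & 0xFFu) + 2u)); /* ADD AL, 02H   */
    return eax;                                           /* MOV ECX, EAX  */
}
```

Whatever EAX held before, both sequences give the same register
contents, so any difference between them has to be in how the processor
tracks the partial write, not in the result.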
MOVZX takes 3 cycles register to register and 6 cycles memory to
register on the 386 and 486. That is very slow. I am glad that the
Pentium Pro through the Pentium 4 have improved on this. MOVZX is no
longer slow and no longer takes 3-6 cycles. It is now a single uop!
Example 2:
XOR EAX, EAX
MOV AL, 0FCH
ADD AL, 02H
ADD AX, 0200H
SHL AX, 01H
MOV ECX, EAX
Example 3:
XOR EAX, EAX
MOV AL, 0FCH
ADD AL, 02H
ADD EAX, 0200H
SHL EAX, 01H
MOV ECX, EAX
The Pentium III has no partial register stall on Example 3, but it does
on Example 2. I am shocked that the Pentium 4 stalls on neither
Example 2 nor Example 3. It is very strange. The rule, as I understand
it, is that the 8-bit register can be modified before the 32-bit
register is modified, provided the upper bits were first cleared with
XOR EAX, EAX.
You are absolutely correct that 8-bit and 32-bit registers should not
be mixed; however, a 32-bit register is linked to its 8-bit part. Do
you recommend finishing all modifications to the 8-bit register first,
then moving the result to another 32-bit register, so that the original
32-bit register is never modified? Only the second 32-bit register
would then be modified. Look at the revised Example 3 below.
Example 3 revised:
XOR EAX, EAX
MOV AL, 0FCH
ADD AL, 02H
MOV ECX, EAX
ADD ECX, 0200H
SHL ECX, 01H
MOV EDX, ECX
Do you think that it is safer than the original Example 3? Avoiding
XOR EAX, EAX is very painful on the Pentium 4, because it does not WANT
us to use 8-bit and 16-bit registers. It encourages us to always use
32-bit registers, as with MOVZX. It is true that 8-bit and 16-bit
instructions take two uops while 32-bit instructions take one uop.
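
For reference, Example 2, Example 3, and the revised Example 3 all
leave the same value in the destination register, so the only
difference between them is timing. The C sketch below models the
register values only. The helpers set_al and set_ax are hypothetical
names for "write the low 8 bits" and "write the low 16 bits", and
nothing here models uops or stalls.

```c
#include <stdint.h>

/* Hypothetical helpers modelling partial-register writes. */
uint32_t set_al(uint32_t eax, uint8_t v)  { return (eax & 0xFFFFFF00u) | v; }
uint32_t set_ax(uint32_t eax, uint16_t v) { return (eax & 0xFFFF0000u) | v; }

/* Example 2: 8-bit, then 16-bit operations on the same register. */
uint32_t example2(void) {
    uint32_t eax = 0;                                             /* XOR EAX, EAX  */
    eax = set_al(eax, 0xFC);                                      /* MOV AL, 0FCH  */
    eax = set_al(eax, (uint8_t)((eax & 0xFFu) + 0x02u));          /* ADD AL, 02H   */
    eax = set_ax(eax, (uint16_t)((eax & 0xFFFFu) + 0x0200u));     /* ADD AX, 0200H */
    eax = set_ax(eax, (uint16_t)((eax & 0xFFFFu) << 1));          /* SHL AX, 01H   */
    return eax;                                                   /* MOV ECX, EAX  */
}

/* Example 3: 8-bit, then 32-bit operations on the same register. */
uint32_t example3(void) {
    uint32_t eax = 0;                                             /* XOR EAX, EAX   */
    eax = set_al(eax, 0xFC);                                      /* MOV AL, 0FCH   */
    eax = set_al(eax, (uint8_t)((eax & 0xFFu) + 0x02u));          /* ADD AL, 02H    */
    eax += 0x0200u;                                               /* ADD EAX, 0200H */
    eax <<= 1;                                                    /* SHL EAX, 01H   */
    return eax;                                                   /* MOV ECX, EAX   */
}

/* Example 3 revised: copy to ECX before any 32-bit modification. */
uint32_t example3_revised(void) {
    uint32_t eax = 0;                                             /* XOR EAX, EAX   */
    eax = set_al(eax, 0xFC);                                      /* MOV AL, 0FCH   */
    eax = set_al(eax, (uint8_t)((eax & 0xFFu) + 0x02u));          /* ADD AL, 02H    */
    uint32_t ecx = eax;                                           /* MOV ECX, EAX   */
    ecx += 0x0200u;                                               /* ADD ECX, 0200H */
    ecx <<= 1;                                                    /* SHL ECX, 01H   */
    return ecx;                                                   /* MOV EDX, ECX   */
}
```

With these constants nothing carries out of bit 15, so all three
functions return 05FCH. With other constants the 16-bit version could
differ, because ADD AX and SHL AX discard anything above bit 15.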
Why does Intel recommend that x86 assembly programmers use the MOVZX
instruction when working with 8-bit and 16-bit registers? Intel also
tells them to always use an AND instruction to mask 8-bit and 16-bit
registers. It will only save one or two uops.
Are you upset that 8-bit and 16-bit instructions cannot be used
without degrading performance?
Do you recommend that all variables be 32 bits, loaded with the MOV
instruction instead of MOVZX? Should we always use an AND instruction
to mask down to 8 or 16 bits, or are MOVZX EAX, AL and MOVZX EAX, AX an
acceptable replacement for the AND instruction?
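
As far as the resulting value goes, the two choices are
interchangeable. The C sketch below shows this under the assumption
that the masks are 0FFH and 0FFFFH. The function names are hypothetical
and only the register contents are modelled, not the uop counts.

```c
#include <stdint.h>

/* AND EAX, 0FFH: keep only the low 8 bits. */
uint32_t and_mask8(uint32_t eax)  { return eax & 0x000000FFu; }

/* AND EAX, 0FFFFH: keep only the low 16 bits. */
uint32_t and_mask16(uint32_t eax) { return eax & 0x0000FFFFu; }

/* MOVZX EAX, AL: zero-extend the low 8 bits of EAX into EAX. */
uint32_t movzx_eax_al(uint32_t eax) { return (uint32_t)(uint8_t)eax; }

/* MOVZX EAX, AX: zero-extend the low 16 bits of EAX into EAX. */
uint32_t movzx_eax_ax(uint32_t eax) { return (uint32_t)(uint16_t)eax; }
```

For any input, and_mask8 matches movzx_eax_al and and_mask16 matches
movzx_eax_ax, so the question is purely which form the processor
executes faster.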
It is as if the Pentium 4 rejects 8-bit and 16-bit instructions. Intel
encourages us to stop using them and to always use 32-bit instructions.
What do you think?
-- Bryan Parkoff