Over 100 MOVs

From: Bryan Parkoff (bryan.nospam.parkoff_at_nospam.com)
Date: 01/19/04


Date: Mon, 19 Jan 2004 21:07:51 +0000 (UTC)


    As I understand it, MOV register to register, MOV immediate to register,
and MOV memory to register each take one cycle, whereas ADD, SUB, AND, OR,
XOR, etc. take two or three cycles when one operand is in memory.
    Is it good practice to load values from memory into registers with MOV
first and then do the ADD, SUB, etc. register to register? Does that gain
real performance, or does it stop paying off once there are far more MOVs
than ADD, SUB, AND, OR, XOR, etc.?

First Example
mov eax, dword ptr [Var1]    ; 1 cycle
mov ecx, 1                   ; 1 cycle
add eax, ecx                 ; 1 cycle
mov dword ptr [Var1], eax    ; 1 cycle
                             ; Total: 4 cycles

Second Example
add dword ptr [Var1], 1      ; 3 cycles
                             ; Total: 3 cycles

    How do you decide between the first example and the second? How can you
judge for yourself which one is faster? I have been advised to try both and
time them on my own PC to see which is faster.
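One way to follow that advice is a small timing harness. The C sketch below is an illustration only: var1, time_load_add_store, time_add_to_memory and best_of are hypothetical names, and it assumes a C99 compiler. It times a load/add/store loop against an add-to-memory loop and keeps the fastest of several runs to reduce noise from other processes.

```c
#include <time.h>

/* Hypothetical stand-in for Var1 from the post. volatile forces the
   compiler to read and write memory on every iteration. */
static volatile unsigned int var1;

/* Mirrors the first example: load into a register, add, store back. */
static double time_load_add_store(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        unsigned int eax = var1;   /* mov eax, dword ptr [Var1] */
        eax += 1;                  /* add eax, ecx (with ecx = 1) */
        var1 = eax;                /* mov dword ptr [Var1], eax */
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Mirrors the second example: add directly to the memory operand. */
static double time_add_to_memory(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        var1 += 1;                 /* may or may not become add dword ptr [Var1], 1 */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run a timer several times and keep the fastest result. */
static double best_of(double (*timer)(long), long iterations, int runs)
{
    double best = timer(iterations);
    for (int r = 1; r < runs; r++) {
        double t = timer(iterations);
        if (t < best)
            best = t;
    }
    return best;
}
```

For example, best_of(time_load_add_store, 100000000L, 5) and best_of(time_add_to_memory, 100000000L, 5) give the two timings in seconds. The compiler, not the C source, picks the instructions, and it may emit the same load/add/store sequence for both loops. Check the generated assembly (for example with gcc -S) before drawing conclusions, or put the two hand-written assembly sequences inside loops of your own and time those the same way.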

    Are BYTE, WORD, and DWORD MOVs really all one cycle?

First Example
mov eax, 0440068H            ; 1 cycle
mov ebx, dword ptr [eax]     ; 1 cycle

Second Example
mov eax, 0440068H            ; 1 cycle
mov edx, 0FFFEH
mov bl, byte ptr [eax+edx]   ; 1 cycle
add dx, 1H                   ; DX: FFFE + 1 = FFFF
mov bh, byte ptr [eax+edx]   ; 1 cycle
add dx, 1H                   ; DX: FFFF + 1 = 0000
mov cl, byte ptr [eax+edx]   ; 1 cycle
add dx, 1H                   ; DX: 0000 + 1 = 0001
mov ch, byte ptr [eax+edx]   ; 1 cycle
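The byte loads in the second example can be mirrored in C to check which offsets they touch. This is a sketch under stated assumptions: load_four_bytes and SEG_SIZE are hypothetical names, and the 64KB buffer stands in for the memory at 0440068H. In the assembly, the wrap works only because mov edx, 0FFFEH clears the upper 16 bits of EDX and add dx, 1H never carries into them. Here a uint16_t offset gives the same modulo-65536 behavior.

```c
#include <stdint.h>

#define SEG_SIZE 0x10000u  /* hypothetical 64KB segment */

/* Load four consecutive bytes starting at offset dx, wrapping the 16-bit
   offset from FFFFH to 0000H as "add dx, 1H" does. The offsets actually
   used are recorded so they can be inspected. */
static void load_four_bytes(const uint8_t seg[SEG_SIZE], uint16_t dx,
                            uint8_t out[4], uint16_t offsets[4])
{
    for (int i = 0; i < 4; i++) {
        offsets[i] = dx;
        out[i] = seg[dx];          /* mov bl/bh/cl/ch, byte ptr [eax+edx] */
        dx = (uint16_t)(dx + 1);   /* add dx, 1H: FFFF + 1 wraps to 0000 */
    }
}
```

Called with dx = 0xFFFE, the recorded offsets are FFFEH, FFFFH, 0000H and 0001H, matching the comments in the assembly.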

    Note that the first example takes two cycles in total and the second
takes five. My understanding is that the four byte loads ("mov reg8, byte
ptr [mem]") execute in parallel, so together they cost one cycle instead of
four. That would make them cost the same as a single dword load ("mov reg32,
dword ptr [mem]"), which is one cycle but does not run in parallel.
    Is my information correct?

    The reason I prefer four byte loads over one dword load is that the
segment is limited to 64KB. With byte loads, the offsets are FFFEH for the
first byte, FFFFH for the second, 0000H for the third (DX wraps around), and
0001H for the fourth. With a single dword load at offset FFFEH, the bytes
come from FFFEH, FFFFH, 10000H, and 10001H, and the last two are beyond the
64KB limit. That access raises an exception, and the program crashes and is
forced to close. Please tell me whether you think my code is good enough.
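A common compromise, sketched here in C as an alternative and not as the post's method, is to do one 4-byte read when the whole dword fits inside the segment and fall back to wrapping byte loads only for offsets FFFDH to FFFFH. The name read_dword_wrapped is hypothetical, and the fast path assumes a little-endian host (x86 is little-endian).

```c
#include <stdint.h>
#include <string.h>

#define SEG_SIZE 0x10000u  /* hypothetical 64KB segment */

/* Read a little-endian 32-bit value at a 16-bit offset in a 64KB segment. */
static uint32_t read_dword_wrapped(const uint8_t *seg, uint16_t off)
{
    if (off <= SEG_SIZE - 4u) {
        /* Fast path: all four bytes are inside the segment, so one
           4-byte read is safe (byte order of the host, assumed little-endian). */
        uint32_t v;
        memcpy(&v, seg + off, sizeof v);
        return v;
    }
    /* Slow path: the dword would cross the end of the segment, so read
       byte by byte and let the 16-bit offset wrap to 0000H. */
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)seg[(uint16_t)(off + i)] << (8 * i);
    return v;
}
```

Since offsets that cross the boundary are rare, the byte-by-byte cost is paid only in that case, and every other access stays a single load.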

    I used to write C++ code and build a profile to study optimization. Now
I want to write assembly code instead. Where can I find information on how
to profile assembly language code for optimization? Please advise.

-- 
Bryan Parkoff

