Re: 8-Bit Register on Pentium 4
From: Bryan Parkoff (spamtrap_at_crayne.org)
Date: 03/21/05
- Next message: Sprunk, Stephen: "Re: 8-Bit Register on Pentium 4"
- Previous message: Robert Redelmeier: "Re: simple question, in theory..."
- In reply to: Chewy509: "Re: 8-Bit Register on Pentium 4"
- Next in thread: Grumble: "Re: 8-Bit Register on Pentium 4"
- Reply: Grumble: "Re: 8-Bit Register on Pentium 4"
- Reply: Chewy509: "Re: 8-Bit Register on Pentium 4"
Date: Mon, 21 Mar 2005 02:16:34 +0000 (UTC)
> Or is this the case, that you're getting the code working on a P4, then
> porting to another architecture? e.g. Z80 or 6502. (FYI: if you didn't
> already know, most other architectures are nothing like the x86).
You guessed right that older processors like the Z80, 6502, or 68000 (680xx
or later) have neither a MOVZX nor a MOVSX instruction. The 8086 did not
need a new instruction for sign extension because it already had CBW when
it was released in 1978. Please mention whether IA-64 has an equivalent.
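To make clear what I mean by these instructions, here is a rough sketch in C
of what each one computes. This is only my own illustration of the
architectural result, not taken from any manual; the function names movzx8,
movsx8, and cbw are made up for the example.

```c
#include <stdint.h>

/* MOVZX r32, r/m8: zero-extend an 8-bit value to 32 bits.
   The upper 24 bits of the result are always 0. */
uint32_t movzx8(uint8_t v)
{
    return (uint32_t)v;
}

/* MOVSX r32, r/m8: sign-extend an 8-bit value to 32 bits.
   Bit 7 of the source is copied into the upper 24 bits. */
int32_t movsx8(uint8_t v)
{
    return (v & 0x80u) ? (int32_t)v - 256 : (int32_t)v;
}

/* CBW: sign-extend AL into AX (AH becomes 00h or FFh).
   Takes the AL value and returns the resulting AX value. */
uint16_t cbw(uint8_t al)
{
    return (uint16_t)((al & 0x80u) ? (0xFF00u | al) : al);
}
```

For example, with the byte 80h as input, movzx8 gives 00000080h, movsx8
gives -128, and cbw gives FF80h.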
I believe that the Pentium 4 manual from www.intel.com is not accurate.
Please take a look at another document:
http://www.agner.org/assem/pentopt.pdf. It seems to be very accurate.
Please look at the Pentium 4 optimization section on page 76. Do you think
it is important for me to follow the Pentium 4 optimization advice rather
than the Pentium Pro through Pentium 3 advice? Would you encourage me to
skip everything except the Pentium 4 section?
It is difficult for me to understand how x86 processors work, but I want
to write code in MASM 6.15 because speed is critical for my project. I
wrote my project with the Microsoft Visual C++ compiler and the Intel C++
Compiler, but I found that the generated code contains instructions that I
did not want because they seem unneeded. I port the compiler output (the
*.obj files, and the *.asm listings that the C++ compiler generates when the
corresponding option is set) into MASM 6.15 and then remove the unneeded
instructions. This reduces the instruction count and increases the speed of
my project.
This is the same code that we discussed earlier. Do you think that the
Pentium 4 optimization guide in that document may be better documentation
than Intel's own manual?
I am surprised that Microsoft no longer revises MASM 6.15, for example by
adding long filename support for Windows XP; it is only available with
Visual Studio 6.0 and .NET. Also, I am surprised that IA-64 does not have an
assembler like MASM. I have no idea whether Microsoft is going to develop
MASM for IA-64. I wonder how Microsoft develops Windows software for IA-64.
I assume that they might use the Intel C++ Compiler for IA-64. Don't you
think?
Bryan Parkoff
>
> IMHO, grok one architecture first rather than relying on commonalities
> between all architectures to learn asm. Asm is entirely dependent on the
> individual architecture.
>
>>> A partial register stall occurs on a false dependency. The above is a
>>> true dependency, thus a stall (if one was to occur) is necessary.
>> I will have to check the Pentium 4 Optimization manual because it says
>> to avoid the AH, BH, CH, and DH registers because they are slow.
>
> Instructions using AH .. DH are no slower than instructions using AL .. DL
> at the single instruction level? It's only the dependency issues (which we
> have been discussing) that cause the problems.
>
>> If you want to use an 8-bit register, you have to use the AL, BL, CL,
>> and DL registers, not AH, BH, CH, and DH. I am trying to understand what
>> "false dependency" means. Look at the code below.
>>
>> MOV AX, 02001H
>> ADD AL, 010H
>> ADD AH, 01H
>> MOV WORD PTR [TEMP_DATA], AX
>>
> Line 1 - start of code.
>
> Line 2 has a TRUE dependency on line 1. (The state of AL affects AX)
>
> Line 3 has a FALSE dependency on line 2. (The state of AH does NOT affect
> AL, but the P4 will serialise the instruction stream within the core so
> that Line 3 executes once Line 2 finishes, as it treats the state of AH
> as if it affected the state of AL).
>
> Line 4 has a TRUE dependency on line 3. (The state of AX is affected by
> the state of AH, which is modified in Line 3).
>
>> Is it considered a false dependency, but it might have a partial
>> register stall so the Pentium 4 does not care?
>>
>> MOV AL, 01H
>> MOV AH, 020H
>> ADD AL, 010H
>> ADD AH, 01H
>> MOV BYTE PTR [TEMP_LOW], AL
>> MOV BYTE PTR [TEMP_HIGH], AH
>>
>> Is it considered a true dependency because the AL and AH registers are
>> both part of the same EAX register? Do you recommend replacing AH with
>> CL so that it would be a false dependency?
>
> You have it around the wrong way. ;) It is considered a false dependency,
> since the state of AH does NOT affect the state of AL, but the P4 will see
> them as dependent since they are both from the same register (but as far
> as required execution is concerned, they are separate).
>
> The P4 will see all instructions as dependent on each other, and will
> serialise all instructions into a single instruction stream. In the above
> example, I see two instruction streams (one based on AH and another on
> AL), but the P4 will only see one. Replacing AH or AL with CL will break
> what the P4 sees as dependencies (which don't exist), and will create 2
> parallel instruction streams to be fed to the core (which is what we want
> to happen).
>
>>> Andreas Kaiser said below.
>>>
>>> Because there are no xH operations internally. Data has to be shifted
>>> into the right place, calculated, then shifted back again. Needless to
>>> say this is not exactly fast.
>>
>> I am not sure I understand what he meant.
>
> IIRC a bit mask is used rather than shifting as Andreas indicated, but all
> this is done at the microcode level within the core itself? (This is
> beyond what you or I see, as this is all the super-secret Intel only knows
> type stuff, that happens within the actual core)
>
> Basic theory is, if you are operating only on a subset of bits within the
> register, then you mask off the other bits and only perform the operation
> on the active bits left over?
>
> --
> Darran (aka Chewy509)...
>
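To check my understanding of the quoted discussion, here is a small C model
of the first snippet and of the mask-and-shift idea. It is only a sketch of
the architectural result: it ignores the flags, says nothing about timing,
and is not meant to describe what the Pentium 4 core really does
internally. All function names are made up for the example.

```c
#include <stdint.h>

/* Read AL (bits 0..7) or AH (bits 8..15) from a 32-bit image of EAX. */
uint8_t read_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t read_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Write AL or AH: mask off the target byte, then merge in the new value.
   All other bits of EAX are left unchanged. */
uint32_t write_al(uint32_t eax, uint8_t v)
{
    return (eax & 0xFFFFFF00u) | v;
}

uint32_t write_ah(uint32_t eax, uint8_t v)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)v << 8);
}

/* ADD AL, imm8 and ADD AH, imm8. The carry out of the 8-bit add is
   dropped here (on real hardware it goes to CF, never into the other
   half of the register). */
uint32_t add_al(uint32_t eax, uint8_t imm)
{
    return write_al(eax, (uint8_t)(read_al(eax) + imm));
}

uint32_t add_ah(uint32_t eax, uint8_t imm)
{
    return write_ah(eax, (uint8_t)(read_ah(eax) + imm));
}

/* First quoted snippet:
       MOV AX, 02001H
       ADD AL, 010H
       ADD AH, 01H
   Returns the value of AX that would be stored to TEMP_DATA. */
uint16_t snippet_ax(void)
{
    uint32_t eax = 0;
    eax = (eax & 0xFFFF0000u) | 0x2001u;  /* MOV AX, 02001H */
    eax = add_al(eax, 0x10);              /* ADD AL, 010H   */
    eax = add_ah(eax, 0x01);              /* ADD AH, 01H    */
    return (uint16_t)(eax & 0xFFFFu);
}
```

In this model snippet_ax() returns 2111h, and swapping the order of add_al
and add_ah gives the same result. That is what makes the dependency between
the two ADD instructions "false": neither one needs the other's result, even
though both touch the same EAX.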