Re: compiler generated output
- From: Spiro Trikaliotis <spamtrap@xxxxxxxxxx>
- Date: Mon, 24 Oct 2005 17:34:05 +0000 (UTC)
Hello Skarmander,
Skarmander <spamtrap@xxxxxxxxxx> wrote:
> Spiro Trikaliotis wrote:
>> Skarmander <invalid@xxxxxxxxxxxxxx> wrote:
>>
>
> Not my part of the text. This is Mark F. Haigh's post.
Oh, I'm sorry. I was not aware I stripped the attribution to him, too.
Anyway, from the number of ">", it was clear that this was not your
text, wasn't it?
>> Now, you are comparing something very weird. MSVC++ 6.0 is some years
>> older than "last weeks gcc CVS".
>>
>
> The original discussion wasn't comparing VC 6 with the latest gcc;
> rather it was about whether the code generated by VC 6 in this case was
> acceptable. One poster in comp.lang.c thought it was, quote, "crappy".
> :-) I challenged him to show that improvement was actually possible, and
> he (or rather gcc) came up with this.
Yes, this is true. Still, it ended up as a comparison of the two
compilers, so my statement above still stands.
> Don't use default settings, optimize for speed; and try to tell the
> compiler to optimize for 686 or higher (but using the 80386 instruction
> set only). That's what gcc was asked to do as well.
OK, I tried again, telling the compiler to optimize for speed (/Ot) and
to generate code for PPro, P-II, P-III (/G6; but I don't know how to
restrict the compiler to code that uses only 386 instructions). The
resulting code is (for % 8, NOT for % 8u):
test!modTest1:
01001be0 8bff mov edi,edi
01001be2 55 push ebp
01001be3 8bec mov ebp,esp
01001be5 0fb74508 movzx eax,word ptr [ebp+0x8]
01001be9 0fb74d0c movzx ecx,word ptr [ebp+0xc]
01001bed 2bc1 sub eax,ecx
01001bef 2507000080 and eax,0x80000007
01001bf4 7905 jns test!modTest1+0x1b (01001bfb)
01001bf6 48 dec eax
01001bf7 83c8f8 or eax,0xfffffff8
01001bfa 40 inc eax
01001bfb 5d pop ebp
01001bfc c20800 ret 0x8
Thus, while the compiler now uses movzx instead of mov/and, the jump is
still there. (The same code is generated for P-IV/Athlon.)
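To make it easier to see what the sequence after the sub computes, here is a rough C sketch that mirrors the x86 listing instruction by instruction. The function names are made up for illustration, and the final cast back to a signed value assumes two's complement wraparound, as on x86. It is a reading aid, not the original test source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x stands for the value in eax after "sub eax,ecx". */
static int32_t mod8_with_jump(int32_t x)
{
    uint32_t eax = (uint32_t)x & 0x80000007u;  /* and eax,0x80000007 */
    if (eax & 0x80000000u) {                   /* jns skips this block if the sign bit is clear */
        eax -= 1u;                             /* dec eax */
        eax |= 0xfffffff8u;                    /* or  eax,0xfffffff8 */
        eax += 1u;                             /* inc eax */
    }
    /* Assumption: conversion wraps modulo 2^32 (two's complement). */
    return (int32_t)eax;
}

/* Compare against the C operator over the range a difference of two
   16-bit values can take. */
static void selfcheck_mod8_with_jump(void)
{
    for (int32_t d = -65535; d <= 65535; ++d)
        assert(mod8_with_jump(d) == d % 8);
}
```

The branch is only taken for negative differences, where the result has to keep the sign of the dividend.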
Throwing in some more variants: ;)
Interestingly, the code for AMD64 is different (and not only because of
the other register sizes):
test!modTest1:
00000001`00001c60 6689542410 mov [rsp+0x10],dx
00000001`00001c65 66894c2408 mov [rsp+0x8],cx
00000001`00001c6a 0fb7442408 movzx eax,word ptr [rsp+0x8]
00000001`00001c6f 0fb74c2410 movzx ecx,word ptr [rsp+0x10]
00000001`00001c74 2bc1 sub eax,ecx
00000001`00001c76 99 cdq
00000001`00001c77 83e207 and edx,0x7
00000001`00001c7a 03c2 add eax,edx
00000001`00001c7c 83e007 and eax,0x7
00000001`00001c7f 2bc2 sub eax,edx
00000001`00001c81 c3 ret
Thus, for mod 8 it generates "the same kind of code" that it generates
on x86 when I ask for mod 8u, not mod 8!
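For comparison, the cdq/and/add/and/sub sequence in the AMD64 listing can be sketched in C as follows. Again the names are hypothetical, and the cast back to a signed value assumes two's complement wraparound. The idea is to add a bias of 7 for negative inputs before masking and to subtract it again afterwards, so no jump is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x stands for the value in eax after "sub eax,ecx". */
static int32_t mod8_branchless(int32_t x)
{
    uint32_t edx = (x < 0) ? 0xffffffffu : 0u; /* cdq: edx = all ones if eax is negative, else 0 */
    uint32_t eax;

    edx &= 7u;                                 /* and edx,0x7  (bias: 7 or 0) */
    eax = (uint32_t)x + edx;                   /* add eax,edx */
    eax &= 7u;                                 /* and eax,0x7 */
    eax -= edx;                                /* sub eax,edx */
    /* Assumption: conversion wraps modulo 2^32 (two's complement). */
    return (int32_t)eax;
}

/* Compare against the C operator over the range a difference of two
   16-bit values can take. */
static void selfcheck_mod8_branchless(void)
{
    for (int32_t d = -65535; d <= 65535; ++d)
        assert(mod8_branchless(d) == d % 8);
}
```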
Now, the code for mod 8u looks like:
test!modTest2:
00000001`00001c90 6689542410 mov [rsp+0x10],dx
00000001`00001c95 66894c2408 mov [rsp+0x8],cx
00000001`00001c9a 0fb7442408 movzx eax,word ptr [rsp+0x8]
00000001`00001c9f 0fb74c2410 movzx ecx,word ptr [rsp+0x10]
00000001`00001ca4 2bc1 sub eax,ecx
00000001`00001ca6 33d2 xor edx,edx
00000001`00001ca8 b908000000 mov ecx,0x8
00000001`00001cad f7f1 div ecx
00000001`00001caf 8bc2 mov eax,edx
00000001`00001cb1 c3 ret
This totally confuses me. Here, it generates the same kind of code it
generates for x86 in the case "mod 8". (The code for modTest3 is almost
the same, only adding
00000001`0000xxxx 0fb7c0 movzx eax,ax
before the "xor edx,edx" line.)
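As far as I can tell, the div is not needed at all for the unsigned case: for an unsigned operand, % 8u is just a mask with 7. The following C sketch (function names are mine, only for illustration) checks that, and also shows why mod 8 and mod 8u are not interchangeable when the difference is negative.

```c
#include <assert.h>

/* What the div-based listing computes. */
static unsigned mod8u_div(unsigned x) { return x % 8u; }

/* The same result with a single mask. */
static unsigned mod8u_and(unsigned x) { return x & 7u; }

static void selfcheck_mod8u(void)
{
    /* Range of a difference of two 16-bit values, converted to unsigned
       the way "% 8u" converts its left operand. */
    for (int d = -65535; d <= 65535; ++d) {
        unsigned u = (unsigned)d;
        assert(mod8u_div(u) == mod8u_and(u));
    }

    /* Signed and unsigned modulo disagree for negative differences. */
    assert(-1 % 8 == -1);
    assert(mod8u_div((unsigned)-1) == 7u);
}
```

So for a - b == -1, the signed variant yields -1 and the unsigned variant yields 7.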
Thus, to summarize:

          x86               AMD64
mod 8     DIV, no jump      AND, but with jump
mod 8u    AND, but jump     DIV, no jump used
> When optimizing for speed, I seriously doubt any compiler would use lame
> ducks like cdq and idiv, let alone for dividing by a constant.
Well, the MS compiler used them.
Anyway, is a DIV really so costly? I always thought that newer
processors recognize if I try to divide by a power of two and are really
fast in this case.
>> Now, this looks much better than the MSVC++ 6.0 code, doesn't it?
>>
> Since you're compiling code with different semantics, a comparison is
> not meaningful.
I meant the overall generated code, thus, I tried to compare everything
with the appropriate "counterpart".
But you were right, the optimization settings should have been changed.
> Do tell me what VC produces for modTest1 when optimizing for speed, and
> indicating a particular architecture.
I have done so here. :)
Regards,
Spiro.
--
Spiro R. Trikaliotis http://cbm4win.sf.net/
http://www.trikaliotis.net/ http://www.viceteam.org/