Re: compiler generated output
- From: Skarmander <spamtrap@xxxxxxxxxx>
- Date: Sun, 23 Oct 2005 23:10:43 +0000 (UTC)
Spiro Trikaliotis wrote:
This is cross-posted, with follow-ups set to comp.lang.asm.x86, because I talk about generated and optimized x86 code for a C program, which is rather OT in comp.lang.c.
Skarmander <invalid@xxxxxxxxxxxxxx> wrote:
Not my part of the text. This is Mark F. Haigh's post.
I suppose. Last week's gcc 4.1 from CVS:
[mark@icepick ~]$ gcc-4_1_cvs_20051015 --version
gcc-4_1_cvs_20051015 (GCC) 4.1.0 20051015 (experimental)
[...]
[mark@icepick ~]$ gcc-4_1_cvs_20051015 -Wall -ansi -pedantic -O2 -mtune=i686 -fomit-frame-pointer -c -o foo.o foo.c
modTest:
        movzwl  8(%esp), %edx
        movzwl  4(%esp), %eax
        subl    %edx, %eax
        cltd
        shrl    $29, %edx
        addl    %edx, %eax
        andl    $7, %eax
        subl    %edx, %eax
        movzwl  %ax, %eax
        ret
Yup, clever. Branchless and pipeline-optimized. Thanks, I didn't see this.
Now, you are comparing something very weird. MSVC++ 6.0 is some years older than "last week's gcc CVS".
The original discussion wasn't comparing VC 6 with the latest gcc; rather it was about whether the code generated by VC 6 in this case was acceptable. One poster in comp.lang.c thought it was, quote, "crappy". :-) I challenged him to show that improvement was actually possible, and he (or rather gcc) came up with this.
Don't use default settings, optimize for speed; and try to tell the compiler to optimize for 686 or higher (but using the 80386 instruction set only). That's what gcc was asked to do as well.

OK, although it is OT, I did some tests with a newer version of the MS compiler (the one available with the latest release DDK, Win 2003 DDK SP 1), using the default settings for release builds:
C:\test>cl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.4035 for 80x86
Copyright (C) Microsoft Corporation 1984-2002. All rights reserved.
Now, let's have a look at the compiled code:
unsigned short modTest1(unsigned short a, unsigned short b)
{
    return ( (a - b) % 8 );
}
test!modTest1:
01001ba7 8bff mov edi,edi
01001ba9 55 push ebp
01001baa 8bec mov ebp,esp
01001bac 0fb74d0c movzx ecx,word ptr [ebp+0xc]
01001bb0 0fb74508 movzx eax,word ptr [ebp+0x8]
01001bb4 2bc1 sub eax,ecx
01001bb6 99 cdq
01001bb7 6a08 push 0x8
01001bb9 59 pop ecx
01001bba f7f9 idiv ecx
01001bbc 668bc2 mov ax,dx
01001bbf 5d pop ebp
01001bc0 c20800 ret 0x8
[...]
When optimizing for speed, I seriously doubt any compiler would use lame ducks like cdq and idiv, let alone for dividing by a constant.
Since you're compiling code with different semantics, a comparison is not meaningful. Note that "(a - b) % 8" is *not* the same thing as "(a - b) % 8U", optimizing or not. The latter can be implemented with code that is indeed quite trivial.
unsigned short modTest2(unsigned short a, unsigned short b)
{
    return ( (a - b) % 8U );
}
test!modTest2:
01001bc8 8bff mov edi,edi
01001bca 55 push ebp
01001bcb 8bec mov ebp,esp
01001bcd 8b4508 mov eax,[ebp+0x8]
01001bd0 8b4d0c mov ecx,[ebp+0xc]
01001bd3 2bc1 sub eax,ecx
01001bd5 83e007 and eax,0x7
01001bd8 5d pop ebp
01001bd9 c20800 ret 0x8
[...]
unsigned short modTest3(unsigned short a, unsigned short b)
{
    return ( (unsigned short)( (unsigned int)(a - b) ) % 8U );
}
test!modTest3:
01001be1 8bff mov edi,edi
01001be3 55 push ebp
01001be4 8bec mov ebp,esp
01001be6 33c0 xor eax,eax
01001be8 8a4508 mov al,[ebp+0x8]
01001beb 2a450c sub al,[ebp+0xc]
01001bee 83e007 and eax,0x7
01001bf1 5d pop ebp
01001bf2 c20800 ret 0x8
(Remark: I was not able to get code for modTest2 and modTest3 when they were in the same compilation unit as the main() calling them; the compiler always insisted on optimizing them away. I had to put them in a separate compilation unit and link the two together to actually get code for them.)
Now, this looks much better than the MSVC++ 6.0 code, doesn't it?
Do tell me what VC produces for modTest1 when optimizing for speed, and indicating a particular architecture.
S.