Re: compiler generated output



Spiro Trikaliotis wrote:
Hello Skarmander,

Skarmander <spamtrap@xxxxxxxxxx> wrote:

And that's not my address, it's yours. You or your client seem to have issues with attribution, no? :-D


Spiro Trikaliotis wrote:

<snip>
Throwing in some more variants: ;)

Interestingly, the code for AMD64 is different (and not only because of
the other register sizes):

test!modTest1:
00000001`00001c60 6689542410       mov     [rsp+0x10],dx
00000001`00001c65 66894c2408       mov     [rsp+0x8],cx
00000001`00001c6a 0fb7442408       movzx   eax,word ptr [rsp+0x8]
00000001`00001c6f 0fb74c2410       movzx   ecx,word ptr [rsp+0x10]
00000001`00001c74 2bc1             sub     eax,ecx
00000001`00001c76 99               cdq
00000001`00001c77 83e207           and     edx,0x7
00000001`00001c7a 03c2             add     eax,edx
00000001`00001c7c 83e007           and     eax,0x7
00000001`00001c7f 2bc2             sub     eax,edx
00000001`00001c81 c3               ret

So for AMD64 it generates "the same kind of code" that it generates for x86
when I compute mod 8u, not mod 8!
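For readers following along, here is a rough C sketch of what the cdq/and/add/and/sub sequence in the listing computes. The C source of modTest1 is not shown in the thread, so `mod_test1_guess` is only an assumption based on the two 16-bit loads and the subtraction; `mod8_like_listing` and `check_mod8` are likewise made-up names. It relies on C99 semantics, where `%` truncates toward zero.

```c
#include <assert.h>

/* Hypothetical reconstruction of the source; the real modTest1 is not
 * shown in the thread. Both arguments are zero-extended (movzx), then
 * subtracted as int, then reduced modulo 8. */
int mod_test1_guess(unsigned short a, unsigned short b)
{
    return (a - b) % 8;
}

/* Step-by-step equivalent of the instruction sequence after the sub. */
int mod8_like_listing(int x)
{
    int bias = (x < 0) ? 7 : 0;  /* cdq; and edx,0x7 */
    int t = x + bias;            /* add eax,edx      */
    t &= 7;                      /* and eax,0x7      */
    return t - bias;             /* sub eax,edx      */
}

/* Compare the sequence against the % operator over a range of inputs. */
void check_mod8(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        assert(mod8_like_listing(x) == x % 8);
}
```

Calling `check_mod8(-100000, 100000)` from a test program should pass silently if the reading of the listing above is right.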


Something tells me two separate optimizers are at work here. Looks like the "old" processors got left out.


When optimizing for speed, I seriously doubt any compiler would use lame ducks like cdq and idiv, let alone for dividing by a constant.


Well, the MS compiler used them.

And it's still not impressing me. :-)

Anyway, is a DIV really so costly? I always thought that newer
processors recognize if I try to divide by a power of two and are really
fast in this case.


That would be news to me, but it's possible. Even if newer processors do this, it doesn't seem wise to expect it if an alternative is readily available. However, div will take care of the sign without additional cost, so it may actually pay off. Not often, though. This particular case may or may not qualify.
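To make the point about the sign concrete, here is a small C sketch (function names are invented for illustration). For unsigned operands a plain mask is the whole story; for signed operands the mask alone gives a different answer for negative values, which is the extra work that either idiv or the bias trick has to cover. It assumes two's complement integers and C99 truncating division.

```c
#include <assert.h>

/* Unsigned: x % 8u and x & 7u are always equal. */
unsigned mod8_unsigned(unsigned x)
{
    return x & 7u;
}

/* Signed remainder as C defines it; the result takes the sign of x. */
int mod8_signed(int x)
{
    return x % 8;
}

/* Masking a signed value directly: NOT the same as x % 8 when x < 0. */
int mod8_mask_only(int x)
{
    return x & 7;
}

/* Spot-check the difference between the three variants. */
void show_sign_difference(void)
{
    assert(mod8_unsigned(13u) == 13u % 8u);
    assert(mod8_signed(13) == mod8_mask_only(13)); /* agree when x >= 0 */
    assert(mod8_signed(-3) == -3);                 /* C99 remainder     */
    assert(mod8_mask_only(-3) == 5);               /* mask disagrees    */
}
```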


We're now at a point where only benchmarks could prove the definite advantage of one piece of code over another, and we'd need to run them on multiple platforms...
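A minimal sketch of such a benchmark in C, under some loud assumptions: all names are made up, `clock()` is a coarse timer, and the volatile divisor is only a trick to keep the compiler from replacing the division with masks. Which instructions each loop really becomes depends on the compiler and flags, so the disassembly should be checked before trusting any numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Read at run time so the compiler cannot turn "% d" into shifts and masks. */
static volatile int runtime_divisor = 8;

/* Variant A: generic signed remainder; typically compiled to cdq + idiv. */
long sum_mod_div(int n)
{
    int d = runtime_divisor;
    long sum = 0;
    for (int i = -n; i < n; i++)
        sum += i % d;
    return sum;
}

/* Variant B: branch-free mask sequence, valid only for a divisor of 8. */
long sum_mod_mask(int n)
{
    long sum = 0;
    for (int i = -n; i < n; i++) {
        int bias = (i < 0) ? 7 : 0;
        sum += ((i + bias) & 7) - bias;
    }
    return sum;
}

/* Time one variant with clock(); returns CPU seconds, stores the checksum. */
double time_variant(long (*fn)(int), int n, long *checksum)
{
    clock_t start = clock();
    *checksum = fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run both variants, check that they agree, and print the timings. */
void compare_variants(int n)
{
    long a, b;
    double ta = time_variant(sum_mod_div, n, &a);
    double tb = time_variant(sum_mod_mask, n, &b);
    assert(a == b);
    printf("div:  %.3f s\nmask: %.3f s\n", ta, tb);
}
```

Calling `compare_variants` with a few tens of millions of iterations, several times per machine, would give a first impression; a single run on a single processor proves very little.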

In most cases the code will be "good enough". If this operation turned out to be critical, it's small enough to rewrite in hand-optimized assembler. Of course, comparing compilers is always fun... :-)

S.
