Re: Cannot optimize 64bit Linux code




"Bartc" <bc@xxxxxxxxxx> wrote in message news:6tIxj.15225$XI.2979@xxxxxxxxxxxxxxxxxxxxxxxxxxxx

<legrape@xxxxxxxxx> wrote in message
news:83f5f291-4c86-48f6-8625-5ead760a46bf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I am porting a piece of C code to 64bit on Linux. I am using 64bit
integers. It is a floating point intensive code and when I compile
(gcc) on 64 bit machine, I don't see any runtime improvement when
optimizing -O3. If I construct a small program I can get significant
(>4x) speed improvement using -O3 versus -g. If I compile on a 32 bit
machine, it runs 5x faster on the 64 bit machine than does the 64bit
compiled code.

It seems like something is inhibiting the optimization. Someone on
comp.lang.fortran suggested it might be an alignment problem. I am
trying to go through and eliminate all 32 bit integers right now (this
is a pretty large hunk of code). But thought I would survey this
group, in case it is something naive I am missing.

Any opinion is welcomed. I really need this to run up to speed, and I
need the big address space. Thanks in advance.

Hesitant to attempt an answer as I know nothing about 64-bit or gcc, but..

Does the program compiled in 32-bit mode run faster when compiled with
optimisation than without (on a 32 or 64-bit machine)? In other words, what
scale of improvement are you expecting? (This is for the main program.)

Is the improvement from optimisation really likely to be 5x or more? If not,
then forget the optimisation: if the 32-bit version can run that much faster,
it sounds like something is wrong with the 64-bit-compiled version.


yes, that is a bit harsh...


Do you have the capability to look at a sample of code and see exactly what
the 64-bit compiler is generating? I doubt it's going to be as silly as
using (and emulating) 128-bit floats, but it does sound like there's
something seriously wrong. It seems unlikely that using int32 instead of
int64 would slow things down 5 times or more.


int32 vs int64: int32 should actually be faster on x86-64. 32-bit operations have less-complex instruction encodings (typically no REX prefix), and the core of x86-64 is, after all, still x86.

as for emulating 128 bit floats, it is conceivably possible. I am aware, in any case, that on x86-64 gcc uses a 128-bit long double, but I don't know whether this is an 80-bit float stuffed into a 128 bit space (doing some magic of shuffling between SSE regs and the FPU) or an emulated 128 bit float (I have not investigated gcc's output in this case).

note that SSE does not support 80 bit floats, and the conventions used on x86-64 generally don't use the FPU (it may be used for some calculations, but not much else), so if using long double, it is very possible something funky is going on.

if this is the case, maybe try switching over to double and see if anything is different.


An alignment fault would be a compiler bug; but you can print out a few
data addresses and see whether they are on 8/16-byte boundaries or whatever
is recommended.


yes. unless one is using "__attribute__((packed))" everywhere, it should not be a problem...


Is the small program doing anything similar to the big one? It may be
benefiting from smaller instruction/data cache requirements.

You might find that ints/pointers suddenly turn from 32-bits to 64-bits when
compiled on 64-bit (and therefore using twice the memory bandwidth if you
have a lot of them), that might hit some of the performance. You might like
to check the size of pointers, if you don't need 64-bit addressing.



yes, I will agree here...


--
Bart



