Re: ifort volatile with O3



glen herrmannsfeldt wrote:

Dick Hendrickson <dick.hendrickson@xxxxxxx> wrote:
(snip, I wrote)

Another thought: this concerns a loop, and optimizing compilers
like to optimize loops and, especially, to move code out of
loops when possible. Other parts of the loop
with variables that are not VOLATILE may get optimized in
non-obvious ways, such that the VOLATILE parts don't do what
one would expect. It might be that all variables in such
loops should be VOLATILE to guard against such effects.
(snip)

Remember, the OP did ask about optimization level 3.
As far back as Fortran H Extended even OPT=2 moved assignments
out of loops, and OPT=3 asked for the best the compiler could do.
(I haven't known any with levels higher than 3.)

Sun Fortran goes up to -O5. Some versions of GCC allowed -O9. You may have
had only mainframe compilers in mind, but the OP did not indicate any
specific system.


Well, one could look at the generated assembly code to see if
there is anything obvious.
(snip)

OK, but say it is:

W = V + 3.14 + V

and W is not VOLATILE. Is there any reason the compiler can't do
the two loads for V, store the result in W, and then use W many
times? Even move the whole statement outside a loop?
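
For concreteness, here is a minimal sketch of the case being asked about, assuming a Fortran 2003 compiler that accepts the VOLATILE attribute. The program name and the accumulator `total` are hypothetical, and nothing in this single-threaded program actually changes V behind the compiler's back, so it only marks where the two loads and the candidate for hoisting sit.

```fortran
program volatile_loads
  implicit none
  real, volatile :: v      ! V may be changed by something the compiler cannot see
  real :: w                ! W is an ordinary variable
  real :: total            ! hypothetical accumulator, not in the original post
  integer :: i

  v = 1.0
  total = 0.0
  do i = 1, 4
     ! Two references to V. The question is whether both loads must be
     ! issued on every trip, or whether the whole statement may be
     ! computed once and moved out of the loop.
     w = v + 3.14 + v
     total = total + w     ! W is then reused without touching V again
  end do
  print *, total
end program volatile_loads
```

Compiling this with and without the VOLATILE attribute and comparing the generated assembly is one way to see what a particular compiler does with the loads of V.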

Be careful here, this question is dangerously similar to asking
if functions have to be evaluated. Fortunately, Richard is on
vacation, so I can give the correct answer. ;)

I almost forgot about that one...

I don't think the compiler can move the computation of W around
if V is volatile.

OK, I said it wrong. Not move that one, but move the non-volatile
assignments around, including before or after the volatile ones.

In my mind, section 1 of the standard
"clearly" defines the order of statement execution to be
one-after-the-other-in-source-text-order with some exceptions
for GOTO, DO, etc. Once the order of execution is defined, the
compiler is free to move things around so long as the final result
operates "as if" things were done in the defined order. Given
a sequence like
    COME FROM SOMEPLACE
    A = B
    W = V + 3.14 + V
    C = D
    GO TO 10
I would say the compiler must generate the 2 references to V
after the A = B and before the C = D lines are executed. That's
the only sensible meaning I can think of for the Note
saying "use the most recent definition." "Most recent" has
to mean in terms of the standard defined execution sequence.
If W is not volatile, then I think the computation of W
can be optimized and moved around, kept in a register, or whatever,
so long as the compiler uses the 2 loaded values from the
"proper" execution sequence.

I would agree if any of A, B, C, D, or W were VOLATILE, and
probably also at lower optimization levels. But common
subexpression elimination goes back to Fortran I, and has
been a favorite of optimizers ever since.

I'm taking a pretty hard line here. Basically, I think
VOLATILE is an optimization killer for anything that tries
to re-use values. Things like loop unrolling should be fine,
although probably pointless. One of the major purposes of
loop unrolling is to turn small blocks of code into big
blocks so the optimizer can reorder memory loads into an
efficient order. VOLATILE effectively prohibits that by
requiring that at least some loads go in a prescribed
order.
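
As a rough illustration of this stricter reading, the sketch below writes out the A = B, W = V + 3.14 + V, C = D sequence from earlier and then a hand-made rearrangement that keeps both loads of V between the two assignments while deferring the arithmetic. The temporaries `t1` and `t2` are hypothetical stand-ins for registers, the initial values are arbitrary, and the second half is an assumption about what such a compiler might do, not the output of any real one.

```fortran
program as_if_order
  implicit none
  real, volatile :: v
  real :: a, b, c, d, w
  real :: t1, t2           ! hypothetical temporaries standing in for registers

  b = 2.0; d = 4.0; v = 1.0

  ! Source order, as written in the post.
  a = b
  w = v + 3.14 + v
  c = d

  ! One rearrangement that still respects the stricter reading:
  ! both loads of V happen after A = B and before C = D,
  ! but the addition and the store to W are deferred.
  a = b
  t1 = v                   ! first load of V
  t2 = v                   ! second load of V
  c = d
  w = t1 + 3.14 + t2

  print *, a, c, w
end program as_if_order
```

Under the looser reading argued in the reply that follows, the assignments A = B and C = D could also move across the loads of V, since none of A, B, C, or D is VOLATILE.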

So my thought is that only loads of VOLATILE variables
need to go in the prescribed order. Others can be moved
around, especially with -O3.

Why only loads? With VOLATILE, the compiler cannot say to itself, " ... I
know that the value has not changed, and I wrote it into memory earlier, so
I don't have to do a STORE again." The VOLATILE attribute says, "just do it
when I tell you".
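
The point about stores can be sketched the same way. This is a minimal example, assuming a Fortran 2003 compiler; the variable name `flag` and the idea that some other agent polls it are hypothetical and not from the thread.

```fortran
program volatile_stores
  implicit none
  integer, volatile :: flag   ! imagine another agent polling this location
  integer :: i

  flag = 0
  do i = 1, 3
     ! The same value is stored on every trip. For an ordinary variable the
     ! optimizer could store once, or move the store out of the loop.
     ! With VOLATILE the store is expected to be issued each time.
     flag = 1
  end do
  flag = 1                    ! redundant for an ordinary variable, but not removable here
  print *, flag
end program volatile_stores
```

Here too, the generated assembly at -O3 is the place to check whether a given compiler really emits every store.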


-- glen

-- mecej4

