Re: Opinions on PGI vs. Lahey Fortran

From: Andy Nelson (andy_at_fake.lsu.edu)
Date: 01/03/05


Date: Mon, 3 Jan 2005 20:18:59 +0000 (UTC)

Richard E Maine <nospam@see.signature> wrote:
> In article <crbsml$60t$1@emac1.ocs.lsu.edu>,
> Andy Nelson <andy@fake.lsu.edu> wrote:
>
>> a loop like this:
>>
>> do i=1,n
>>
>>
>> would continue indefinitely past its upper bound (n) until
>> it hit a segmentation violation or other similarly fatal error.
>
> There are some situations, discussed here in the past, where
> that kind of problem is common with many compilers. In
> particular, there are potential problems if n is the
> largest positive value for the kind of i. If that's the
> problem, I wouldn't consider it a bug; perhaps less than
> ideal behavior, but not a bug. (And, of course, I'm
> assuming that the problem doesn't relate to illegalities
> such as changing the value of i inside of the loop).
>
> Or maybe you hit an actual bug; I'm sure that some of
> the ones I hit were bugs. Just thought I'd mention the other
> possibilities for balance.
>
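
For anyone reading along, here is a minimal sketch of the boundary
case mentioned above, where n is the largest value for the kind of
the loop index. It is an illustration only, not code from the program
under discussion; the names (safe_upper_bound, trips) are made up.
The assumption is that giving the counter a wider kind than n lets it
step one past n without overflowing.

```fortran
program safe_upper_bound
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer :: n                 ! default integer, as in the discussion
  integer(int64) :: i          ! wider kind than n (assumption of this sketch)
  integer :: trips

  n = huge(n)                  ! largest positive default integer
  trips = 0

  ! Only the last three values are visited so the sketch runs instantly.
  ! After the final trip i becomes n + 1, which still fits in int64.
  do i = int(n, int64) - 2_int64, int(n, int64)
     trips = trips + 1
  end do

  print *, 'trips =', trips
end program safe_upper_bound
```

The loop should make three trips and stop. With a default-integer
counter, the same upper bound is the situation where some compilers
misbehave.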

I know and appreciate your thoroughness, Richard, from your
long history on this group, including some responses to me as
well. Thanks for that :-)

In this case, it was definitely a compiler bug. I'm pretty
suspicious of any declaration (by me or anyone else) that "It's
the compiler's fault, dammit!", since in my experience, when that
claim has been made, it usually turned out not to be in the end.
This one occurred in more than just one language supported by
PGI (iirc a colleague found the same bug in some C/C++ code he
had--I think it was C or C++ anyway).

The code in question had been tested and worked correctly
on at least half a dozen different compilers/platforms. The
value of n was constant everywhere: not as a parameter, but
read in at the beginning of a run and never changed
afterwards. It is declared as default integer, and its value
was about 1 million or so (different runs have different
numbers of particles, n). It lives in a module (as opposed
to a common block). I could print out its value immediately
before the loop and it was correct. The problem happened only
with the OpenMP flags turned on, and there were no issues of
shared vs private use of the variable in that sense--the
loop was declared in the omp statement as default(none)
and the variable as shared(n).
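
A rough sketch of that setup, for concreteness. This is not the
actual code: the module name (particles), the array x, and the loop
body are placeholders, and n is assigned directly instead of being
read from input.

```fortran
module particles
  implicit none
  integer :: n                 ! default integer, set once at start-up
end module particles

program omp_loop_sketch
  use particles, only: n
  implicit none
  integer :: i
  real, allocatable :: x(:)    ! hypothetical per-particle array

  n = 1000000                  ! stands in for the value read from input
  allocate(x(n))

  print *, 'n before loop =', n

  ! default(none) forces every variable used in the loop to be listed.
  !$omp parallel do default(none) shared(n, x) private(i)
  do i = 1, n
     x(i) = real(i)            ! placeholder body
  end do
  !$omp end parallel do

  print *, 'x(n) =', x(n)
end program omp_loop_sketch
```

Without the OpenMP flags the directives are plain comments and the
loop runs serially, which matches the case that worked.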
 
My guess at the time was that the problem was in how the
loop iterations were calculated and distributed between
processors, but as I wrote before, I didn't chase it, and
would consider that a difficult thing to do from my position
as a writer of the code, not the compiler.
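
To make that guess a bit more concrete, here is a sketch of one
common way a static split of 1..n across threads can be computed.
It is purely illustrative: I have no knowledge of how the PGI
runtime actually derives its bounds, and the thread count and the
names (nthreads, chunk, rem, covered) are assumptions of mine.

```fortran
program static_split_sketch
  implicit none
  integer :: n, nthreads, tid, chunk, rem, lo, hi, covered

  n = 1000000                  ! about the size mentioned above
  nthreads = 4                 ! assumed thread count
  chunk = n / nthreads
  rem = mod(n, nthreads)
  covered = 0

  do tid = 0, nthreads - 1
     ! The first "rem" threads each take one extra iteration.
     lo = tid * chunk + min(tid, rem) + 1
     hi = lo + chunk - 1
     if (tid < rem) hi = hi + 1
     covered = covered + (hi - lo + 1)
     print *, 'thread', tid, ':', lo, '..', hi
  end do

  ! A correct split ends exactly at n and counts each iteration once.
  if (hi /= n .or. covered /= n) stop 'bounds do not cover 1..n'
end program static_split_sketch
```

An error in the per-thread upper bound in a calculation like this
would let a thread run past n, which is the kind of symptom I saw.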

Cheers,

Andy

-- 
Andy Nelson                     Dept of Physics and Astronomy
andy@fake.lsu.edu               Louisiana State University
http://www.maths.ed.ac.uk/~andy Baton Rouge Louisiana 70803