Re: Problem with log function - Intel fortran compiler under Linux
- From: Gordon Sande <g.sande@xxxxxxxxxxxxxxxx>
- Date: Tue, 09 Oct 2007 12:49:04 GMT
On 2007-10-09 03:58:35 -0300, Arjen Markus <arjen.markus@xxxxxxxxxx> said:
On 8 Oct, 14:58, Gordon Sande <g.sa...@xxxxxxxxxxxxxxxx> wrote:
On 2007-10-08 06:44:15 -0300, Arjen Markus <arjen.mar...@xxxxxxxxxx> said:
On 4 Oct, 15:35, Arjen Markus <arjen.mar...@xxxxxxxxxx> wrote:
Hello,
we are experiencing a rather nasty problem with the Intel Fortran
compiler under Linux (version 9.0). The program is big and so far we
have not been able to trim the code so that a moderately sized program
displays the problem too.
The point is this:
The first application of the log function on double precision reals
causes a NaN (while the argument is a perfectly acceptable value of
about 0.6).
If we add a dummy statement like:
aa = log(r)
as one of the first statements, the value of aa becomes NaN and some
later computations succeed. (NaNs occur at different positions.)
Here is some more information:
Intel Fortran: 9.0
Linux: Red Hat Enterprise Linux, ES release 4.
Has anyone encountered similar problems? Does anyone know how to solve
this problem?
Regards,
Arjen
Hm, things are brightening a bit:
- My original posting reflected the situation with all
sources compiled with debugging on
- I then tried with debugging on and array bound checking
- no error message about possible array bound violation
but the first NaN appears later on
- Without any options (so only the defaults) the first
NaN appears earlier on.
Especially this latter observation means that I have
a lot less code to worry about :)
Regards,
Arjen
It is rather easy to lie about array sizes in a way that confuses
array bounds checking when using F77 (assumed or explicit size)
semantics, so the lack of out-of-bounds diagnostics can be misleading.
To check against that you need to use one of the "sturdier" debugging
systems like Salford/Silverfrost that supply their own descriptors
at all calls. That makes third-party compiled libraries "difficult" if
not impossible. (The problem is that the "new" declaration is not checked
against the "old" declaration across the call, but the subscript checking
is only against the "new" declaration and so can be easily misled.
The better systems check against the descriptors, but that requires much
more information to be available at the call than the "traditional"
Fortran implementation provides.)
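A minimal sketch of the mechanism described above, in free-form Fortran with an external (F77-style) subroutine. The names `lie_about_bounds` and `fill` are hypothetical and not from the program under discussion. The caller owns 10 elements; the callee is told `n = 8`, and a subscript check inside `fill` can typically only compare `i` against that `n`, not against the caller's declaration. Here the claim happens to be true, so the program is well defined. Passing 12 instead would overrun `work`, and a checker that sees only the callee's declaration would have little basis for complaint.

```fortran
program lie_about_bounds
  implicit none
  double precision :: work(10)
  external :: fill

  work = 0.0d0
  ! The caller declares 10 elements but passes only a starting element
  ! and a count. Nothing at the call ties the count to the declaration.
  call fill(work(1), 8)
  print '(10f5.1)', work
end program lie_about_bounds

! External subroutine with F77 explicit-size semantics: the "new"
! declaration x(n) is all a subscript checker inside fill can see.
subroutine fill(x, n)
  implicit none
  integer :: n, i
  double precision :: x(n)
  do i = 1, n
     x(i) = dble(i)
  end do
end subroutine fill
```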
When you get into this sort of problem it is time to think multiple
compilers, and even other operating systems. Salford is on Windows.
The choice is between bowing to the needs of the other system (i.e.
running under Windows for a while) or continuing to battle with
ineffective tools. It sounds like you have all sources readily
available so using Salford for a while may not be much of a bother.
Well, the program runs under Windows, using CVF 6.6C (and has
been developed on several platforms over the years), so we
were unhappily surprised to see these nasty problems occur
under Linux.
This is the all too common fallacy that absence of evidence (of a hidden
bug) is evidence of absence. Bugs showing themselves after a change of
compiler is a rather common story. A real pain for the now unhappy
user.
If it already runs under Windows then pushing it through Salford/
Silverfrost should be good for an interesting afternoon's work. It
may take more if it has other interesting features that tend to show
up in legacy codes. Addressing those is a good thing in the large
even if a bother today. Having both Windows and Linux versions suggests
that you have full source so using yet another compiler is not a big
deal.
To further elaborate on the problem:
The program uses a single REAL array stored in a blank COMMON
block to pass the data around. This REAL array is used as
double precision REAL data and as double precision COMPLEX data
by passing various array elements to the subroutines involved.
(The pieces that arise that way do _NOT_ overlap - I checked)
I had guessed that there was F77 "dynamic" allocation going on.
Instead of "rather easy to lie about bounds" this is making a career
out of the practice. Many subscript checkers will let anything
through for "REAL X(*)" as you just promised that everything
was OK. It takes a fair bit of work at run time to figure out
what the value of "*" should actually be. The combination of
F77 "dynamic" allocation and use of "*" dimensioning renders
subscript checking almost meaningless.
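A minimal sketch of the pattern being described, assuming a double precision pool in blank COMMON (the variant mentioned later in the thread) so that the example stays standard-conforming. The names `pool_demo`, `scale_seg`, `pool` and the segment sizes are hypothetical. Segments are carved out of one array by index arithmetic and handed to a routine that declares its dummy argument as `x(*)`, so inside the routine there is no upper bound to check a subscript against.

```fortran
program pool_demo
  implicit none
  integer, parameter :: npool = 100
  double precision :: pool(npool)
  common // pool
  integer :: ia, ib, na, nb
  external :: scale_seg

  na = 30
  nb = 50
  ia = 1            ! first segment starts at pool(1)
  ib = ia + na      ! second segment follows directly, no overlap
  pool = 1.0d0

  ! Each call hands over a starting element; the length travels separately.
  call scale_seg(pool(ia), na, 2.0d0)
  call scale_seg(pool(ib), nb, 3.0d0)
  print *, pool(ia), pool(ib), pool(ib + nb)
end program pool_demo

subroutine scale_seg(x, n, factor)
  implicit none
  integer :: n, i
  double precision :: x(*), factor
  ! With x(*) the upper bound is unknown here; only n, supplied by the
  ! caller, keeps the loop inside the segment.
  do i = 1, n
     x(i) = factor * x(i)
  end do
end subroutine scale_seg
```

If a caller passed a length larger than the segment it reserved, the loop would silently run into the neighbouring segment, which is the kind of error a descriptor-based checker is meant to catch.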
This is _NOT_ the cause of the problem though:
When I replace that REAL array by a double precision array,
the _SAME_ problem occurs.
In the context of your other remarks it would be surprising if
the type of the common had any effect. A possible issue might
be alignment. The machines that I know technical details on tended
to take a bad performance hit on bad alignment but got the right
answer. It is also perfectly reasonable that a misaligned load
could give a wrong answer due to address wrapping rules. The
solution is to never have the problem and then never have to
worry about such details.
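A small sketch of how a start index can be kept aligned when double precision data is carved out of a REAL pool. It assumes a default REAL of 4 bytes, double precision of 8 bytes, and a COMMON block that itself starts on an 8-byte boundary; none of this is confirmed for the program in question, and the name `next_aligned` is hypothetical. Under those assumptions element `i` of the pool sits at byte offset `4*(i-1)`, so a double precision segment is 8-byte aligned only when it starts at an odd index.

```fortran
program align_demo
  implicit none
  integer :: i
  integer, external :: next_aligned

  do i = 1, 6
     print '(a,i2,a,i2)', 'requested start ', i, ' -> aligned start ', next_aligned(i)
  end do
end program align_demo

! Round a 1-based index into a 4-byte REAL pool up to the next index whose
! byte offset 4*(i-1) is a multiple of 8.
integer function next_aligned(i)
  implicit none
  integer :: i
  next_aligned = i + mod(i - 1, 2)
end function next_aligned
```

Whether misalignment plays any role in the NaNs is not established in this thread; the sketch only shows the index arithmetic involved.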
My strategy right now is to radically skip pieces of the
program that seem irrelevant to the computations that show
the problem. I will even try if valgrind offers any insight.
I thought Valgrind was a statistical location counter sampler
to help locate performance bottlenecks. (I have only looked
at its blurb enough to form the impression that it was not
something that I was interested in. I like EXACT execution
counts for debugging as well as bottleneck finding. Have rolled
my own to get something I like. Wish I did not have to but that
is how it is.) Why such a tool would be relevant for the problem
you are describing is not clear to me. It might be interesting
once you have cured the problem but I don't offhand see how it will
help find it.
But the problem is very resilient: I have experimented with
many different changes to the code (explicit declaration of
all variables, using INTRINSIC, changing the type of the array
as described above), and most have no effect on the symptoms.
Adding one write statement does make the NaNs disappear, but
that hardly seems a good solution.
It has all the classical hallmarks of bad calls or bad subscripts.
The cure is a very tough and bloody-minded debugging compiler. The
cash price of Salford for personal use (zero!) is a bargain compared
to the cost of your time.
There are some who will argue that until you fix the bug none of your
results are to be trusted. They may look right or they may just look
like what you have become conditioned to expect. As they say, close only
counts in horseshoes (and grenades).
Regards,
Arjen