Fortran memory allocation (stack/heap) issues
From: Andy Nelson (andy_at_kant.maths.ed.ac.uk)
Date: 04/27/04
- Next in thread: Richard Maine: "Re: Fortran memory allocation (stack/heap) issues"
- Reply: Richard Maine: "Re: Fortran memory allocation (stack/heap) issues"
- Reply: glen herrmannsfeldt: "Re: Fortran memory allocation (stack/heap) issues"
- Reply: Roger Williams: "Re: Fortran memory allocation (stack/heap) issues"
Date: Tue, 27 Apr 2004 20:03:50 +0000 (UTC)
Hi folks,
I have a colleague who is working with some code that I wrote
parts of and is trying to run it on some other machines on which
I have never worked (details below). He is currently having some
troubles running large jobs with it that I've never run into, and
so I would like to ask folks here, since I don't seem to be able
to find much specific in the books I've got on my shelf. Hopefully,
what I write will be clear/complete enough that people can
respond intelligently. Sorry for any missing bits, and somebody
please hit me with a clue bat if I'm making no sense.
These questions also have quite a bit to do with OS architecture
(which I know much less about) rather than Fortran, but I am
asking here since the code and issues are mostly Fortran
related. Partly, I'm asking because I'm not sure the advice my
colleague is getting is accurate from his systems folk who may not
be specifically Fortran folk. Partly for my own sake of
understanding how Fortran memory implementation stuff really
works, both in general and what might be the actual implementation
on various real machines.
Thanks for any help,
Andy
---description---
The code is a particle evolution code with individual timesteps
for each particle, written in Fortran 77 with the usual sorts
of extensions like enddo, longer variable names etc. It is also
parallelized with OpenMP. We ordinarily compile with f77 if it
is available, but f90/5 if not (i.e. Sun's f77 is now a wrapper for
f90 afaik), and are moving towards a more completely f90/5 environment.
However, the code compiles fine with f77 compilers at the moment, and
specifically for the problem at hand, there are no quantities that
are allocated dynamically via f90ish features. All arrays are
dimensioned at the beginning and stay that way throughout the
execution of the code. Nearly all are originally declared in some
sort of common block but are often passed to a subroutine as an
argument, rather than through their common block. A typical
subroutine is of order 1-2 screenfuls of code (I can't
understand much more than that at one time). In some cases a
subroutine may require substantial cpu time to complete, and in
others it may pick out just a few elements and do something with
only them. In either case, the basic structure is a loop
either over all particles, or over all `active' particles.
The specific problem I'm wondering about is with how variables
and arrays are transferred from a caller to a callee in
the argument of a subroutine, and how they might be allocated
memory space (heap/stack) if a copy is made. Especially things
like how to avoid stack overflow issues. The questions/problems
I have are
-Under what circumstances would one expect that the call
might trigger a quantity to have a copyin/copyout sort of
arrangement and what might trigger a call by reference sort
of arrangement? (I am fairly certain I don't have the
terminology correct...but the basic idea I mean is whether
a copy is made onto some new piece of memory, worked on,
then copied back out, or whether the original piece of memory
gets used directly in a subroutine--as I understand it
Fortran allows either to occur depending on various
intangibles I would like to know more about).
From what I can gather from M&R (section 5.7.2), the situations
where a copy is made (must be made?) don't apply to the code as
it is written, but I may be incorrect about that, since there seem
to be some problems related to it that I describe below. (The
situations M&R describe may of course also apply only to the
f90/5 standard and not f77...)
-Under what circumstances would one expect that the copy (if
one is made) goes to which sort of memory...stack or heap?
As I understand it, stack memory is far more efficient (fast)
than heap...how might I (as a code writer) avoid such issues
in favor of just using some original version or if some copy
is needed, to make it a copy-to-stack flavor rather than
copy-to-heap flavor.
-I recall some discussion on clf a few weeks ago about stack/heap
issues, but got the feeling then that the answers then were
strongly problem and machine/compiler dependent. To what extent
is this also true of the questions that I'm asking as well?
-There are also occasions when a routine might be called with
one or the other of npart1/npart2 equal to 0 (see code below). As a completely
separate question, my copy of M&R says zero length arrays are
allowed, but I've run across compilers (can't remember which)
where the code crashed on trying to enter a subroutine with such
a situation, and worked when I changed the array declaration
to a fixed parameter value. Did I do something weird
or should I report a bug? (Maybe this description isn't
specific enough for anyone to say for sure though...)
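To make the first and last of these questions concrete, here is a minimal sketch in free-form f90. The names `argdemo` and `scale2` are made up for illustration and do not come from the real code, and the comments describe what compilers typically do, not what the standard guarantees.

```fortran
program argdemo
  implicit none
  integer, parameter :: nmax = 10
  double precision :: a(nmax)
  integer :: i

  do i = 1, nmax
     a(i) = dble(i)
  end do

  ! Whole contiguous array: compilers typically pass only its address.
  call scale2(nmax, a)

  ! Strided section: the explicit-shape dummy expects contiguous storage,
  ! so a compiler will typically build a temporary (copy-in/copy-out).
  call scale2(nmax/2, a(1:nmax:2))

  ! Zero elements: the dummy has size zero and the loop does not execute.
  call scale2(0, a)

  print *, a
end program argdemo

! External subroutine with an explicit-shape dummy array,
! in the same style as the calls described in this post.
subroutine scale2(n, x)
  implicit none
  integer :: n
  double precision :: x(n)
  integer :: i

  do i = 1, n
     x(i) = 2.0d0*x(i)
  end do
end subroutine scale2
```

The calls in the real code all look like the first one (whole arrays, no sections), which is why I would not have expected copies to be made.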
Now there is a bit of hearsay involved below, since I don't
have direct access to the systems folk who made the original claims.
Bear with me. The situation is that the code will not run due to
insufficient stack size, given a problem of some size that is
still much smaller than the total available memory.
The machines where problems are found at the moment are a
Hitachi SR8000 and an Itanium2 machine put together by
some small more or less no-name company (I never heard of them) and
using the Intel compiler, but I'm at least as interested in the
answers in general as compared specifically to one machine, since we
would like to be able to run the code easily/portably/efficiently
on lots of different machines and provide it to others as well.
My colleague says that the systems people where he is running
(the Rechenzentrum in Garching Germany) have told him that the
problem is related to the machine stack size and how a lot of
the subroutine calls are phrased. In a variety of places in
the code there are subroutine calls like this
      program myprog
      parameter(npart1_max=<somenumber>)
      parameter(npart2_max=<someothernumber>)
      integer iarray(npart1_max)           !various of these have
      double precision array(npart2_max)   !different npart1 or npart2
      .                                    ! dimensions
      <initialize everything>
      .
      call someroutine(npart1,npart2,array,iarray) !or often routines
      .                                            !with more arguments
      .
      end

      subroutine someroutine(npart1,npart2,array,iarray)
      integer iarray(npart1)
      double precision array(npart2)
      <do work on and/or using `array' and `iarray', possibly
       also referencing still others defined in a common>
      return
      end
where npart1/2 is a very large number, typically 10**5-10**7, and
there are many arrays that are passed like this. Other arrays that
may be used locally in a few of the various subroutines are accessed
as members of common blocks as well. (These commons are things
that I'm working on converting into modules, as soon as the
time for our conversion to f90/5 comes RSN).
The claim is that the phrasing above will lead to the
variables getting defined on a stack, and if there are lots
of such arrays scattered through the code in various calls,
then the stack can overflow. The recommendation that was made
was to change the phrasing to this sort of thing:
      subroutine someroutine(npart1,npart2,array,iarray)
      parameter(npart1_max=<somenumber>)
      parameter(npart2_max=<someothernumber>)
      integer iarray(npart1_max)
      double precision array(npart2_max)
      .
      .
      .
      return
      end
...and in that case the allocation would be on the heap,
and would not cause the stack overflow problems that we seem
to have.
Now the problem/question I have with this is that all of these
arrays are declared with a fixed length in whatever main routine
they come from. There are no cases of allocate/free arrays in
any routines in the whole code. So I'm at least partly confused
as to why there might be issues of heap vs stack allocation to
begin with. That led to this post and my questions above.
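One distinction that may be relevant, offered as a guess and not as a diagnosis: a dummy argument dimensioned with npart2 needs no new storage, whereas a local array dimensioned by an argument (an f90 automatic array) gets fresh storage on every call, and many compilers place that storage on the stack. The sketch below uses hypothetical names (`stackdemo`, `reverse_it`, `work`) that are not in the real code.

```fortran
program stackdemo
  implicit none
  integer, parameter :: npart2_max = 5
  double precision :: array(npart2_max)
  integer :: i

  do i = 1, npart2_max
     array(i) = dble(i)
  end do

  call reverse_it(npart2_max, array)
  print *, array
end program stackdemo

subroutine reverse_it(npart2, array)
  implicit none
  integer :: npart2
  ! Dummy argument: refers to the caller's storage, nothing new is allocated.
  double precision :: array(npart2)
  ! Automatic array: new storage on each call; where it lives (stack or
  ! heap) is up to the compiler and its options. Many default to the stack.
  double precision :: work(npart2)
  integer :: i

  do i = 1, npart2
     work(i) = array(i)
  end do
  do i = 1, npart2
     array(i) = work(npart2 + 1 - i)
  end do
end subroutine reverse_it
```

If only declarations like `array` appear in the code, I would not expect the subroutine call by itself to consume stack in proportion to npart2; declarations like `work` are the ones whose placement depends on the compiler.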
--
Andy Nelson                       School of Mathematics
andy@maths.ed.ac.uk               University of Edinburgh
http://maths.ed.ac.uk/~andy       Edinburgh Scotland EH9 3JZ U. K.