Fortran memory allocation (stack/heap) issues

From: Andy Nelson (andy_at_kant.maths.ed.ac.uk)
Date: 04/27/04


    Hi folks,

    I have a colleague who is working with some code that I wrote
    parts of and is trying to run it on some other machines on which
    I have never worked (details below). He is currently having some
    troubles running large jobs with it that I've never run into, and
    so I would like to ask folks here, since I don't seem to be able
    to find much specific in the books I've got on my shelf. Hopefully,
    what I write will be clear/complete enough that people can
    respond intelligently. Sorry for any missing bits, and somebody
    please hit me with a clue bat if I'm making no sense.

    These questions also have quite a bit to do with OS architecture
    (which I know much less about) rather than Fortran, but I am
    asking here since the code and issues are mostly Fortran
    related. Partly, I'm asking because I'm not sure that the advice my
    colleague is getting from his systems folk, who may not be
    specifically Fortran folk, is accurate. Partly it is for my own
    sake, to understand how Fortran memory implementation stuff really
    works, both in general and in the actual implementations
    on various real machines.

    Thanks for any help,

    Andy

    ---description---

    The code is a particle evolution code with individual timesteps
    for each particle, written in Fortran 77 with the usual sorts
    of extensions like enddo, longer variable names etc. It is also
    parallelized with OpenMP. We ordinarily compile with f77 if it
    is available, but f90/5 if not (i.e. Sun's f77 is now a wrapper for
    f90 afaik), and are moving towards a more completely f90/5 environment.
    However, the code compiles fine with f77 compilers at the moment, and
    specifically for the problem at hand, there are no quantities that
    are allocated dynamically via f90ish features. All arrays are
    dimensioned at the beginning and stay that way throughout the
    execution of the code. Nearly all are originally declared in some
    sort of common block but are often passed to a subroutine as an
    argument, rather than through their common block. A typical
    subroutine is of order 1-2 screenfuls of code (I can't
    understand much more than that at one time). In some cases a
    subroutine may require substantial cpu time to complete, and in
    others it may pick out just a few elements and do something with
    only them. In either case, the basic structure is a loop
    either over all particles, or over all `active' particles.

    The specific problem I'm wondering about is with how variables
    and arrays are transferred from a caller to a callee in
    the argument of a subroutine, and how they might be allocated
    memory space (heap/stack) if a copy is made. Especially things
    like how to avoid stack overflow issues. The questions/problems
    I have are

    -Under what circumstances would one expect that the call
     might trigger a quantity to have a copyin/copyout sort of
     arrangement and what might trigger a call by reference sort
     of arrangement? (I am fairly certain I don't have the
     terminology correct...but the basic idea I mean is whether
     a copy is made onto some new piece of memory, worked on,
     then copied back out, or whether the original piece of memory
     gets used directly in a subroutine--as I understand it
     Fortran allows either to occur depending on various
     intangibles I would like to know more about).

      From what I can gather from M&R (section 5.7.2), the situations
      where a copy is made (must be made?) don't apply to the code as
      it is written, but I may be incorrect about that, since there seem
      to be some problems related to it that I describe below. (The
      situations M&R describe may of course also apply only to the
      f90/5 standard and not f77...)

    -Under what circumstances would one expect that the copy (if
     one is made) goes to which sort of memory...stack or heap?
     As I understand it, stack memory is far more efficient (fast)
     than heap...how might I (as a code writer) avoid such issues
     in favor of just using some original version or if some copy
     is needed, to make it a copy-to-stack flavor rather than
     copy-to-heap flavor.

    -I recall some discussion on clf a few weeks ago about stack/heap
     issues, but got the feeling that the answers then were
     strongly problem and machine/compiler dependent. To what extent
     is this also true of the questions that I'm asking?

    -There are also occasions when a routine might be called with
     npart1 or npart2 equal to zero (see code below). As a completely
     separate question, my copy of M&R says zero length arrays are
     allowed, but I've run across compilers (can't remember which)
     where the code crashed on trying to enter a subroutine with such
     a situation, and worked when I changed the array declaration
     to a fixed parameter value. Did I do something weird
     or should I report a bug? (Maybe this description isn't
     specific enough for anyone to say for sure though...)
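
    The first and last questions above can be made concrete with a
    small test program. This is only a sketch, assuming a Fortran 90
    or later compiler; the names copy_demo and scale2 are made up
    for illustration and are not from the code being discussed. The
    first call passes a whole contiguous array, which compilers
    normally pass by address. The second passes a strided section to
    an explicit-shape dummy, which is the kind of call where a
    compiler will typically make a temporary copy and copy it back
    afterwards. The third passes a length of zero, which Fortran 90
    permits.

```fortran
program copy_demo
  implicit none
  integer, parameter :: n = 10          ! hypothetical size
  double precision :: a(n)
  integer :: i

  do i = 1, n
     a(i) = dble(i)
  end do

  ! Whole contiguous array: normally passed by address, no copy needed.
  call scale2(n, a)

  ! Strided section passed to an explicit-shape dummy: the dummy expects
  ! contiguous storage, so a compiler will typically build a temporary,
  ! call the routine, and copy the result back (copy-in/copy-out).
  call scale2(n/2, a(1:n:2))

  ! Zero-length call: the dummy x(m) has size zero and the loop body
  ! in scale2 never executes.
  call scale2(0, a)

  print *, a
end program copy_demo

subroutine scale2(m, x)
  implicit none
  integer :: m, i
  double precision :: x(m)              ! explicit-shape dummy argument
  do i = 1, m
     x(i) = 2.0d0*x(i)
  end do
end subroutine scale2
```

    Whether the temporary for the second call lands on the stack or
    the heap is up to the compiler, so this only shows where a copy
    may arise, not where it is stored.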

    Now there is a bit of hearsay involved below, since I don't
    have direct access to the systems folk who made the original claims.
    Bear with me. The situation is that the code will not run due to
    insufficient stack size, given a problem of some size that is
    still much smaller than the total available memory.

    The machines where problems are found at the moment are a
    Hitachi SR8000 and an Itanium2 machine put together by
    some small more or less no-name company (I never heard of them) and
    using the Intel compiler, but I'm at least as interested in the
    answers in general as compared specifically to one machine, since we
    would like to be able to run the code easily/portably/efficiently
    on lots of different machines and provide it to others as well.

    My colleague says that the systems people where he is running
    (the Rechenzentrum in Garching, Germany) have told him that the
    problem is related to the machine stack size and how a lot of
    the subroutine calls are phrased. In a variety of places in
    the code there are subroutine calls like this

           program myprog

           parameter(npart1_max=<somenumber>)
           parameter(npart2_max=<someothernumber>)

           integer iarray(npart1_max) !various of these have
           double precision array(npart2_max) !different npart1 or npart2
           . ! dimensions
           <initialize everything>
           .
           call someroutine(npart1,npart2,array,iarray) !or often routines
           . !with more arguments
           .
           end

           subroutine someroutine(npart1,npart2,array,iarray)

           integer iarray(npart1)

           double precision array(npart2)

           <do work on and/or using `array' and `iarray', possibly
           also referencing still others defined in a common>

           return
           end

    where npart1/2 is a very large number, typically 10**5-10**7, and
    there are many arrays that are passed like this. Other arrays that
    may be used locally in a few of the various subroutines are accessed
    as members of common blocks as well. (These commons are things
    that I'm working on converting into modules, as soon as the
    time for our conversion to f90/5 comes RSN).

    The claim is that the phrasing above will lead to the
    variables getting defined on a stack, and if there are lots
    of such arrays scattered through the code in various calls,
    then the stack can overflow. The recommendation that was made
    was to change the phrasing to this sort of thing:

           subroutine someroutine(npart1,npart2,array,iarray)

           parameter(npart1_max=<somenumber>)
           parameter(npart2_max=<someothernumber>)

           integer iarray(npart1_max)
           double precision array(npart2_max)
           .
           .
           .
           return
           end

    ...and in that case the allocation would be on the heap,
    and would not cause the stack overflow problems that we seem
    to have.

    Now the problem/question I have with this is that all of these
    arrays are declared with a fixed length in whatever main routine
    they come from. There are no cases of allocate/free arrays in
    any routines in the whole code. So I'm at least partly confused
    as to why there might be issues of heap vs stack allocation to
    begin with. That led to this post and my questions above.
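
    One way to probe the claim is to compare a dummy argument array
    with a local array sized from an argument. The following is a
    minimal sketch, assuming a Fortran 90 or later compiler; the
    names storage_demo, with_dummy, with_automatic and the common
    block /parts/ are hypothetical. A dummy argument array describes
    storage owned by the caller, whether it is declared x(n) or
    x(npart_max). A local array dimensioned by an argument (an
    automatic array) does get new storage on every call, and where
    that storage goes is compiler dependent.

```fortran
program storage_demo
  implicit none
  integer, parameter :: npart_max = 100000   ! hypothetical size
  double precision :: array(npart_max)
  common /parts/ array                        ! hypothetical common block

  array = 1.0d0
  call with_dummy(npart_max, array)
  call with_automatic(npart_max, array)
  print *, array(1), array(npart_max)
end program storage_demo

subroutine with_dummy(n, x)
  implicit none
  integer :: n, i
  double precision :: x(n)     ! dummy argument: refers to the caller's array
  do i = 1, n
     x(i) = x(i) + 1.0d0
  end do
end subroutine with_dummy

subroutine with_automatic(n, x)
  implicit none
  integer :: n, i
  double precision :: x(n)     ! dummy argument, as above
  double precision :: tmp(n)   ! automatic array: new storage on each entry
  do i = 1, n
     tmp(i) = 2.0d0*x(i)
  end do
  do i = 1, n
     x(i) = tmp(i)
  end do
end subroutine with_automatic
```

    Raising npart_max until one of the two routines fails, with the
    stack limit held fixed, would show which form actually consumes
    stack space on a given machine and compiler.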

    -- 
    Andy Nelson                     School of Mathematics
    andy@maths.ed.ac.uk             University of Edinburgh 
    http://maths.ed.ac.uk/~andy     Edinburgh Scotland EH9 3JZ  U. K.
    
