Re: Do buffers always start with the lowest memory address being the first element?

From: Chris Torek (nospam_at_torek.net)
Date: 01/15/05

  • Next message: Joe Wright: "Re: Manipulation of strings: upper/lower case"
    Date: 15 Jan 2005 19:07:30 GMT
    
    

    In article <1105812305.778104.182330@c13g2000cwb.googlegroups.com>
    kiru.sengal@gmail.com <kiru.sengal@gmail.com> wrote:
    >[This post is with regards computers/OSes with stacks that grow down.
    >i386/Unix is one possibility]

    The C standard does not assume a downward-growing stack, nor even
    an upward-growing stack. It merely requires that automatic (local)
    variables behave in a "stack-like manner", if a function is called
    recursively.

    (Google's broken news-posting interface destroyed your indentation.
    I have tried to restore it here.)

    >I have embedded my questions/assumptions in the the following sample
    >code:
    >
    >include <stdio.h>
    >
    >long num;
    >/* allocated a zero-init fixed memory location (.BSS) */

    "BSS" is a system-specific (albeit common) method of implementing
    this; C requires only that the variable exist as long as the program
    runs, and be initialized to zero. Some implementations do the same
    with this as with:

        long num = 0;

    because, e.g., they lack anything similar to the "bss region" found
    on your example system.

    >char *s = "Hello world";
    >/* s allocated a fixed memory location and initialized
    >to address of 'H' (DATA). Meanwhile, the literal string
    >(12-byte buffer) is stored in a fixed WRITE ONLY area (TEXT) */

    Surely you mean "read only" :-) In typical Unix-like systems today,
    the string literals are in a "read only data" section (".rodata"
    and the like). C allows but does not require that the array produced
    by the string literal be *physically* read-only, and some
    implementations leave it write-able. The effect of writing on
    elements of the array is undefined: it may trap, it may succeed
    (changing the array), or it may silently fail (no trap but the
    array remains unchanged), in the three "most typical" implementations,
    but as far as Standard C is concerned, *anything* is allowed.
    You -- the programmer -- are simply required not to do this,
    in order for you to make any predictions about the operation of
    your program.

    >short buffer[100];
    >/* buffer allocated in contiguous fixed memory locations (in DATA)
    >where buffer[0] is lowest memory address and buffer[99] is highest
    >memory address */

    As with "num" above, C requires only that the variable exist as long
    as the program runs, and be initialized to zero. On the implementation
    you use, you will find that "buf" is also in the ".bss" section.

    The question of "lower" and "higher" memory addresses suggests to
    me that you are asking about machine-level interpretations, but C
    does not give you direct access to machine-level interpretations.
    Instead, Standard C pastes a (usually quite thin) layer of
    abstraction atop the machine-level. You may compute pointers
    pointing into the array named "buffer", e.g.:

        short *p1 = &buffer[20];
        short *p2 = &buffer[75];

    and, given two pointers of compatible type pointing into this array,
    you may compare them using any of the four relational operators:

        int result1 = p1 < p2;
        int result2 = p1 <= p2;
        int result3 = p1 > p2;
        int result4 = p1 >= p2;

    Results 1 and 2 here are guaranteed to be zero if p1 is "not less
    than" p2, and "not less than" means "has a lower subscript in the
    array". In that sense, the elements of the array are indeed
    addressed from lowest to highest.

    (You can also, of course, use the equality operators: p1 == p2,
    p1 != p2. I mention them separately because you can use them in
    places you may *not* use the relational operators.)

    There is nothing stopping an actual C compiler on real hardware
    from putting buffer[0] at the hardware's highest physical memory
    address, and working down towards lower addresses. In this case,
    a C source expression like:

        p1 < p2

    might compile into a machine instruction that tests instead whether
    p1 is greater than p2. (Practically speaking, this would be stupid
    on today's hardware, and no one will do it, because it would also
    require negating the index in expressions like buffer[i]. But one
    might do this on a machine in which ordinary array indexing works
    by subtraction instead of addition -- and such machines have existed
    in the past.)

    At the C code level, then, it *is* the case that buffer[0] through
    buffer[99] are in contiguous memory locations starting "at the
    bottom" and "moving up", but there is no requirement that they be
    physically contiguous (consider virtual-memory systems with small
    page sizes), nor that a C-code level test like "p1 < p2" compile
    to a machine instruction testing whether p1 is less than p2. In
    C's abstract model, they are contiguous and ascending; the extent
    to which C's abstract model matches what really happens on the
    machine depends on both the C compiler and the machine.

    >int main()
    >{
    > int count = 4; /* stored in main()'s stack frame */
    > float fcount= 4.0; /* stored in main()'s stack frame */
    > static confusion = 8; /* stored in same region as buffer (DATA) */

    Again, the C standard requires only that count and fcount work "as
    if" they were on some kind of stack, and that "confusion" work "as
    if" it were in that kind of data-segment: initially 8, and valid
    throughout the lifetime of the program.

    > long lbuffer[50];
    > /* stored in main()'s stack frame, but since stack grows down, is the
    > field for buffer[0] still placed in the lowest memory address? */

    Again, everything is "as if". Like count and fcount, lbuffer need
    only exist as long as main() continues to execute, and if some
    other function in your program calls main(), you must get "new
    copies" of the variables, preserving the old copies that are still
    around because the earlier call to main() is also still around (but
    suspended until this copy of main() returns).

    The direction of stack growth, if there is even a single stack[%]
    that has a single growth direction, is irrelevant to C's abstract
    model. C promises only that &lbuffer[0] < &lbuffer[1] and so on.
    If you somehow manage to compare &lbuffer[23] relationally to
    &buffer[72], no particular result is required. (The types of the
    two buffers' elements do not match, so this requires a cast, which
    potentially changes the value, which muddies the issue even more,
    but never mind all that.) On the other hand, equality comparisons,
    after conversion to a suitable type such as "char *" or "void *",
    *are* required to produce "not equal":

        if ((char *)&lbuffer[23] == (char *)&buffer[72])
            abort(); /* never happens */

    > printf("\n%s\n",s);
    >
    > int i; /* stored in main()'s stack frame, but when is memory allocated? */

    Declarations after code are a C99 feature. Quite a few compilers
    do not support this; in C89 you could use a new block:

        printf("%s\n", s);
        {
            int i;
            ...
        }

    C requires only that the program behave "as if" i's lifetime begins
    at its declaration and continues until execution reaches the "}" that
    terminates its scope. In your C99-specific code, that is the final
    close-brace for main(); in the C89 variant, it is the close-brace
    inserted to match the open-brace I added here.

    > for(i=0; i<100; i++)
    > {
    > buffer[i] = i;
    > }
    > return 0;
    >}
    >
    >Additional questions:
    >
    >- Since local variables don't have to be declared at the beginning of a
    >function, during run-time, is space for all local variables to be used
    >in any function allocated when the function is entered, or only when
    >they are used?

    A C compiler can achieve the required behavior by allocating all
    local variables at a function's entry, or by allocating them upon
    reaching their enclosing block or (in C99) their initial definition.
    There are merits and drawbacks to either method; you will find that
    different C compilers choose different approaches.

    >- Since main() is always the starting point for programs, do compilers
    >really put it's local variables in a main()-stackframe or simply place
    >it in fixed memory locations? How are static locals in main()
    >treated?

    In C (but not in C++ -- the languages are really quite different,
    despite some syntactic similarities), you -- the programmer -- are
    allowed to call main() recursively. If you do, it must behave just
    like any other function. This makes it difficult for C compilers
    to weasel out of creating a stack frame in the usual manner on
    typical machines. (The compiler would have to determine that you
    do not in fact call main() recursively; if so, it could rewrite
    all the automatic variables in main() to have static-duration.
    This determination is not all that hard, but such rewriting is also
    not all that profitable -- the program is unlikely to be any faster
    or smaller. So why bother?)

    In both C and C++, it is possible -- via the atexit() function for
    instance -- to do dumb things with variables whose lifetime terminates
    when main() returns. Consider the following broken C code:

        #include <stdio.h>
        #include <stdlib.h>

        static int *p;

        void oops(void) {
            printf("*p = %d\n", *p);
        }

        int main(void) {
            int v = 42;
            atexit(oops);
            p = &v;
            return 0;
        }

    Here, in the C abstract machine, atexit() registers the function
    oops() to run when the program exits. Then we set p to point to
    v, which is local to main(), and then we return from main(),
    destroying the variable v. Now all atexit()-registered functions
    are run, so oops() runs, and attempts to access *p -- but p points
    to a variable whose lifetime has terminated. The effect is undefined:
    the program is allowed to crash, or print 42, or print any other
    number, or indeed do anything, such as post lies about your boss
    to USENET. :-)

    We can fix the program by changing "int v" to "static int v". This
    changes the storage duration from automatic to static, so that only
    one copy of "v" exists no matter how many times we call main()
    recursively (none, in this case), and "v" exists for the lifetime
    of the program. Since the "lifetime of the program" continues
    *past* the return of main(), while atexit() like oops() run, this
    now makes a difference.

    We can also fix the program by removing the call to atexit(), though
    of course this stops the program from printing 42.

    [% There is one quite substantial merit to having at least two
    stacks, one for "control" -- return addresses and the like -- and
    one for "data" such as local variables. In particular, a system
    with two stacks offers the opportunity to debug programs that
    overwrite local arrays. Because the data are in the "Dstack",
    pointed to by the DSP or data-stack-pointer register, while the
    control values are in the "Cstack" pointed to by the CSP or
    control-stack-pointer register, and the two stacks are "far apart"
    in memory, writing past your own DSP area clobbers only other DSP
    memory. Breakpoints set via the CSP still cause the program to
    stop where you want, and you can then observe the DSP corruption.

    This design also interferes with typical Microsoft-bug-exploits:
    buffer overruns no longer allow you to overwrite the return address.
    The distance between CSP and DSP can be randomized on each run of
    the program, as well.]

    -- 
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
    email: forget about it   http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    

  • Next message: Joe Wright: "Re: Manipulation of strings: upper/lower case"

    Relevant Pages

    • Re: Do buffers always start with the lowest memory address being the first element?
      ... > The C standard does not assume a downward-growing stack, ... > an upward-growing stack. ... C allows but does not require that the array produced ... > machine depends on both the C compiler and the machine. ...
      (comp.lang.c)
    • Re: Checking the Memory regions
      ... like nm that let you see what the compiler has done with your ... The stack grows down, ... If its just below the "locals at" ... Have you checked on all of the processors Linux have been ported to? ...
      (comp.lang.c)
    • Re: Checking the Memory regions
      ... like nm that let you see what the compiler has done with your ... The stack grows down, ... If its just below the "locals at" ... Have you checked on all of the processors Linux have been ported to? ...
      (comp.lang.c)
    • Re: Checking the Memory regions
      ... like nm that let you see what the compiler has done with your ... The stack grows down, ... If its just below the "locals at" ... Have you checked on all of the processors Linux have been ported to? ...
      (comp.lang.c)
    • Re: Interview question !!!
      ... Show me a compiler and platform with a stack where this ... variable, pretending it is a pointer to an array of such variables, ... A compiler that sees that fnever modifies x after it is ...
      (comp.lang.c)