Re: Do buffers always start with the lowest memory address being the first element?
imanpreet_at_gmail.com
Date: 01/17/05
- Next message: Old Wolf: "Re: initializer element is not constant"
- Previous message: bd: "Re: accessing freed memory without error"
- In reply to: Chris Torek: "Re: Do buffers always start with the lowest memory address being the first element?"
- Next in thread: Chris Torek: "Re: Do buffers always start with the lowest memory address being the first element?"
- Reply: Chris Torek: "Re: Do buffers always start with the lowest memory address being the first element?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 17 Jan 2005 12:30:01 -0800
[Google seems to have some problem, apologies if this posts multiple
times]
Chris Torek wrote:
> In article <1105812305.778104.182330@c13g2000cwb.googlegroups.com>
> kiru.sengal@gmail.com <kiru.sengal@gmail.com> wrote:
> >[This post is with regards computers/OSes with stacks that grow
down.
> >i386/Unix is one possibility]
>
> The C standard does not assume a downward-growing stack, nor even
> an upward-growing stack. It merely requires that automatic (local)
> variables behave in a "stack-like manner", if a function is called
> recursively.
>
> (Google's broken news-posting interface destroyed your indentation.
> I have tried to restore it here.)
>
> >I have embedded my questions/assumptions in the the following sample
> >code:
> >
> >include <stdio.h>
> >
> >long num;
> >/* allocated a zero-init fixed memory location (.BSS) */
>
> "BSS" is a system-specific (albeit common) method of implementing
> this; C requires only that the variable exist as long as the program
> runs, and be initialized to zero. Some implementations do the same
> with this as with:
>
> long num = 0;
>
> because, e.g., they lack anything similar to the "bss region" found
> on your example system.
>
> >char *s = "Hello world";
> >/* s allocated a fixed memory location and initialized
> >to address of 'H' (DATA). Meanwhile, the literal string
> >(12-byte buffer) is stored in a fixed WRITE ONLY area (TEXT) */
>
> Surely you mean "read only" :-) In typical Unix-like systems today,
> the string literals are in a "read only data" section (".rodata"
> and the like). C allows but does not require that the array produced
> by the string literal be *physically* read-only, and some
> implementations leave it write-able. The effect of writing on
> elements of the array is undefined: it may trap, it may succeed
> (changing the array), or it may silently fail (no trap but the
> array remains unchanged), in the three "most typical"
implementations,
> but as far as Standard C is concerned, *anything* is allowed.
> You -- the programmer -- are simply required not to do this,
> in order for you to make any predictions about the operation of
> your program.
>
> >short buffer[100];
> >/* buffer allocated in contiguous fixed memory locations (in DATA)
> >where buffer[0] is lowest memory address and buffer[99] is highest
> >memory address */
>
> As with "num" above, C requires only that the variable exist as long
> as the program runs, and be initialized to zero. On the
implementation
> you use, you will find that "buf" is also in the ".bss" section.
Nitpick: buffer, not buf
> The question of "lower" and "higher" memory addresses suggests to
> me that you are asking about machine-level interpretations, but C
> does not give you direct access to machine-level interpretations.
> Instead, Standard C pastes a (usually quite thin) layer of
> abstraction atop the machine-level. You may compute pointers
> pointing into the array named "buffer", e.g.:
>
> short *p1 = &buffer[20];
> short *p2 = &buffer[75];
>
> and, given two pointers of compatible type pointing into this array,
> you may compare them using any of the four relational operators:
>
> int result1 = p1 < p2;
> int result2 = p1 <= p2;
> int result3 = p1 > p2;
> int result4 = p1 >= p2;
>
> Results 1 and 2 here are guaranteed to be zero if p1 is "not less
> than" p2, and "not less than" means "has a lower subscript in the
> array". In that sense, the elements of the array are indeed
> addressed from lowest to highest.
>
> (You can also, of course, use the equality operators: p1 == p2,
> p1 != p2. I mention them separately because you can use them in
> places you may *not* use the relational operators.)
>
> There is nothing stopping an actual C compiler on real hardware
> from putting buffer[0] at the hardware's highest physical memory
> address, and working down towards lower addresses. In this case,
> a C source expression like:
>
> p1 < p2
>
> might compile into a machine instruction that tests instead whether
> p1 is greater than p2. (Practically speaking, this would be stupid
> on today's hardware, and no one will do it, because it would also
> require negating the index in expressions like buffer[i]. But one
> might do this on a machine in which ordinary array indexing works
> by subtraction instead of addition -- and such machines have existed
> in the past.)
>
> At the C code level, then, it *is* the case that buffer[0] through
> buffer[99] are in contiguous memory locations starting "at the
> bottom" and "moving up", but there is no requirement that they be
> physically contiguous (consider virtual-memory systems with small
> page sizes), nor that a C-code level test like "p1 < p2" compile
> to a machine instruction testing whether p1 is less than p2. In
> C's abstract model, they are contiguous and ascending; the extent
> to which C's abstract model matches what really happens on the
> machine depends on both the C compiler and the machine.
>
> >int main()
> >{
> > int count = 4; /* stored in main()'s stack frame */
> > float fcount= 4.0; /* stored in main()'s stack frame */
> > static confusion = 8; /* stored in same region as buffer (DATA)
*/
>
> Again, the C standard requires only that count and fcount work "as
> if" they were on some kind of stack, and that "confusion" work "as
> if" it were in that kind of data-segment: initially 8, and valid
> throughout the lifetime of the program.
>
> > long lbuffer[50];
> > /* stored in main()'s stack frame, but since stack grows down,
is the
> > field for buffer[0] still placed in the lowest memory address?
*/
>
> Again, everything is "as if". Like count and fcount, lbuffer need
> only exist as long as main() continues to execute, and if some
> other function in your program calls main(), you must get "new
> copies" of the variables, preserving the old copies that are still
> around because the earlier call to main() is also still around (but
> suspended until this copy of main() returns).
>
> The direction of stack growth, if there is even a single stack[%]
> that has a single growth direction, is irrelevant to C's abstract
> model. C promises only that &lbuffer[0] < &lbuffer[1] and so on.
> If you somehow manage to compare &lbuffer[23] relationally to
> &buffer[72], no particular result is required. (The types of the
> two buffers' elements do not match, so this requires a cast, which
> potentially changes the value, which muddies the issue even more,
> but never mind all that.) On the other hand, equality comparisons,
> after conversion to a suitable type such as "char *" or "void *",
> *are* required to produce "not equal":
>
> if ((char *)&lbuffer[23] == (char *)&buffer[72])
> abort(); /* never happens */
I am not sure if I agree with you completly on this one. The above is
supposed to be true _only_ as long as the indexes are within the limit
assigned to them. I wrote this test snippet for my Borland 5.0
#include <stdio.h>
int main(void)
{
int ibuff[1];
char cbuff[1];
if ( (char*)&ibuff[0] == (char*)&cbuff[0+0x42])
printf("OOps\n");
printf("%p %p\n", (char*) &ibuff[0], (char*)&cbuff[0] );
}
> > printf("\n%s\n",s);
> >
> > int i; /* stored in main()'s stack frame, but when is memory
allocated? */
>
> Declarations after code are a C99 feature. Quite a few compilers
> do not support this; in C89 you could use a new block:
>
> printf("%s\n", s);
> {
> int i;
> ...
> }
>
> C requires only that the program behave "as if" i's lifetime begins
> at its declaration and continues until execution reaches the "}" that
> terminates its scope. In your C99-specific code, that is the final
> close-brace for main(); in the C89 variant, it is the close-brace
> inserted to match the open-brace I added here.
>
> > for(i=0; i<100; i++)
> > {
> > buffer[i] = i;
> > }
> > return 0;
> >}
> >
> >Additional questions:
> >
> >- Since local variables don't have to be declared at the beginning
of a
> >function, during run-time, is space for all local variables to be
used
> >in any function allocated when the function is entered, or only when
> >they are used?
>
> A C compiler can achieve the required behavior by allocating all
> local variables at a function's entry, or by allocating them upon
> reaching their enclosing block or (in C99) their initial definition.
> There are merits and drawbacks to either method; you will find that
> different C compilers choose different approaches.
>
> >- Since main() is always the starting point for programs, do
compilers
> >really put it's local variables in a main()-stackframe or simply
place
> >it in fixed memory locations? How are static locals in main()
> >treated?
>
> In C (but not in C++ -- the languages are really quite different,
> despite some syntactic similarities), you -- the programmer -- are
> allowed to call main() recursively. If you do, it must behave just
> like any other function. This makes it difficult for C compilers
> to weasel out of creating a stack frame in the usual manner on
> typical machines. (The compiler would have to determine that you
> do not in fact call main() recursively; if so, it could rewrite
> all the automatic variables in main() to have static-duration.
> This determination is not all that hard, but such rewriting is also
> not all that profitable -- the program is unlikely to be any faster
> or smaller. So why bother?)
>
> In both C and C++, it is possible -- via the atexit() function for
> instance -- to do dumb things with variables whose lifetime
terminates
> when main() returns. Consider the following broken C code:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> static int *p;
>
> void oops(void) {
> printf("*p = %d\n", *p);
> }
>
> int main(void) {
> int v = 42;
> atexit(oops);
> p = &v;
> return 0;
> }
>
> Here, in the C abstract machine, atexit() registers the function
> oops() to run when the program exits. Then we set p to point to
> v, which is local to main(), and then we return from main(),
> destroying the variable v. Now all atexit()-registered functions
> are run, so oops() runs, and attempts to access *p -- but p points
> to a variable whose lifetime has terminated. The effect is
undefined:
> the program is allowed to crash, or print 42, or print any other
> number, or indeed do anything, such as post lies about your boss
> to USENET. :-)
>
> We can fix the program by changing "int v" to "static int v". This
> changes the storage duration from automatic to static, so that only
> one copy of "v" exists no matter how many times we call main()
> recursively (none, in this case), and "v" exists for the lifetime
> of the program. Since the "lifetime of the program" continues
> *past* the return of main(), while atexit() like oops() run, this
> now makes a difference.
>
> We can also fix the program by removing the call to atexit(), though
> of course this stops the program from printing 42.
>
> [% There is one quite substantial merit to having at least two
> stacks, one for "control" -- return addresses and the like -- and
> one for "data" such as local variables. In particular, a system
> with two stacks offers the opportunity to debug programs that
> overwrite local arrays. Because the data are in the "Dstack",
> pointed to by the DSP or data-stack-pointer register, while the
> control values are in the "Cstack" pointed to by the CSP or
> control-stack-pointer register, and the two stacks are "far apart"
> in memory, writing past your own DSP area clobbers only other DSP
> memory. Breakpoints set via the CSP still cause the program to
> stop where you want, and you can then observe the DSP corruption.
>
> This design also interferes with typical Microsoft-bug-exploits:
> buffer overruns no longer allow you to overwrite the return address.
> The distance between CSP and DSP can be randomized on each run of
> the program, as well.]
Point well taken, but I believe it's not strictly a Microsoft thingy.
AFAIK Linux and for that matter x86 based systems save the return
address within the current stack itself only.
--
Imanpreet Singh Arora
If I am given 6 hours to chop a tree, I would spend the
first 4 to sharpen my axe.
Abraham Lincoln
- Next message: Old Wolf: "Re: initializer element is not constant"
- Previous message: bd: "Re: accessing freed memory without error"
- In reply to: Chris Torek: "Re: Do buffers always start with the lowest memory address being the first element?"
- Next in thread: Chris Torek: "Re: Do buffers always start with the lowest memory address being the first element?"
- Reply: Chris Torek: "Re: Do buffers always start with the lowest memory address being the first element?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|