Re: run-time vs compile-time

From: newbiecpp (newbiecpp_at_yahoo.com)
Date: 09/09/04

    Date: Thu, 09 Sep 2004 13:27:53 GMT
    
    

    "Jonathan Mcdougall" <jonathanmcdougall@DELyahoo.ca> wrote in message
    news:URK%c.77536$fU6.1015196@wagner.videotron.net...
    > > I have a hard time understanding the run-time environment.
    >
    > It is possible I messed up some things in my explanations; I am neither
    > a compiler implementor nor an operating system developer. Wait for some
    > corrections to appear before believing all that.
    >
    >
    > Let's review the common steps :
    >
    > 1) Writing source code
    > You specify, in a given language, the commands to be sent to the
    > computer. In C++, that means creating classes and functions and
    > using them. An object O, for example, is only a name, something
    > for the programmer to make his life easier.
    >
    > Always think of a programming language in terms of assembly
    > language : the machine does not understand classes, objects,
    > inheritance or whatever. All this high-level code is translated
    > into machine code, so all your variable names, functions, cool
    > class hierarchies will be replaced by memory locations, additions
    > and subtractions. In fact, high level languages like C++ could
    > be viewed only as "syntactic sugar", insofar as everything you do
    > could be done in assembly language (that's what your code will
    > become ultimately anyway).
    >
    > 2) Compiling
    > Nothing really interesting here: syntax checking and translation
    > into intermediate code. The only thing here is that all names
    > are checked, so for example
    >
    > // main.cpp
    > int main()
    > {
    >     a = 2; // error: 'a' was never declared
    > }
    >
    > will fail to compile since 'a' does not exist anywhere. The
    > compiler makes sure every name is _potentially_ defined, so the
    > linker will only have to try to find them. So when you write
    >
    > // main.cpp
    > extern int a; // declaration only: 'a' is defined elsewhere
    >
    > int main()
    > {
    >     a = 2;
    > }
    >
    > the compiler tells the linker : "'a' refers to some int defined
    > somewhere else, you'll have to find it. good luck".
    >
    > 3) Linking
    > That's getting interesting. The linker makes sure everything is
    > in place : each use of a name is resolved to its definition. If
    > that definition does not exist or if it exists more than once,
    > an error is generated (in most cases). The linker uses the
    > information generated by the compiler to associate names. By
    > using the example above, it searches other files (actually
    > "translation units") for a definition of 'a'. If it finds it,
    > for example with
    >
    > // another file.cpp
    > int a=0;
    >
    > , it associates 'a' in main.cpp with the 'a' in file.cpp. If 'a'
    > is not found anywhere, it stops. If two or more 'a's are found,
    > it stops since there is ambiguity. That is the basis of the ODR
    > (One Definition Rule).
    >
    > The linker then replaces every data access with an address, but
    > don't get it wrong : that address is not the real address in
    > memory. Actually, you could see that address as an offset in the
    > real memory. When the operating system runs the program, it
    > loads it somewhere and reserves some space in memory for that
    > program. Each variable/address/offset is resolved by the OS to
    > that memory space.
    >
    > How? An easy answer could be "automagically". The real long
    > answer perhaps could be provided by someone in a newsgroup
    > supporting your operating system. The thing is, that kind of
    > detail is left to the implementation : the C++ standard only
    > mandates _behavior_, not implementation. So as long as it
    > behaves as specified, an implementation can do anything it wants.
    >
    > You must understand that the only time memory is allocated, and
    > therefore real addresses are defined, is when the program is
    > executed, not during compilation or linking.
    >
    > Please, note that this is a bit of oversimplification.
    >
    >
    > > Let's assume that I have
    > > a program that has a simple variable alpha. When this variable is
    > > statically allocated, the compiler can use the absolute address of
    > > alpha to access it.
    >
    > What's important to understand is that nothing is allocated when you
    > compile/link a program. The program merely becomes some machine code.
    > The operating system is then in charge of running it. Yes, the linker
    > assigns some addresses to your variables, but these do not refer to a
    > specific place in memory. As I said, you could consider these
    > "addresses" to be offsets in memory, the starting address being defined
    > by the operating system.
    >
    > For what you said specifically, know that static data is handled
    > differently by most implementations. Actually, three types of data are
    > typically recognized : the stack, the heap and the static data.
    >
    > After re-reading, what do you mean by "statically allocated"? On the
    > stack such as
    >
    > int main()
    > {
    >     int i = 0; // automatic storage: lives on the stack
    > }
    >
    > or really statically allocated such as
    >
    > int main()
    > {
    >     static int i = 0; // static storage: lives for the whole program
    > }
    >
    > Be sure to distinguish between the two.
    >
    > > What confuses me is that when the variable is dynamically
    > > allocated, how does the compiler implement it?
    >
    > That's something else, though not entirely different. What I described
    > about the linker only applies to the stack. For the heap, it works a
    > bit differently.
    >
    > The operating system manages a pool of memory typically called "heap" or
    > "free store". That memory is different from the stack because, first of
    > all, of its longevity. For example, by using the stack :
    >
    > void f()
    > {
    >     int i = 0; // i is on the stack
    > }
    >
    > you have no way of extending the life of 'i'. Once the function
    > terminates, 'i' is destroyed. This is mandated by the C++ standard and
    > it is the usual practice in all languages, including assembly, so it's
    > no big deal.
    >
    > The heap however is a pool of memory. You can reserve and release
    > memory when you want. Typically, in practice, the operating system has
    > some functions for allocating and releasing memory at a low level,
    > usually in big chunks (several kilobytes). These functions are then
    > used by the malloc() system, which usually maintains another pool of
    > memory itself. Finally, operator new is usually implemented in terms of
    > malloc().
    >
    > The low level OS functions return the address of the allocated memory on
    > the heap. That address can be absolute (which is pretty rare today) or
    > relative, allowing in particular some sort of protection. Relative
    > addressing (also called virtual addressing) behaves like the relative
    > stack address I described earlier.
    >
    > So actually, what the OS returns is only an address, so neither the
    > compiler nor the linker has anything to do with that. That address is
    > determined by the OS at run-time, depending on the loaded programs and
    > the content of the heap. These functions can fail if the heap is full.
    >
    > > We know the address of the
    > > variable until run-time.
    >
    > That should read : "We don't know the real address of the variable until
    > run-time", since the program is not running. For memory to be
    > allocated, the program must run! Compiling it only translates it into
    > machine code and assigns "dummy" addresses which have to be resolved by
    > the operating system.
    >
    > > During the compilation, how can we access the
    > > alpha variable since we don't know its address yet?
    >
    > As I said earlier, the linker uses some kind of relative address which
    > is resolved by the operating system. If you are asking what the
    > compiler/linker does with
    >
    > void f(int &i)
    > {
    >     i = 2;
    > }
    >
    > to know what 'i' refers to, well it is only a matter of keeping a list
    > of the variables in a given scope with their names. When you do
    >
    > int main()
    > {
    >     int a = 10;
    >     f(a);
    > }
    >
    > The compiler enters 'a' in its list for main() and associates 'i' in f()
    > with it. That's a simple association map. Once every name has been
    > resolved, the linker only has to take that list of variables and give
    > them addresses. The operating system is then in charge, when running
    > the program, of allocating memory and resolving the addresses made by
    > the linker to the real addresses in memory.
    >
    > It is important for you to understand that all the things I just
    > explained are not specified by the C++ standard, and therefore you
    > cannot rely on them, although they are commonly implemented that way.
    > Remember : C++ describes behavior, not implementation.
    >
    >
    > Jonathan

    Thank you very much. I really appreciate your time and help.

    My confusion comes from this: C++ says that static allocation, such as

         static int i;

    is bound at compile time, while dynamic allocation, such as

         int* pi = new int;

    is bound at run time. I understand now that for i, the compiler can record
    an offset relative to some location (like the stack base). But I am still
    confused about dynamic binding. To me, the compiler could just as well
    record an offset from the heap for pi. So why do we call it run-time
    binding? To me, both are decided at compile time: the compiler records an
    offset from the stack or the heap. At run time, the OS decides the
    addresses of the stack and heap, so that we get the real addresses of i
    and pi. So why do we call one compile-time binding and the other run-time
    binding? Or is my understanding wrong?

    I appreciate your insight and your time.

