Re: Why do we need executables in certain formats ?

From: KVP (spamtrap_at_crayne.org)
Date: 02/18/05

  • Next message: wolfgang kern: "Re: not all descriptor's bits are checked in real mode ?"
    Date: Fri, 18 Feb 2005 08:04:57 +0000 (UTC)
    
    

    Scott Moore <spamtrap@crayne.org> wrote:
    > WahJava wrote:
    > > Hi devs,
    > > Can anybody explain me why do we need executables in certain formats ?
    > > Why not plain binary (.com) file can't be used for execution ? How do
    > > these files are loaded in memory ? How jump locations are resolved @
    > > runtime ?
    > > I know these questions are answered at university level ? But I'm too
    > > far from those.
    > There is nothing wrong with plain binaries. The original rationale for
    > complex binary formats was that programs need to be relocated, and
    > perhaps linked with libraries. The need to relocate a program is
    > entirely obsolete. Modern virtual memory processors can locate to
    > any standard address, which on most machines is the next page after
    > the zero page (so that zero address references will cause an error).
    > The need to link with libraries is more current, but this need, commonly
    > referred to as "dynamic linking and loading" has caused huge problems
    > with cross dependencies in Windows systems. Programs can have their
    > .DLL files changed out from under them, and fail because the program
    > has a hidden problem with the new .DLL. This has caused many software
    > makers to force the global update of all .DLLs required by the program
    > being installed to the current version, which then can break older
    > programs that were installed using the old .DLL files. What .DLL
    > files do is raise the possibility that a program can be run with a
    > series of .DLL combinations that are exponential, and completely
    > beyond anyone's ability to test, or plan for.

     This is only true on operating systems where dll versioning is not
     supported. It can be solved by simply embedding the manufacturer name
     and the version string into the dll's name, and linking against these
     versioned names.
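
     A minimal sketch of that idea, assuming a made-up naming scheme of
     "vendor-library-version.dll" (the scheme, the function names and the
     "acme" files are hypothetical, not any real system's convention).
     Each program asks for the exact versioned name it was linked against,
     so installing a newer version never replaces the older one:

```python
def versioned_name(vendor, library, version):
    """Build a file name that embeds the manufacturer and the version
    (hypothetical naming scheme, for illustration only)."""
    return f"{vendor}-{library}-{version}.dll"


def resolve(installed, vendor, library, version):
    """Return the exact versioned file a program was linked against,
    or None if that version is not installed."""
    wanted = versioned_name(vendor, library, version)
    return wanted if wanted in installed else None


# Two versions installed side by side; neither overwrites the other.
installed = {"acme-widgets-1.0.dll", "acme-widgets-2.0.dll"}

old_program = resolve(installed, "acme", "widgets", "1.0")
new_program = resolve(installed, "acme", "widgets", "2.0")
missing = resolve(installed, "acme", "widgets", "3.0")
```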

    > The main use of dynamic linking is to "save memory", by allowing
    > DLLs to be shared between programs, and between different invocations
    > of the same program. But memory is not only cheap and plentiful,
    > compared to the days when .DLL was designed, but virtual memory
    > makes it largely irrelevant how large the memory for a particular
    > program is, since the working set is organized only around active
    > pages. Virtual memory can also allow different invocations of the
    > same program to share their binaries, by mapping the same code
    > page into multiple processes. Ironically, .DLL techniques work
    > AGAINST that, as I will explain.

     Suppose we have 10 programs of 10 KB each, all using a single 10 MB
     dll. The memory requirement is then 10*10 KB + 10 MB = 10.1 MB. The
     same 10 programs linked statically with the library code would be
     10 KB + 10 MB each, for a total of 100.1 MB. A system with 16 MB of
     memory can either keep all the programs and the single dll resident,
     or be forced to swap more than 100 MB in and out of its 16 MB of
     physical memory.
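
     The arithmetic can be checked with a few lines. This is only a
     sketch: the function names are mine, it uses the decimal convention
     1 MB = 1000 KB, and it assumes every page of every image is resident,
     which is the worst case rather than the typical working set:

```python
KB = 1            # work in kilobytes
MB = 1000 * KB    # decimal convention, matching the figures above


def shared_footprint(programs, program_size, library_size):
    """Every program has its own code, but the library is mapped once."""
    return programs * program_size + library_size


def static_footprint(programs, program_size, library_size):
    """Every program carries its own private copy of the library."""
    return programs * (program_size + library_size)


shared_kb = shared_footprint(10, 10 * KB, 10 * MB)   # 10,100 KB = 10.1 MB
static_kb = static_footprint(10, 10 * KB, 10 * MB)   # 100,100 KB = 100.1 MB
```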

     In modern systems (Windows, for example) all code is based on the dll
     paradigm. Executable code, external libraries and even resources like
     fonts are just dlls, and the same resource sharing rules apply to them.
     The only way to provide this functionality and _not_ use dlls would be
     to give each library its own private address space, which would require
     more context (address space) switches and hurt performance.

    > What .DLL *DOES* do is unnecessarily complicate virtual memory loading
    > and sharing. Dynamic linking and loading requires that the image for
    > a program be modified. The program is modified to fit at the given
    > address, and the links to used .DLLs are modified to point to their
    > actual locations in memory. Because there now exists a "customized"
    > version of the program, it is no longer a "virtual" image of its
    > disk store, nor can those working pages be shared with multiple
    > invocations of the same program. Windows gets around these problems
    > by not relocating the image at all, and routing all .DLL references
    > via an "indirect jump" table embedded in the program file. This
    > allows only the pages containing the jump table to have the
    > per process copy aspect. The price of the scheme is that each
    > .DLL linkage jump/call needs to be an indirect address.

     Modification is only required for the linkage tables, so most of the
     code and constant space is shared unmodified among processes. Even
     the dynamic data space can be shared with copy-on-write semantics.
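
     Copy-on-write can be seen from user space with a private file
     mapping. The sketch below uses Python's mmap module with ACCESS_COPY;
     the temporary file merely stands in for a library's data section, and
     the two mappings live in one process here only to keep the example
     short:

```python
import mmap
import os
import tempfile

# A small file standing in for a library's data section.
fd, path = tempfile.mkstemp()
os.write(fd, b"shared data section")
os.close(fd)

with open(path, "r+b") as f:
    # ACCESS_COPY creates a private copy-on-write mapping: reads are
    # served from the file's pages, writes go to a private copy and are
    # never written back to the file.
    view_a = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    view_b = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)

    view_a[0:6] = b"SHARED"      # only view_a sees this change

    a_bytes = view_a[:]
    b_bytes = view_b[:]
    view_a.close()
    view_b.close()

# The file on disk is untouched by the write to view_a.
with open(path, "rb") as f:
    on_disk = f.read()

os.remove(path)
```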

     A better way of doing this is the Linux way, where the libraries
     contain position independent code, so they can be mmap()-ed at any
     address. Using the same technique for normal binaries would also work.
     This eliminates the need to patch the code and makes application and
     library loading much easier and faster. All that is needed is to
     know the address at which each library starts.
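
     As a toy model of "all that is needed is the start address": with
     position independent code, a symbol is just an offset from wherever
     the library happened to be mapped. The offsets, base addresses and
     symbol names below are invented for illustration; a real loader reads
     them from the library's symbol table:

```python
# Offsets of exported symbols relative to the start of the library
# (hypothetical values, for illustration only).
SYMBOL_OFFSETS = {"open_widget": 0x120, "close_widget": 0x1A0}


def resolve_symbol(base_address, name):
    """Absolute address = address the library was mapped at + offset."""
    return base_address + SYMBOL_OFFSETS[name]


# Two processes may map the same library at different base addresses;
# the library's own pages stay byte-for-byte identical in both.
addr_in_p1 = resolve_symbol(0x10000000, "open_widget")
addr_in_p2 = resolve_symbol(0x7F000000, "open_widget")
```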

    > Sadly, Unix implementations, apparently feeling envy of not having
    > the *WORST* feature of Windows, imported Dynamic Linking into that
    > system, instead of imitating features of Windows that were actually
    > useful, so now all modern operating systems perform this hack.
    > Many serious application programmers have elected to get off this
    > train by "hard linking" libraries permanently into their programs,
    > entirely negating the .DLL system, and the need for complex
    > executables.

     And this results in multiple copies of the same library linked into
     different programs, which means more memory is required to hold the
     binaries, and less memory is left for actual data.

    > In the virtual memory versions of the IBM 360 OS, back in 1960s,
    > had "hard" binary images, and so were dramatically simple and
    > efficient implementations. When a program was "loaded", it was
    > simply marked as a running program. Since each page of the binary
    > on disk was always an exact image of the in memory store, the
    > program itself would request only the exact pages of the program
    > that were needed, it would literally "fault" itself into an
    > efficient working set. Because none of the program was allowed to
    > be modified, all invocations of the program automatically shared
    > the same program pages. A process (running program) was literally
    > the working set of its read only binary image pages plus a series
    > of variable pages that again, the program itself requested.

     Windows works just like this, except that some portions of its memory
     area are copy on write, like the data area and the linkage tables. It's
     just a more flexible and automated way of doing the same old trick.

    > In short, there is nothing wrong with a flat, binary image. It is
    > even possible to embed a simple signature in the binary image so
    > that it can be verified that the image is not a non-executable file,
    > or from the wrong CPU (the program can jump over the signature).
    > What the proliferation of executable formats has more to do with
    > is that the kids who graduated computer science courses in the 1970s,
    > and built the "modern" systems we use today, thought they were
    > too smart to go back and read 1960s operating systems books,
    > and proceeded to make all the same mistakes the mainframe designers
    > made in the 1950's, which are all enshrined in these bloated,
    > vastly over complex, buggy and insecure operating systems we have
    > to use today. There is nothing natural or necessary about the
    > ridiculous and overcomplicated executable formats in current use,
    > and you are right to question them.

     Yes, we could have simpler and more efficient formats, like the coff or
     elf binary formats. However, if you take a close look at Windows, it is
     using a simplified coff format for every mappable file (exes, dlls, ttfs,
     and many other resource-only files). Having a consistent image format
     optimized toward using the virtual memory subsystem more efficiently is
     imho worth the effort. The only things that could be made simpler are the
     header structures, where bitmapped fields could be replaced with a more
     easily expandable data format. But the basic idea of region based memory
     mapping optimization is worth the slightly more complex format.

       Viktor

