Re: Why do we need executables in certain formats ?
From: KVP (spamtrap_at_crayne.org)
Date: 02/18/05
- In reply to: Scott Moore : "Re: Why do we need executables in certain formats ?"
Date: Fri, 18 Feb 2005 08:04:57 +0000 (UTC)
Scott Moore <spamtrap@crayne.org> wrote:
> WahJava wrote:
> > Hi devs,
> > Can anybody explain me why do we need executables in certain formats ?
> > Why not plain binary (.com) file can't be used for execution ? How do
> > these files are loaded in memory ? How jump locations are resolved @
> > runtime ?
> > I know these questions are answered at university level ? But I'm too
> > far from those.
> There is nothing wrong with plain binaries. The original rationale for
> complex binary formats was that programs need to be relocated, and
> perhaps linked with libraries. The need to relocate a program is
> entirely obsolete. Modern virtual memory processors can locate to
> any standard address, which on most machines is the next page after
> the zero page (so that zero address references will cause an error).
> The need to link with libraries is more current, but this need, commonly
> referred to as "dynamic linking and loading" has caused huge problems
> with cross dependencies in Windows systems. Programs can have their
> .DLL files changed out from under them, and fail because the program
> has a hidden problem with the new .DLL. This has caused many software
> makers to force the global update of all .DLLs required by the program
> being installed to the current version, which then can break older
> programs that were installed using the old .DLL files. What .DLL
> files do is raise the possibility that a program can be run with a
> series of .DLL combinations that are exponential, and completely
> beyond anyones ability to test, or plan for.
This is only true on operating systems where dll versioning is not
supported. It can be solved by simply embedding the manufacturer name
and the version string into the dll's name, and linking against these
versioned names.
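
As a rough illustration of the naming idea, here is a minimal Python sketch. The `vendor-name-version.dll` pattern, the function name and the "acme" example are all hypothetical; they are not any operating system's actual convention.

```python
def versioned_library_name(vendor: str, name: str, version: str, ext: str = "dll") -> str:
    """Build a file name that carries vendor and version (hypothetical scheme)."""
    return f"{vendor}-{name}-{version}.{ext}"


# Two programs built against different versions ask for different files,
# so installing the newer library does not replace the older one.
old_app_dep = versioned_library_name("acme", "render", "1.0")
new_app_dep = versioned_library_name("acme", "render", "2.0")

print(old_app_dep)
print(new_app_dep)
```

Because each program links against the exact name it was tested with, both files can sit side by side on the same system.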
> The main use of dynamic linking is to "save memory", by allowing
> DLLs to be shared between programs, and between different invocations
> of the same program. But memory is not only cheap and plentiful,
> compared to the days when .DLL was designed, but virtual memory
> makes it largely irrelevant how large the memory for a particular
> program is, since the working set is organized only around active
> pages. Virtual memory can also allow different invocations of the
> same program to share their binaries, by mapping the same code
> page into multiple processes. Ironically, .DLL techniques work
> AGAINST that, as I will explain.
Suppose we have 10 programs of 10 KB each, all using a single 10 MB dll.
The memory requirement is 10*10 KB + 10 MB = 10.1 MB. The same 10 programs
linked statically with the library code will be 10 KB + 10 MB each, for a
total of 100.1 MB. A system with 16 MB of memory can either keep all ten
programs and the single dll resident, or be forced to swap more than
100 MB in and out of its 16 MB of physical memory.
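
The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1 MB = 1000 KB) to match the rounded figures, and the function names are made up for the example.

```python
KB_PER_MB = 1000  # assumption: decimal units, matching the rounded figures above


def shared_footprint_kb(n_programs: int, program_kb: int, library_kb: int) -> int:
    """One copy of the library, shared by every program."""
    return n_programs * program_kb + library_kb


def static_footprint_kb(n_programs: int, program_kb: int, library_kb: int) -> int:
    """Every program carries its own copy of the library."""
    return n_programs * (program_kb + library_kb)


shared = shared_footprint_kb(10, 10, 10 * KB_PER_MB)
static = static_footprint_kb(10, 10, 10 * KB_PER_MB)

print(shared / KB_PER_MB, "MB with one shared dll")
print(static / KB_PER_MB, "MB with static linking")
```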
In modern systems (windows for example) all code is based on the dll
paradigm. Executable code, external libraries and even resources like
fonts are just dll-s, and the same resource sharing rules apply to them.
The only way to share library code _without_ mapping it into each process
as a dll would be to give each library its own private address space,
which would require more address space (context) switches and hurt
performance.
> What .DLL *DOES* do is unnecessarily complicate virtual memory loading
> and sharing. Dynamic linking and loading requires that the image for
> a program be modified. The program is modified to fit at the given
> address, and the links to used .DLLs are modified to point to their
> actual locations in memory. Because there now exists a "customized"
> version of the program, it is no longer a "virtual" image of its
> disk store, nor can those working pages be shared with multiple
> invocations of the same program. Windows gets around these problems
> by not relocating the image at all, and routing all .DLL references
> via an "indirect jump" table embedded in the program file. This
> allows only the pages containing the jump table to have the
> per process copy aspect. The price of the scheme is that each
> .DLL linkage jump/call needs to be an indirect address.
Modification is only required for the linkage tables, so most of the
code and constant space is shared unmodified among processes.
Even dynamic data space can be shared with copy-on-write semantics.
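
A toy model of the linkage table idea, in Python. This is not the real PE or ELF layout; the slot numbers and addresses are invented, and the point is only that the "code" stays identical while each process owns a small table.

```python
# Shared, read-only "code": each call names a slot in the linkage table,
# never an absolute address. (Toy model, not a real executable layout.)
SHARED_CODE = (("call", 0), ("call", 1), ("call", 0))


def run(code, linkage_table):
    """Resolve every call through the per-process table and return the targets."""
    return [linkage_table[slot] for op, slot in code if op == "call"]


# Each process fills in its own table at load time; the addresses are made up.
process_a_table = [0x7000_1000, 0x7000_2000]
process_b_table = [0x6800_1000, 0x6800_2000]

targets_a = run(SHARED_CODE, process_a_table)
targets_b = run(SHARED_CODE, process_b_table)

print([hex(t) for t in targets_a])
print([hex(t) for t in targets_b])
```

Only the two small tables differ between the processes; `SHARED_CODE` is never written to, which is what lets its pages be shared.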
A better way of doing this is the Linux way, where you have position
independent code in the libraries, so they can be mmap()-ed to any
address. Using the same technique for normal binaries would also work.
This eliminates the need to patch anything and makes application and
library loading much easier and faster. All that is needed is to
know what address each library starts on.
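
To illustrate why the start address is enough, here is a small Python sketch. The export table, symbol names and base addresses are hypothetical; real loaders keep this information in the library's own symbol tables.

```python
# Offsets of exported symbols inside the library image (hypothetical values).
# They are relative to the start of the library and never change.
LIBRARY_EXPORTS = {"open_window": 0x0400, "draw_text": 0x0A20}


def resolve(base_address: int, symbol: str) -> int:
    """Absolute address = where the library was mapped + fixed offset."""
    return base_address + LIBRARY_EXPORTS[symbol]


# The same unmodified image mapped at two different (made-up) base addresses.
addr_in_process_a = resolve(0x4000_0000, "draw_text")
addr_in_process_b = resolve(0x5A00_0000, "draw_text")

print(hex(addr_in_process_a))
print(hex(addr_in_process_b))
```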
> Sadly, Unix implementations, apparently feeling envy of not having
> the *WORST* feature of Windows, imported Dynamic Linking into that
> system, instead of imitating features of Windows that were actually
> useful, so now all modern operating systems perform this hack.
> Many serious application programmers have elected to get off this
> train by "hard linking" libraries permanently into their programs,
> entirely negating the .DLL system, and the need for complex
> executables.
And this results in multiple copies of the same library code linked into
different programs, which means more memory is required to hold the
binaries, and less memory is left for actual data.
> The virtual memory versions of the IBM 360 OS, back in the 1960s,
> had "hard" binary images, and so were dramatically simple and
> efficient implementations. When a program was "loaded", it was
> simply marked as a running program. Since each page of the binary
> on disk was always an exact image of the in memory store, the
> program itself would request only the exact pages of the program
> that were needed, it would literally "fault" itself into an
> efficient working set. Because none of the program was allowed to
> be modified, all invocations of the program automatically shared
> the same program pages. A process (running program) was literally
> the working set of its read only binary image pages plus a series
> of variable pages that again, the program itself requested.
Windows works just like this, except that some portions of its memory
area are copy on write, like the data area and the linkage tables. It's
just a more flexible and automated way of doing the same old trick.
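
Copy-on-write mapping in general can be tried from Python with the standard `mmap` module. This is a sketch of the mechanism only, not of how the Windows loader is implemented; the file name and contents are invented for the demo.

```python
import mmap
import os
import tempfile


def demo_copy_on_write():
    """Return (private view, read-only view, bytes on disk) after a private write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.bin")  # hypothetical "binary image"
        with open(path, "wb") as f:
            f.write(b"data section")

        with open(path, "r+b") as f:
            # ACCESS_COPY: writes go to a private copy, never back to the file.
            private = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            # ACCESS_READ: a second, read-only view of the same file.
            shared = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                private[0:4] = b"DATA"
                views = (private[:], shared[:])
            finally:
                private.close()
                shared.close()

        with open(path, "rb") as f:
            on_disk = f.read()
    return views + (on_disk,)


private_view, shared_view, on_disk = demo_copy_on_write()
print(private_view, shared_view, on_disk)
```

Only the view that was written to sees the change; the other mapping and the file on disk keep the original bytes.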
> In short, there is nothing wrong with a flat, binary image. It is
> even possible to embed a simple signature in the binary image so
> that it can be verified that the image is not a non-executable file,
> or from the wrong CPU (the program can jump over the signature).
> What the proliferation of executable formats has more to do with
> is that the kids who graduated computer science courses in the 1970s,
> and built the "modern" systems we use today, thought they were
> too smart to go back and read 1960s operating systems books,
> and proceeded to make all the same mistakes the mainframe designers
> made in the 1950's, which are all enshrined in these bloated,
> vastly over complex, buggy and insecure operating systems we have
> to use today. There is nothing natural or necessary about the
> ridiculous and overcomplicated executable formats in current use,
> and you are right to question them.
Yes, we could have simple and more efficient formats like the coff or elf
binary formats. However, if you take a close look at windows, it's using
a simplified coff format for every mappable file (exes, dlls, ttfs, and
many other resource only files). Having a consistent image format optimized
toward using the virtual memory subsystem more efficiently is imho worth
the effort. The only thing that could be made simpler is the header
structures, where bitmapped fields could be replaced with a more easily
expandable data format. But the basic idea of region based memory mapping
optimization is worth the slightly more complex format.
Viktor