Re: Which assembler can handle the BIG stuff ?



The_Sage wrote:


Basically jmp tables are used whenever you want a fixed address to a variable address for an external call. The keywords here are *external call*. No one in their right mind is going to make jmps to jmp tables to jmp to internal locations -- haha!

See below for an example.


Let me give you an example. If you write a PE and you make a call to a function, you know exactly where your function is relative to the start of your code when you make the call, so there is obviously no reason to make a call to a jmp table that then jmps to the function location, when you could skip the jmp table and go there directly.

Now let's say you have a DLL with a function called SOME_FUNCTION() at offset
100h. Before you can call that function from an external application, you have
to know where offset 0h is in memory, and then calculate the offset from there
before making a call to SOME_FUNCTION(). It can be done but is obviously very
poor programming habit as it requires a call to a call, GETPROCADDRESS(), to get
the address. The one and only exception is when you don't have a library to
calculate the offset during link time and must calculate the call address during
run time.

DLLs can have multiple functions that you could hard code the addresses to but
what happens if you update the DLL and the addresses change? You would have to
update everything that references the DLL instead of just updating the DLL. By
calling a fixed offset jmp table location that jmps to the function, you do not
have to recompile all your other executables or DLLs that reference that
function.

This is a little confused, and I have replied to parts of it in another part of this thread. The function is GetProcAddress, not GETPROCADDRESS; Windows function names are case-sensitive.


One of the difficulties of using dynamic link libraries is the need to call LoadLibrary and GetProcAddress to load the library and get the entry point. One technique is to call LoadLibrary for every library that the code might require and save the results, then call GetProcAddress with the library handle for every entry point required and save those too. This is normally done in some form of initialisation code, which grows quickly when there are many entry points spread over several libraries. Jump tables and thunks can make the whole process easier, and the code shorter and simpler.

First, the libraries (pseudo code);

dll-k32:      dd dll-thunk         ; address of thunk
              dd 0                 ; handle of library
              db "kernel32.dll\0"  ; name of library
dll-user32:   dd dll-thunk         ; more libraries
              dd 0
              db "user32.dll\0"
              ...                  ; best done with macros...

dll-thunk:    push eax             ; save eax
              add eax, # 8         ; eax points at dll name
              push eax             ; parameter is name
              call LoadLibrary     ; load library, eax is handle
              pop ecx              ; ecx is thunk pointer
              mov 4 [ecx], eax     ; store handle
              mov [ecx], # dll-handle ; reset thunk
              ret                  ; back to caller, eax is handle

dll-handle:   mov eax, 4 [eax]     ; eax is handle
              ret                  ; back to caller

; To get the handle of a library;

              ...
              mov eax, # dll-k32   ; point eax at entry
              call [eax]           ; call indirect address
              ...

The first time through, eax points at dll-k32 and the indirect call lands in dll-thunk. The thunk loads the library, stores the handle in the entry, and overwrites the entry's first cell with the address of dll-handle. Next time through, the same call goes straight to dll-handle, which just returns the stored handle. LoadLibrary is only called once for each library.
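For readers more at home in C, the same self-replacing thunk can be sketched with a function pointer stored in the table entry. This is only an illustration under stated assumptions: fake_load_library is a hypothetical stand-in for LoadLibrary so the sketch runs anywhere, the name is held as a pointer rather than laid down inline after the handle, and get_k32_handle is an invented helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for LoadLibrary: counts calls and returns a fake,
   non-NULL handle. On Windows the real call would go here instead. */
static int load_calls = 0;

static void *fake_load_library(const char *name)
{
    load_calls++;
    return (void *)name;          /* any non-NULL value serves as the handle */
}

/* Mirrors the dll-k32 layout: thunk address, handle, then the name. */
struct dll_entry {
    void *(*thunk)(struct dll_entry *self);
    void *handle;
    const char *name;
};

/* Counterpart of dll-handle: the fast path used after the first call. */
static void *dll_handle(struct dll_entry *self)
{
    assert(self->handle != NULL); /* only reachable once the load has run */
    return self->handle;
}

/* Counterpart of dll-thunk: load once, store the handle, then repoint
   the thunk field so later calls skip the load. */
static void *dll_thunk(struct dll_entry *self)
{
    self->handle = fake_load_library(self->name);
    self->thunk = dll_handle;
    return self->handle;
}

static struct dll_entry dll_k32 = { dll_thunk, NULL, "kernel32.dll" };

/* Counterpart of "mov eax, # dll-k32" followed by "call [eax]". */
static void *get_k32_handle(void)
{
    return dll_k32.thunk(&dll_k32);
}
```

Calling get_k32_handle() twice returns the same handle both times, and load_calls stays at 1 because the second call goes through dll_handle.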

Then we can do the entry points in a similar fashion, using the dll code above;

; The jump table for each entry
ep-alc:       dd ep-thunk          ; entry point
              dd dll-k32           ; ptr to the dll entry
              db "AllocConsole\0"  ; null terminated name

ep-...        dd                   ; etc, one for each entry point

; The thunk for entry points
ep-thunk:     push eax             ; save eax
              add  eax, # 8        ; point at the ep name
              push eax             ; for GetProcAddress
              mov  eax, -4 [eax]   ; get dll entry
              call [eax]           ; and get handle of dll
              push eax             ;
              call GetProcAddress  ; get the entry point
              pop  ecx             ; point at entry again
              mov  [ecx], eax      ; save entry point as thunk
              jmp  eax             ; jump to the routine; its ret returns to our caller

; To call the entry point;

              ...
              mov  eax, # ep-alc   ; point at AllocConsole
              call [eax]           ; call it (1)
              ...

First time through, ep-thunk gets called with eax pointing to the ep-alc structure. It gets the library handle and calls GetProcAddress to get the entry point, saves that over the thunk address, and then jumps to it. Because the thunk jumps rather than calls, the routine's own ret returns to the instruction after the call marked (1). Second and subsequent times through, the entry point for AllocConsole will just be called directly. GetProcAddress is only called once.
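The entry point thunk can be sketched in C along the same lines. C has no spare register to carry the entry's address into the thunk, so in this sketch each entry gets its own small stub that names its entry, which is an assumption of the illustration and not part of the assembler version. fake_get_proc_address and fake_alloc_console are hypothetical stand-ins for GetProcAddress and AllocConsole, and the library handle step is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*proc_t)(void);

/* Hypothetical stand-ins so the sketch runs without Windows. */
static int lookup_calls = 0;
static int console_calls = 0;

static int fake_alloc_console(void)
{
    console_calls++;
    return 1;
}

static proc_t fake_get_proc_address(const char *name)
{
    lookup_calls++;
    return strcmp(name, "AllocConsole") == 0 ? fake_alloc_console : NULL;
}

/* Mirrors ep-alc: a slot that starts as the stub and later holds the
   real entry point, followed by the name to look up. */
struct ep_entry {
    proc_t fn;
    const char *name;
};

static int ep_alc_stub(void);
static struct ep_entry ep_alc = { ep_alc_stub, "AllocConsole" };

/* Counterpart of ep-thunk: resolve the name, overwrite the slot with
   the real address, then run the routine on behalf of the caller. */
static int ep_alc_stub(void)
{
    proc_t real = fake_get_proc_address(ep_alc.name);
    assert(real != NULL);   /* error handling is omitted in this sketch */
    ep_alc.fn = real;
    return real();
}

/* Counterpart of "mov eax, # ep-alc" followed by "call [eax]". */
static int call_alloc_console(void)
{
    return ep_alc.fn();
}
```

After the first call_alloc_console(), ep_alc.fn holds the resolved address, so later calls bypass the stub and the lookup is never repeated.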

Each call carries the slight overhead of one extra immediate mov instruction before the call. That's pretty small compared with what most Windows functions do. And, to simplify managing the tables and the call, they are probably best written as macros; for example

  k32-dll: dll "kernel32.dll"          ; lays down dll structure
  ep-alc:  ep  "AllocConsole", k32-dll ; lays down ep structure
  ep-gsh:  ep  "GetStdHandle", k32-dll
  ep-fc:   ep  "FreeConsole", k32-dll
  ...
  u32-dll: dll "user32.dll"
  ep-shw:  ep  "ShowWindow", u32-dll
  ...

and perhaps something like i-invoke to match invoke.

I've also not shown the error handling if LoadLibrary or GetProcAddress returns 0, and there's one piece of initialisation required: getting the addresses of LoadLibrary and GetProcAddress in the first place. That's fairly trivial though.
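One possible shape for the missing error handling, sketched in C. This is not the scheme from the pseudo code above: it swaps the self-replacing thunk for an explicit NULL check so that a failed lookup can be reported and retried later. fake_get_proc_address and fake_free_console are hypothetical stand-ins for the real Windows calls, and ep_resolve and ep_call are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*proc_t)(void);

static int fake_free_console(void)
{
    return 1;
}

/* Hypothetical stand-in for GetProcAddress: knows one name only and
   returns NULL for anything else, as the real call does on failure. */
static proc_t fake_get_proc_address(const char *name)
{
    return strcmp(name, "FreeConsole") == 0 ? fake_free_console : NULL;
}

struct ep_entry {
    proc_t fn;              /* NULL until resolved */
    const char *name;
};

/* Resolve on first use. Returns 0 on success, -1 if the lookup failed.
   On failure fn stays NULL, so a later call retries the lookup. */
static int ep_resolve(struct ep_entry *ep)
{
    if (ep->fn == NULL)
        ep->fn = fake_get_proc_address(ep->name);
    return ep->fn != NULL ? 0 : -1;
}

/* Call through the entry; *ok reports whether the routine was reached.
   When it was not, the return value is 0 and carries no meaning. */
static int ep_call(struct ep_entry *ep, int *ok)
{
    assert(ep != NULL && ok != NULL);
    *ok = (ep_resolve(ep) == 0);
    return *ok ? ep->fn() : 0;
}
```

The price is a test on every call where the thunk version had none, so this is a trade of speed for a clean failure path.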

--
Regards
Alex McDonald