Re: Benchmarking under 16 bits.



randyhyde@xxxxxxxxxxxxx wrote:
> Herbert Kleebauer wrote:

> Even if it were the best way to learn assembly, it's not the best way
> to benchmark code. The execution environment of 16-bit segments is
> different than 32-bit segments and I would be very careful about
> measuring speed under one system and trying to apply those numbers to
> the other.

Why should I "try to apply those numbers to the other"? All I
said is that when I execute two inc.b instructions in a loop
10^9 times, the execution time for "inc.b b0, inc.b b1"
is much longer than for "inc.b b0, inc.b b4". The loop overhead
differs between 16 and 32 bit code because of the 32 bit opcode
prefix necessary in 16 bit code, and because of the 32 bit address
the inc.b instruction is longer in 32 bit code, which could affect
the execution speed. But this doesn't change anything about the
statement: the execution time for "inc.b b0, inc.b b1"
is much longer than for "inc.b b0, inc.b b4".
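
For readers who want to repeat the comparison without an assembler, here
is a minimal C sketch of the same idea. It is not the program discussed in
this thread: the names buf and time_pair are made up for illustration, the
buffer is assumed to be dword aligned via _Alignas, and volatile is used
only to keep the compiler from merging the two increments. Whether any
difference shows up depends on the processor and on the code the compiler
generates.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical 8-byte buffer: indices 0..3 form the first dword,
   indices 4..7 the second (assuming the 4-byte alignment is honoured). */
static _Alignas(4) volatile uint8_t buf[8];

/* Increment buf[i] and buf[j] n times each and return the elapsed
   CPU time in seconds. volatile forces a real read-modify-write
   on memory for every increment. */
static double time_pair(unsigned i, unsigned j, unsigned long n)
{
    assert(i < 8 && j < 8);          /* stay inside the buffer */
    clock_t start = clock();
    for (unsigned long k = 0; k < n; ++k) {
        buf[i]++;                    /* corresponds to "inc.b b0" */
        buf[j]++;                    /* "inc.b b1" or "inc.b b4"  */
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_pair(0, 1, n) and time_pair(0, 4, n) with the same large n
gives the two timings to compare. Unlike the self modifying version, both
calls run the same loop code, so code alignment is identical for the two
measurements.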


> > It uses self modifying code to make sure, that the code is exactly
> > on the same position when executed with "inc b0 / inc b1" and
> > "inc b0 / inc b4" so the result is not affected by code alignment.
> > But that has nothing to do with 16 bit code, I do the same in the
> > 32 bit code below.
>
> Or, you could just run the program twice, with the opcodes changed.

Maybe, but you can't be sure that the program is loaded at exactly the
same memory location, which could affect the execution time, so I prefer
self modifying code.

> > It was executed in a DOS box in Win98 (but the obsolete program
> > will also run in XP).
>
> Using VM86? The environment is *different*. While I won't claim that
> this *will* produce different results, I certainly wouldn't claim that
> executing 16-bit code (that is, code running in a 16-bit segment model
> under VM) is going to produce identical results to native 32-bit code.

Nobody said that it produces "identical" results. But when in
a V86 program "inc.b b0, inc.b b1" is much slower than
"inc.b b0, inc.b b4", then this is also true for 32 bit code.


> > As 32 bit program I get about 8.8 s for "inc b0 / inc b1" and
> > about 5.5 s for "inc b0 / inc b4" (but the variance between
> > different runs is bigger than with the DOS version).
>
> Is that in a 16-bit segment, or a 32-bit segment?

If it used a 16 bit segment, it would be a 16 bit
program (using 32 bit operand size and 32 bit addressing modes).
It is a normal 32 bit Windows PE console program.


> > The 32 bit source is nearly the same as the 16 bit source:
>
> But in a 16-bit segment, we know that the extra prefix bytes hurt you.
> If you simply claim "These are the results I get on processor XYZ when
> running in a 16-bit segment under virtual-86 mode" I wouldn't even
> *start* to question your numbers. I seriously doubt any modern
> processor attempts to optimize execution speed in virtual-86, 16-bit

You said: "If you can, you might try the same experiment under
Windows in 32-bit protected mode on the same system."

I posted the source code and the results for such a program, so
I don't understand your "But in a 16-bit segment, we know that
the extra prefix bytes hurt you".

As I already said, there may be a small difference in the
execution speed of an "inc.b [memory address]" between 16 and 32 bit
code because of the different instruction size. But
this doesn't affect the statement: "inc.b b0, inc.b b1" is much
slower than "inc.b b0, inc.b b4" (as you can also see in the posted
results for the 32 bit Windows program).

As I also already said, this AMD processor is the only one where
I got such an effect. Nevertheless, this means that it can
be useful to align byte variables at dword addresses (wasting
three bytes of memory), not because it is faster to access
dword aligned byte variables, but because it is faster to access
two bytes which are not in the same dword.
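
As an illustration of that layout choice, here is a small C sketch. The
struct names and the same_dword helper are made up for this example, and
it assumes the compiler inserts no padding between adjacent byte members
(true on common x86 compilers, and checked at compile time below).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed layout: both byte counters share one dword. */
struct counters_packed {
    _Alignas(4) uint8_t b0;
    uint8_t b1;
};

/* Spread layout: three wasted bytes push b1 into the next dword. */
struct counters_spread {
    _Alignas(4) uint8_t b0;
    uint8_t pad[3];
    uint8_t b1;
};

/* Fail the build if the compiler lays the struct out differently. */
static_assert(offsetof(struct counters_spread, b1) == 4,
              "b1 must start the second dword");

/* Returns 1 when two byte offsets fall into the same 4-byte dword. */
static int same_dword(size_t off_a, size_t off_b)
{
    return off_a / 4 == off_b / 4;
}
```

The cost is the one stated above: three bytes of padding per byte
variable, in exchange for keeping the two counters in different dwords.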


