Re: RosAsm Team is Still Making Excuses




wolfgang kern wrote:

> [default ALIGN4 ...]
> | A good argument could be made that the assembler should *not* align
> | anything by default, and just leave that up to programmer. But that
> | would simply be a matter of opinion. However, arbitrarily choosing
> | dword alignment is not good.
>
> DW-align "is" the most frequent used variant !!!

And what that means is that if you *must* pick only a single alignment
to use for everything, dword is a good choice. However, a better
solution is to pick a natural alignment for the object.
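
To make "natural alignment" concrete, here is a minimal sketch in C rather than assembly. The helper names align_up and place_natural are hypothetical, made up for illustration, and the sketch assumes object sizes are powers of two (1, 2, 4, 8, 16):

```c
#include <stddef.h>

/* Hypothetical helper: round offset up to the next multiple of align.
   Assumes align is a power of two. */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical helper: place an object of the given size on a boundary
   equal to its own size (its "natural" alignment). A byte can go
   anywhere, a word goes on an even address, a dword on a multiple of
   four, and so on. */
size_t place_natural(size_t offset, size_t size)
{
    return align_up(offset, size);
}
```

Under this rule a byte placed at offset 1 stays at offset 1, while a dword placed at offset 1 moves up to offset 4. Only the objects that can benefit from alignment pay for padding.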

>
> | > IA-32 CPUs perform much faster on dw-aligned memory access.
> |
> | Not for byte or word objects. Only for dword objects. And for qword and
> | larger objects, dword aligned may be better than unaligned, but the
> | best performance is obtained by aligning the object on its native size
> | (and in the case of certain 128-bit operations, success depends upon
> | 16-byte alignment).
>
> You should spend some time with reading CPU-manuals.
> (SSE code is rare used anyway and its alignment-demand is know)
> dw-aligned byte/word access is much faster than unaligned:
>
> penalties on misaligned access (averaged):
>
>   aligned byte/word    0
>   byte off by one      1.5
>   byte off by two      1
>   byte off by three    1.5
>   word off by one      2
>   word off by two      1
>   word off by three    2.5
>   dw off by one        2.5
>   dw off by two        1
>   dw off by three      3    ;worst case

Well, the problem with spending time with the manuals is well-known.
They don't represent real programs too well. I don't know where the
table above comes from, or what the context is, but I did write the
following code (run on a 500 MHz PIII):

program input;
#include ( "stdlib.hhf" )

static(4)
    b0 :byte;
    b1 :byte;
    b2 :byte;
    b3 :byte;


begin input;

    mov( 0, ebx );
    mov( 1_000_000_000, ecx );
    align(16);
    repeat

        mov( b0, al );
        sub( 1, ecx );

    until( @z );


end input;


Substituting b1, b2, and b3 for b0 in the code above produced
*identical* results for each run: 4.1 seconds. Based on your table,
those extra runs should have taken a lot more time to execute.

In fact, I *have* read the manuals at various times. And one thing the
manual tells me is that on certain processors, even *misaligned*
accesses don't incur a penalty if the data is within a cache line. That
is, I can access a dword object at an odd address if all four bytes
fall within the same cache line. Yet another reason why RosAsm's
default shouldn't be dword for all variables.
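
The cache-line condition can be sketched as a simple address check. This is an illustration in C, not something taken from a manual: the function name crosses_cache_line is hypothetical, and the 32-byte line size used in the examples is an assumption about the PIII-class processor mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if an access of `size` bytes
   starting at `addr` touches more than one cache line.
   Assumes size > 0 and line_size > 0. */
int crosses_cache_line(uintptr_t addr, size_t size, size_t line_size)
{
    uintptr_t first_line = addr / line_size;              /* line holding the first byte */
    uintptr_t last_line  = (addr + size - 1) / line_size; /* line holding the last byte  */
    return first_line != last_line;
}
```

With an assumed 32-byte line, a dword at the odd address 0x1001 stays inside one line, while a dword at 0x101E straddles two. Only the second case would be expected to pay a penalty on such a processor.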


>
> | > So Rene's decision for having the most used data-alignment as
> | > "the default" is absolutely logical and comfortable too.
>
> | No, it is not logical. He should have aligned objects on a boundary
> | that is a multiple of their natural size, not pick an arbitrary number
> | out of the air. For objects smaller than four bytes, Rene's choice
> | wastes space. For objects larger than four bytes, he may not achieve
> | what he's trying to achieve (best possible speed).
>
> The natural size of an IA-32 CPU is 32-bits (IOW: dword).
> We both already figured out that our meaning of 'logical' is different.

The fact that the natural size of the CPU is 32 bits has little to do
with the data types a program uses. An eight-bit variable's access
isn't going to be improved by aligning it on a dword boundary. The
experiment I gave earlier shows this. Run it yourself.


>
>
> | > RosAsm users know of this 'default' and there are that many
> | > opportunities to align any different, so all demands are covered.
>
> | Yet if you look at typical RosAsm source files, you don't see the
> | programmers constantly specifying the alignment when the section begins
> | with something other than a dword.
> | So by default, they're wasting space or they are losing performance.
>
> You think any other tool can have a work-around for CPU's behaviour?

What is RosAsm "working around?" Nothing. It's just wasting space on
byte and word variables. Given your staunch defense of this nonsense,
it's easy to see why RosAsm does this - the user base doesn't seem to
know any better. And to think that Rene is constantly suggesting that
*I* learn the basics of assembly language :-)


>
> 1.) Every section is dw-aligned by itself (multiple of 1000h) anyway.

Sorry, we've got a vocabulary mismatch here. The phrase "declaration
section" (which sometimes gets shortened to "section" in this
discussion) is not referring to "segment" or "section" in the COFF
sense. A declaration section is a sequence of declarations in which all
objects *are* guaranteed to be allocated sequentially in memory. In
HLA, you'd write something like this:

static
b0:byte;
b1:byte;
b2:byte;
b3:byte;

In MASM you'd write something like this:

.data
b0 byte ?
b1 byte ?
b2 byte ?
b3 byte ?

And in RosAsm, you might write something like this:

[b0: B$ 0 b1: B$ 0 b2: B$ 0 b3: B$ 0]

"Multiple sections" means multiple instances of the above, e.g.,

[b0:B$ 0]
[b1:B$ 0]
[b2:B$ 0]
[b3:B$ 0]

Regardless of what terminology *you* want to use, this is how *I'm*
using the terms "declaration sections" (and the shortened version
"sections") in this discussion. This has little to do with COFF
segments/sections. So wipe that from your mind.



> 2.) So the default alignment doesn't waste a single byte here:
>
> 400000h
> ALIGN4 ;produces again
> 400000h ;and not 400004 as you seem to believe

I don't know what makes you think I "seem to believe" that. But now
consider the following:

[b0:B$ 0]
[b1:B$ 0]
[b2:B$ 0]
[b3:B$ 0]

Do you not agree that RosAsm injects three extra bytes between each of
these declarations? I count a total of four "actual" bytes and at least
nine "padding" bytes (12 if there is another declaration coming after
these guys). Your insistence that byte variables perform better if they
are dword aligned (as well, of course, as B_U_ASM) would seem to
confirm this.
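
The count above can be checked with a small sketch in C. The function name padding_bytes and its parameters are hypothetical. It simply models "every declaration section starts on a multiple of align", which is my reading of the default being discussed, not RosAsm's actual code:

```c
#include <stddef.h>

/* Hypothetical model: each of `count` declaration sections holds one
   object of sizes[i] bytes, and every section starts on a multiple of
   `align`. Returns the total number of padding bytes inserted. If
   `followed` is nonzero, another aligned section comes after the last
   one, so the tail is padded as well. */
size_t padding_bytes(const size_t *sizes, size_t count,
                     size_t align, int followed)
{
    size_t offset = 0;
    size_t padding = 0;

    for (size_t i = 0; i < count; i++) {
        size_t aligned = (offset + align - 1) / align * align;
        padding += aligned - offset;  /* gap before this section   */
        offset = aligned + sizes[i];  /* end of this section's data */
    }
    if (followed) {
        size_t aligned = (offset + align - 1) / align * align;
        padding += aligned - offset;  /* gap before the next section */
    }
    return padding;
}
```

For four one-byte sections and an alignment of 4, this gives 9 padding bytes, or 12 if another section follows. With an alignment of 1 (natural alignment for bytes) it gives 0 in both cases.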

Cheers,
Randy Hyde
