Re: Size of an arraylist in bytes

On Sat, 4 Feb 2012 21:25:00 +0000 (UTC)
jgk@xxxxxxxxx (Joe keane) wrote:

How about this:

int g(int x)
I'm looking for 'reductio ad absurdum'.

i.e., chapter and verse for 'this can't possibly not work'

It will be undefined behaviour, the great catch-all of C.

I think one should not try to understand the C standard, at least not why it
is the way it is. Maybe they want to cover every possible (existing or not)
architecture or make C as safe as a managed language, I don't know. If
that's the case, they failed, because it is very easy to shoot yourself in
the foot even when your code is completely standard conforming and it is a
major pain in the neck to write code for anything which does not have a
unified address space, but at least they managed to build robust, insidious
traps into the standard even when you work with a stock-standard, unified,
flat address space machine. They're there to show you that if you thought you
knew C, you should think again. They allow the compiler to generate the
weirdest things, which you'd never have thought a sane compiler would do, and
if you want to report it as a bug, the compiler people will just smirk and
wave the standard.

Undefined behaviour is especially nasty and the standard is full of it. The
standard imposes no requirements at all on undefined behaviour, which means
that when the compiler detects such a condition, it can do whatever it wants
and it does not have to warn you. Consider this:

void print_vector_table_element( void **p )
{
    void *e;

    e = *p;     /* p is dereferenced before it is tested */
    if ( p == NULL )
        printf( "The reset vector is %p\n", e );
    else
        printf( "Vector at %p is %p\n", (void *)p, e );
}

Compiling the above with gcc will result in code which does not have the 'if'
and the true branch. Only the 'else' branch remains, unconditionally. The
compiler is right. The e = *p dereferences the pointer. If p was not NULL,
then the else branch is executed. If p was NULL, then the dereferencing is
undefined behaviour and the compiler can do whatever it pleases, including
simply removing chunks of your code, without so much as issuing a warning (and
indeed, gcc remains silent on the above).
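
For comparison, here is a minimal sketch of the same routine with the test
moved ahead of the dereference, so that there is no path on which the compiler
can assume p was already dereferenced when it reaches the comparison. The name
format_vector_table_element and the use of snprintf into a caller-supplied
buffer are assumptions made only so the result can be inspected; the original
prints directly.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical variant of print_vector_table_element: the NULL test comes
   first, and *p is only evaluated on the path where p is known to be
   non-NULL. Returns whatever snprintf returns. */
int format_vector_table_element( char *buf, size_t len, void **p )
{
    if ( p == NULL )
        return snprintf( buf, len, "No vector table entry (NULL pointer)\n" );

    return snprintf( buf, len, "Vector at %p is %p\n", (void *)p, *p );
}
```

Note that this only helps when NULL really means "no entry". If you actually
need to read address 0, as in the reset vector case, reordering the test does
not make the read defined; with gcc the usual escape hatch is the
-fno-delete-null-pointer-checks option.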

Then there's my personal grudge, volatile. If you read the standard, it
turns out that this

volatile int a;
int *p;

while ( a = *p++ ) { ... }

will actually compile into this:

int tmp;

for ( ;; ) {
    tmp = *p++;
    a = tmp;                /* write the volatile */
    tmp = a;                /* read it back */
    if ( ! tmp ) break;
    ...
}

because the standard, prior to C11, said that the value of the assignment
expression is the value of its lhs after the assignment. Thus, gcc reads back
the volatile to get the value after the assignment, which with write-only HW
registers is kind of a problem; chances are that you wanted to stop after
writing 0 to it, not when its non-existent value becomes 0.
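
If you want the loop to stop after writing 0 no matter how the compiler treats
the value of the assignment expression, one way is to keep the value in an
ordinary local and test that instead. A sketch only: copy_until_zero and its
parameters are names made up for illustration, with reg standing in for the
write-only register.

```c
/* Writes words from src to the volatile location *reg, up to and including
   the first 0. The termination test uses the local copy tmp, so it never
   depends on a value read back from *reg. Returns the number of writes. */
unsigned copy_until_zero( volatile int *reg, const int *src )
{
    unsigned writes = 0;
    int tmp;

    do {
        tmp = *src++;   /* ordinary read from the source buffer */
        *reg = tmp;     /* assignment used as a plain statement */
        writes++;
    } while ( tmp != 0 );

    return writes;
}
```

This takes the read-back out of the loop condition. Whether the compiler may
still read the register back after the plain assignment statement is a
separate question.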

But fair enough, that's what the standard says, stupid programmer to blame.

However, there's a snag, and actually gcc, possibly inadvertently, breaks the
standard and generates sane code for this supercomplex code fragment:

volatile int a;

a = 0;

Let's see what C99 says about that:

- as per a = 0; is an expression statement
- as per it is evaluated as a void expression
- as per a void expression is evaluated for its side effects
- as per accessing a volatile object is a side effect
- as per the a = 0 is an assignment expression
- as per its value is the value of the lhs after the assignment

Therefore, a conforming implementation, after writing the 0 to 'a', should read
'a' back and then discard the value. It's a major stroke of luck that gcc is
not doing that and one can still write device drivers for memory mapped HW.

Back in around '98 I had an issue with the above behaviour, read the prelim
standard and actually contacted a guy on the committee, suggesting that the
standard be changed so that the value of an assignment is of the type of the
lhs with the value *written* to the lhs. He said that they intentionally left
some ambiguity in the standard (i.e. they did not specify that you do have to
read the value back) to give some flexibility to the compiler people. Alas,
the wording still suggested that you should read the value back.

After a decade with all that uncertainty, finally, C11 got rid of the
ambiguity: the compiler has, unambiguously, an explicit right to generate a
read-back or not, whenever it feels like it. Thus, in

while ( a = *p++ ) { ... }
while ( a = *p++ ) { ... }

the compiler may compile the first loop with read-back and the second
without. In fact, it can compile each iteration of each loop with or without a
read-back of a. If it wants, it can read back 'a' depending on the momentary
state of the carry flag or the phase of the moon or anything. The standard
gives a mandate to the compiler to do whatever it wants.

Just to make sure, C11 spells it out even more clearly that a void
expression should have all the side effects of the same expression with its
value being used, that is, an "a=0;" must have its value obtained then
discarded, if such action has a side effect. So you can be absolutely,
positively sure that you have not the slightest idea whether there will be a
read-back after the write or not.

Gotta luv standards.

Zoltán Kócsi
Bendor Research Pty. Ltd.