Re: Can one get away with an under-allocated union?



On Sat, 25 Dec 2010 20:13:46 +0100, Alexander Klauer
<Graf.Zahl@xxxxxxx> wrote:

Hi,

suppose I allocate space for a structure, can I safely interpret
the allocated object as a union, even if the size of the space
allocated is smaller than the size of the union type?

This question appears to have come up before; at least I found
the (very old) threads

"struct pointer casting", c.std.c, 1993/03/22
http://groups.google.com/group/comp.lang.c/browse_thread/thread/1c12a4c6afb312a4

"Union and malloc", c.l.c, 1998/08/22
http://groups.google.com/group/comp.std.c/browse_thread/thread/960a336931f63f02

and the answer appears to lean towards "No". Does the C99
perspective change anything (I have the N1256 draft)? In order
to create some practical ground for discussion, consider the
following C99 program:

If you make some reasonable assumptions, the answer remains
emphatically no.


-----> start <-----

#include<stdio.h>
#include<stdlib.h>

enum Type {
T_SCALAR,
T_VECTOR
};

These constants have type int. Assume sizeof(int) is 4.


struct S {
enum Type type;
};

While it is possible for the compiler to decide that type should be a
char or short, it only makes a difference if the compiler decides to
make type a long or long long. Let's assume it is an int.

While trailing padding is allowed, most compilers will make
sizeof(struct S) equal to sizeof(enum Type), which is 4 in this example.


struct S1 {
enum Type type;
int scalar;
};

Similarly, sizeof(struct S1) usually will be 8.


struct S2 {
enum Type type;
int vector[3];
};

And sizeof(struct S2) usually will be 16: 4 bytes for type plus 12 for
the three ints.


union U {
struct S s;
struct S1 s1;
struct S2 s2;
};

void print_u(const union U * u) {

For the sake of the example, let u contain the value 0x1000. It points
to an allocated area of 8 bytes.

switch (u->s.type) {

Since all the members of the union begin at the front of the union, s
begins at 0x1000. Since the first member of a struct begins at the
front of the struct, s.type also begins at 0x1000.

case T_SCALAR:

Given main below, this is the only case that executes.

printf("%d\n", u->s1.scalar);

s1.scalar will begin at 0x1004.

break;
case T_VECTOR:

If this code were to execute,

printf("(%d,%d,%d)\n",
u->s2.vector[0],

s2.vector[0] would begin at 0x1004.

u->s2.vector[1],

But s2.vector[1] would begin at 0x1008 which is not part of the
allocated memory.

u->s2.vector[2]);

The same is true for s2.vector[2] which would begin at 0x100C.

break;
}
}

int main(void) {
struct S1 * s1 = malloc(sizeof(*s1));
if (s1 == NULL)
exit(EXIT_FAILURE);
*s1 = (struct S1) { .type = T_SCALAR, .scalar = 42 };

print_u((union U *) s1);
}

----> end <-----
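
The sizes and offsets in my annotations above rest on assumptions (4-byte
int, enum stored as int, no padding). Here is a rough sketch for checking
them on your own implementation. The helper names access_in_bounds,
vector_offset and report_layout are mine, not part of the original
program, and the numbers printed will differ wherever those assumptions
do not hold.

```c
#include <stddef.h>
#include <stdio.h>

enum Type { T_SCALAR, T_VECTOR };
struct S  { enum Type type; };
struct S1 { enum Type type; int scalar; };
struct S2 { enum Type type; int vector[3]; };
union U   { struct S s; struct S1 s1; struct S2 s2; };

/* Hypothetical helper: does an access of len bytes at byte offset off
   stay inside an allocation of allocated bytes? */
int access_in_bounds(size_t allocated, size_t off, size_t len)
{
    return off <= allocated && len <= allocated - off;
}

/* Offset of s2.vector[i] from the start of the union
   (every union member starts at offset 0). */
size_t vector_offset(size_t i)
{
    return offsetof(struct S2, vector) + i * sizeof(int);
}

/* Print the sizes and, for an allocation of sizeof(struct S1) bytes,
   which of the accesses made by print_u would stay inside it. */
void report_layout(void)
{
    size_t allocated = sizeof(struct S1);   /* what main() asked malloc for */
    size_t i;

    printf("sizeof: S=%zu S1=%zu S2=%zu U=%zu\n",
           sizeof(struct S), sizeof(struct S1),
           sizeof(struct S2), sizeof(union U));
    printf("s1.scalar at offset %zu: %s\n",
           offsetof(struct S1, scalar),
           access_in_bounds(allocated, offsetof(struct S1, scalar), sizeof(int))
               ? "inside" : "outside");
    for (i = 0; i < 3; i++)
        printf("s2.vector[%zu] at offset %zu: %s\n", i, vector_offset(i),
               access_in_bounds(allocated, vector_offset(i), sizeof(int))
                   ? "inside" : "outside");
}
```

Call report_layout() from main to see the table. The check only does
arithmetic on offsets; it never touches memory outside an allocation.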

There are several issues with this program.

* The cast "(union U *) s1": 6.3.2.3p7 allows this cast,
provided that the resulting pointer is correctly aligned for
the union. One should think this requirement to be fulfilled
because the value of s1 was returned by a successful call to
malloc. However, as Mark Brader has pointed out in
<1993Mar26.103447.25791@xxxxxxxxx>, the wording of
7.20.3p1, "The pointer returned if the allocation succeeds is
suitably aligned so that it may be assigned to a pointer to any
type of object and then used to access such an object or an
array of such objects in the space allocated (until the space
is explicitly deallocated)", may be construed to imply that
malloc may return pointers not suitably aligned for types whose
size exceeds the allocated size. Is this still an accepted
interpretation of the wording of the standard?

Was it ever? malloc has no clue about what type of pointer the result
will be stored in nor the size of any object to be stored in the area.
That is part of the reason why the returned value must be properly
aligned for any object regardless of any mismatch between requested
size and actual size of the object.
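
If you want to see what your own malloc does, a small experiment along
these lines may help. It assumes a C11 compiler (for _Alignof) and the
optional uintptr_t type, and the function names are made up for the
illustration. It shows only what one implementation happens to do; it
proves nothing about what the standard requires.

```c
#include <stdint.h>
#include <stdlib.h>

enum Type { T_SCALAR, T_VECTOR };
struct S  { enum Type type; };
struct S1 { enum Type type; int scalar; };
struct S2 { enum Type type; int vector[3]; };
union U   { struct S s; struct S1 s1; struct S2 s2; };

/* Returns 1 if p is aligned for union U on this implementation.
   Assumes the usual pointer-to-integer mapping, which is
   implementation-defined. */
int aligned_for_union(const void *p)
{
    return (uintptr_t)p % _Alignof(union U) == 0;
}

/* Allocates only sizeof(struct S1) bytes and reports whether the
   result is aligned for the larger union U.
   Returns -1 if the allocation fails. */
int small_allocation_is_aligned(void)
{
    void *p = malloc(sizeof(struct S1));
    int ok;

    if (p == NULL)
        return -1;
    ok = aligned_for_union(p);
    free(p);
    return ok;
}
```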


* Strict aliasing and the access to u->s: the strict aliasing
rule laid down in 6.5p7, next-to-last item, allows the access
to u->s1 after the cast discussed above. Furthermore,
the "special guarantee" from 6.5.2.3p5 allows the access of
struct S in an object of type union U containing a struct S1.
Do 6.5p7 and 6.5.2.3p5 combine, making the access to u->s
legal?

* u is under-allocated for its type (assume sizeof(*u) >
sizeof(struct S1)). Does this, in itself, evoke UB? Clearly, an
assignment to u->s2 would be UB, caused by under-allocation.

As noted above, assignment is not necessary to invoke UB. Any
reference to any part of s2 that is not within the allocated area will
invoke UB.

(In the present case, the type of *u is const-qualified, so
this assignment is not possible. However, const-qualification
is not recursive, so with slightly more complicated structure
types, a UB assignment is possible.) But in the absence of
such explicit violations, may the compiler assume that the
non-NULL pointer u points to at least sizeof(*u) bytes, so that
UB may ensue?

The compiler always assumes you are telling it the truth. You defined
u as a pointer to a union. The compiler will generate any code
accessing the union as if that were true. Thus, the generated code will
cause UB when the size is insufficient, when the pointer is
indeterminate (prior to being assigned a value or after an allocated
area has been freed), when the pointer is NULL, and possibly others I
haven't thought of.
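
One place this bites is whole-object operations. An assignment such as
*dst = *src through two union U pointers is entitled to copy all
sizeof(union U) bytes, whatever the tag says. If the source really is a
smaller struct, one hedged alternative is to copy only as many bytes as
the tag calls for. The sketch below assumes the destination is a
fully-sized union U; size_for_type and copy_by_tag are names I made up.

```c
#include <stddef.h>
#include <string.h>

enum Type { T_SCALAR, T_VECTOR };
struct S  { enum Type type; };
struct S1 { enum Type type; int scalar; };
struct S2 { enum Type type; int vector[3]; };
union U   { struct S s; struct S1 s1; struct S2 s2; };

/* Number of bytes actually occupied by an object carrying tag t. */
size_t size_for_type(enum Type t)
{
    switch (t) {
    case T_SCALAR: return sizeof(struct S1);
    case T_VECTOR: return sizeof(struct S2);
    }
    return sizeof(struct S);    /* unknown tag: only the header */
}

/* Copy a tagged object into a fully-sized union without reading past
   the end of the (possibly smaller) source object. */
void copy_by_tag(union U *dst, const void *src)
{
    enum Type tag;

    memcpy(&tag, src, sizeof tag);          /* type is the first member */
    memcpy(dst, src, size_for_type(tag));   /* only the bytes that exist */
}
```

For instance, copy_by_tag(&u, s1) with s1 pointing to an 8-byte struct S1
reads 8 bytes from s1, not sizeof(union U).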


The reason I ask this question is that I have the following
situation (which I think is fairly common, but I may be wrong):
I have a list of pointers to objects of different sizes. When I
retrieve a pointer, I want to know what type of data it points
to, and then operate on that data accordingly. The natural
solution appears to be using a union type. But allocating an
entire union for each object is wasteful.

There is, of course, a simple workaround. Just replace each

struct SomeStruct {
enum Type type;
/* lots of members */
};

with

struct SomeStructReal {
/* lots of members */
};

struct SomeStruct {
enum Type type;
struct SomeStructReal * p;
};

and then allocate space for struct SomeStructReal and the union
in which objects of type struct SomeStruct and similar reside.
But isn't this a little unnatural? In other words: if
under-allocating unions leads to undefined behaviour, are there
any actual implementations exhibiting unintended behaviour in
such a case? If not, the standard should IMHO be fixed to make
such use of unions well-defined. Or is there any compelling
reason the standard makes under-allocating unions undefined (if
it does)?
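
For concreteness, the workaround described above might look roughly like
the following. All the names (ScalarReal, VectorReal, Node, node_new_scalar
and so on) are invented for this sketch. Every union is allocated at its
full size, and the payloads are allocated separately at exactly their own
size.

```c
#include <stdlib.h>

enum Type { T_SCALAR, T_VECTOR };

/* Payloads, each allocated at exactly its own size. */
struct ScalarReal { int scalar; };
struct VectorReal { int vector[3]; };

/* Small headers: a tag plus a pointer to the payload. */
struct Scalar { enum Type type; struct ScalarReal *p; };
struct Vector { enum Type type; struct VectorReal *p; };

/* Every node is a fully allocated union of the small headers. */
union Node { struct Scalar scalar; struct Vector vector; };

union Node *node_new_scalar(int value)
{
    union Node *n = malloc(sizeof *n);
    struct ScalarReal *payload = malloc(sizeof *payload);

    if (n == NULL || payload == NULL) {
        free(n);
        free(payload);
        return NULL;
    }
    payload->scalar = value;
    n->scalar.type = T_SCALAR;
    n->scalar.p = payload;
    return n;
}

union Node *node_new_vector(int x, int y, int z)
{
    union Node *n = malloc(sizeof *n);
    struct VectorReal *payload = malloc(sizeof *payload);

    if (n == NULL || payload == NULL) {
        free(n);
        free(payload);
        return NULL;
    }
    payload->vector[0] = x;
    payload->vector[1] = y;
    payload->vector[2] = z;
    n->vector.type = T_VECTOR;
    n->vector.p = payload;
    return n;
}

/* Sum of the stored values; the tag is read through the common
   initial member shared by both header structs. */
int node_sum(const union Node *n)
{
    switch (n->scalar.type) {
    case T_SCALAR:
        return n->scalar.p->scalar;
    case T_VECTOR:
        return n->vector.p->vector[0] + n->vector.p->vector[1]
             + n->vector.p->vector[2];
    }
    return 0;
}

void node_free(union Node *n)
{
    if (n == NULL)
        return;
    switch (n->scalar.type) {
    case T_SCALAR: free(n->scalar.p); break;
    case T_VECTOR: free(n->vector.p); break;
    }
    free(n);
}
```

The cost is one extra allocation and one extra indirection per object, in
exchange for never handing the compiler a union pointer to storage smaller
than the union.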

Finally, if I am right in surmising that my situation is common,
maybe this question should go into the FAQ?

Alexander

--
Remove del for email