Re: strict aliasing rules in ISO C, someone understands them ?
- From: Jack Klein <jackklein@xxxxxxxxxxx>
- Date: Thu, 13 Oct 2005 23:31:25 -0500
On 13 Oct 2005 07:39:48 -0700, nicolas.riesch@xxxxxxxxxxxx wrote in
comp.lang.c:
>
> I try to understand strict aliasing rules that are in the C Standard.
> As gcc applies these rules by default, I just want to be sure to
> understand fully this issue.
>
> For questions (1), (2) and (3), I think that the answers are all "yes",
> but I would be glad to have strong confirmation.
>
> About questions (4), (5) and (6), I really don't know. Please help ! !
> !
>
> --------
>
> The Standard says (
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf chapter 6.5
> ):
>
> An object shall have its stored value accessed only by an lvalue
> expression that has one of
> the following types:
> - a type compatible with the effective type of the object,
> - a qualified version of a type compatible with the effective type
> of the object,
> - a type that is the signed or unsigned type corresponding to the
> effective type of the object,
> - a type that is the signed or unsigned type corresponding to a
> qualified version of the effective type of the object,
> - an aggregate or union type that includes one of the aforementioned
> types among its members
> (including, recursively, a member of a subaggregate or contained
> union), or
> - a character type.
>
>
> ***** Question (1) *****
>
> Let's have two struct having different tag names, like:
>
> struct s1 {int i;};
> struct s2 {int i;};
>
> struct s1 *p1;
> struct s2 *p2;
>
> The compiler is free to assume that p1 and p2 point to different memory
> locations and don't alias.
> Two struct having different names are considered to be different types.
>
> In the standard, we read the wording "effective type of the object"
> many times.
>
> This "effective type of the object" may be an "int", "double", etc, but
> may also be a "struct" type, right ???
>
> And I suppose it may also be an "array" type or an "union" type as
> well, is it correct ???
Yes.
> ***** Question (2) *****
>
> In the little program that follows, the line "printf("%d\n", *x);"
> normally returns 123,
> but an optimizing compiler can return garbage instead of 123.
No, an optimizing compiler must still output "123" for this line.
> Is my reasoning correct ???
>
> On the other side, the line "printf("%d\n", p1->i);" always returns 999
> as expected, right ???
>
> ----
>
> #include <stdio.h>
> #include <stdlib.h>
>
> struct s1 { int i; double f; };
>
>
> int main(void)
> {
> struct s1* p1;
> int* x;
>
> p1 = malloc(sizeof(*p1));
> p1->i = 123; // object of type 'struct s1' contains 123
>
> x = &(p1->i);
>
> printf("%d\n", *x); // I try to access a value stored in an
> // object of type 'struct s1'
> // through *x which is of type 'int'.
> // I think this is not allowed by the
> // standard !
The effective type of *p1 is 'struct s1'. The effective type of p1->i
is 'int'. 'x' is a pointer to int, and you have initialized it with a
pointer to an int. This is perfectly legal.
Since the int contains the value 123, and 'x' quite properly points to
that int, *x must retrieve the int value 123. It can't do anything
else.
> *x = 999; // I store 999 in *x, which is of type 'int'
>
> printf("%d\n", p1->i); // I access a value stored in *x which is of
> // type 'int'
> // by *p1 ( as p1->i is a shortcut for
> // (*p1).i )
> // which is of type 'struct s1',
> // but contains a member of type 'int'.
> // I think this is allowed by the standard.
>
>
> return 0;
> }
>
>
> ***** Question (3) *****
>
> The Standard forbids ( if I am not mistaken ) pointer of type "struct A
> *" to access data written by a pointer of type "struct B *", as they are
> different types.
>
> This means that the common usage of faking inheritance in C like in
> this code snippet is now utterly wrong, is it correct ???
>
>
> --- myfile.c ---
>
> #include <stdio.h>
> #include <stdlib.h>
>
> typedef enum { RED, BLUE, GREEN } Color;
>
> struct Point { int x;
> int y;
> };
>
> struct Color_Point { int x;
> int y;
> Color color;
> };
>
> struct Color_Point2{ struct Point point;
> Color color;
> };
>
> int main(int argc, char* argv[])
> {
>
> struct Point* p;
>
> struct Color_Point* my_color_point = malloc(sizeof(struct
> Color_Point));
> my_color_point->x = 10;
> my_color_point->y = 20;
> my_color_point->color = GREEN;
>
> p = (struct Point*)my_color_point;
>
> printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
This is undefined behavior, pure and simple. It works on many
implementations, but is not guaranteed at all.
[snip]
> Is the line "p = (struct Point*)my_color_point" also a case of what is
> called "type-punning" ???
Type punning is not a term defined by the standard, but I would say
that the act of assigning the pointer via a cast is not type punning.
Accessing a member of the foreign structure type through the pointer
is.
> ***** Question (4) *****
>
> In the Standard, chapter 6.5.2.3, it is written:
>
> One special guarantee is made in order to simplify the use of unions:
> if a union contains
> several structures that share a common initial sequence (see below),
> and if the union
> object currently contains one of these structures, it is permitted to
> inspect the common
> initial part of any of them anywhere that a declaration of the complete
> type of the union is
> visible. Two structures share a common initial sequence if
> corresponding members have
> compatible types (and, for bit-fields, the same widths) for a sequence
> of one or more
> initial members.
>
> I find this statement completely obscure.
>
> Let's have:
>
> struct s1 {int i;};
> struct s2 {int i;};
>
> struct s1 *p1;
> struct s2 *p2;
>
> A compiler is free to assume that *p1 and *p2 don't alias.
>
> If we just put a union declaration like this before this code, then it
> acts like a flag to the compiler, indicating that pointers to "struct
> s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
> to the same location.
>
> union p1_p2_alias_flag { struct s1 st1;
> struct s2 st2;
> };
>
> There is no need to use "union p1_p2_alias_flag" for accessing data,
> and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
> anywhere else.
> I mean, it is possible to access data using directly p1 and p2.
It seems unlikely that a compiler could find a way to prevent it from
working in general, even if the implementer tried, but such behavior
would not render the compiler non-conforming.
On the other hand, since your structure only contains a single member,
and the first member always begins at the same address as the
structure itself, this particular usage can't fail.
Still, the behavior is undefined. Which means the language standard
places no requirements on it at all.
>
> Do you agree, everybody ???
>
>
> ***** Question (5) *****
>
> This question is really hard.
>
> Let's have this code snippet:
>
> ---------
> #include <stdio.h>
>
> int main (void)
> {
>
> struct s1 {int i;
> };
>
> struct s1 s = {77};
>
> unsigned char* x = (unsigned char*)&s;
> printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
> // Standard says data stored in "struct s1" type can be read by pointer
> // to "char"
>
> x[0] = 100; // here, I write data in "char" objects !!!
> x[1] = 101;
> x[2] = 102;
> x[3] = 103;
The standard does not say that you can do this. You are assuming that
sizeof(int) is at least 4, and there are implementations where that is
not true. Accessing, let alone writing to, x[1], x[2], or x[3] might
be outside the bounds of the int and the struct, producing undefined
behavior.
> printf("%d\n", s.i); // but data stored in "char" objects cannot be
> // read by pointer to "struct s1" ???
>
> return 0;
> }
No, the point is that accessing s.i, an int, after storing data into
that memory using a different object type, is undefined. You might
have created a bit pattern that does not represent a valid value for
the int, called a trap representation.
> -----------
>
> For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
> (int)x[3]);", I can rewrite the Standard clause like this:
>
> An object [ here, s of type "struct s1" ] shall have its stored value
> accessed only by an lvalue expression that has one of
> the following types:
> [ blah blah blah ]
> - a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
> it is our case, so everything is OK so far !
I have worked on a platform where sizeof(int) is 1, and several where
sizeof(int) is 2. I have never worked on a platform where sizeof(int)
is 3, but C allows it. On any of these platforms you would be
invoking undefined behavior.
> But what about the line "printf("%d\n", s.i);" ??????
Even assuming that sizeof(int) >= 4 on your implementation, you have
to understand that all types, other than unsigned char, can have trap
representations, that is bit patterns that do not represent a valid
value for the type. By writing arbitrary bit patterns into an int,
you may have created an invalid bit pattern in that int. When you
access that invalid bit pattern as an int, the behavior is undefined.
> I read the Standard again and again, but I cannot express how it can
> work.
> If I rewrite the Standard clause, it gives:
>
> An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its
> stored value accessed only by an lvalue expression that has one of
> the following types:
> - a type compatible with the effective type of the object, [ this is
> not our case ]
> - a qualified version of a type compatible with the effective type
> of the object, [ still not our case ]
> - a type that is the signed or unsigned type corresponding to the
> effective type of the object, [ still not our case ]
> - a type that is the signed or unsigned type corresponding to a
> qualified version of the effective type of the object, [ still not our
> case ]
> - an aggregate or union type that includes one of the aforementioned
> types among its members [ we read through "s" which is of type "struct
> s1", but it does not contain a member of type "char" ]
> (including, recursively, a member of a subaggregate or contained
> union), or
> - a character type. [ definitely not our case ]
>
> We see that none of these conditions applies in our case.
The standard provides a specific list of what is allowed. Lists like
this are always exhaustive. That means anything not on the list is
undefined.
> Where is the flaw in my reasoning ???
There is no flaw in your reasoning. The code produces undefined
behavior.
> Does the last "printf" line of this code snippet work or not ??? and
> why ???
There is no question of "work". Whatever it does is just as right or
wrong as anything else that might happen as far as the language is
concerned. That's what undefined behavior means. The C standard does
not know or care what happens.
> ***** Question (6) *****
>
> I often see this code used with socket programming:
>
> struct sockaddr_in my_addr;
> ...
> bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
>
> The function bind(...) needs a pointer to "struct sockaddr", but
> my_addr is a "struct sockaddr_in".
> So, in my opinion, the function bind is not guaranteed to access safely
> the content of object my_addr.
>
> Someone knows why this code is not broken ( or if it is ) ???
That depends on the definition of 'struct sockaddr_in'. If its first
member is a 'struct sockaddr', the code is legal and well defined
because a pointer to a structure can always be converted to a pointer
to its first member. If not, then the code produces undefined
behavior if the called function actually uses the pointer to access
members of a 'struct sockaddr'.
You use terms like "broken" and "work", which do not really apply as
far as undefined behavior in C is concerned. They are subjective
terms at best. Code is "broken" if it does not do what you want, and
you consider it to "work" if it does. If it produces undefined
behavior, it may "work" on one compiler but be "broken" on another,
and both compilers can be standard conforming.
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html