Re: union {unsigned char u[10]; ...}
- From: Yevgen Muntyan <muntyan.removethis@xxxxxxxx>
- Date: Wed, 14 Mar 2007 01:55:14 GMT
Keith Thompson wrote:
Yevgen Muntyan <muntyan.removethis@xxxxxxxx> writes:
Keith Thompson wrote:
Yevgen Muntyan <muntyan.removethis@xxxxxxxx> writes:
Keith Thompson wrote:
Yevgen Muntyan <muntyan.removethis@xxxxxxxx> writes:
Ben Pfaff wrote:
[snip]
Yevgen Muntyan <muntyan.removethis@xxxxxxxx> writes:
But character type is not a union.
Why is it legal to do
union U {unsigned char u[8]; int a;};
union U u;
u.a = 1;
u.u[0];
See C99 section 6.5 "Expressions":
An object shall have its stored value accessed only by an
lvalue expression that has one of the following types:73)
[...]
- a character type.
u.a is of type int. u.u[0] is of type unsigned char, a character type. The
code above accesses the stored value of the object u.a using an lvalue
expression, u.u[0], which is of character type, which satisfies 6.5.
I am not convinced. Consider this:
int a;
int b;
int *p = &b;
*(p - 1);
It accesses the value of a using an lvalue of type int. The problem is
of course that *(p-1) is illegal.
Another problem is that it's not necessarily accessing the value of
a.
Well, UB here is totally enough for me, regardless of what exactly the
implementation will do :)
However, the standard does explicitly allow objects to be adjacent (it
has to do so to make pointer equality work consistently). In the
absence of any knowledge of how a and b are allocated in memory,
evaluating *(p - 1) invokes undefined behavior. If you happen to know
that they're adjacent, then *(p - 1) does access the value of a, and
it's legitimate (though quite silly).
It's legitimate? Pointer arithmetic is allowed only on arrays (not sure
what the correct term is; I don't just mean arrays declared like int a[2];),
isn't it? I mean, it's UB even if a and b happen to be adjacent (which
itself isn't a standard term, since the standard doesn't say what it
means for objects that are not members of some aggregate; for members
of an aggregate we can talk about sequences of bytes).
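The character-type permission in 6.5p7 quoted above is uncontested when an object's bytes are read through an unsigned char pointer rather than through a second union member. Below is a minimal sketch assuming a C99 compiler. The name int_bytes is hypothetical and does not come from the thread.

```c
#include <stddef.h>

/* Hypothetical helper: copy the object representation of *value into
   out, one byte at a time, reading it through an lvalue of type
   unsigned char. 6.5p7 lists character types among the lvalue types
   that may access any object's stored value. */
void int_bytes(const int *value, unsigned char out[sizeof(int)])
{
    const unsigned char *p = (const unsigned char *)value;
    for (size_t i = 0; i < sizeof(int); i++)
        out[i] = p[i];
}
```

On a typical implementation with 8-bit bytes and no padding bits, the bytes copied from an int holding 1 add up to 1. Which byte holds the 1 depends on byte order.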
For purposes of pointer arithmetic, any object can be treated as if it
were a single-element array. See, for example, C99 6.5.8p4:
For the purposes of these operators, a pointer to an object that
is not an element of an array behaves the same as a pointer to the
first element of an array of length one with the type of the
object as its element type.
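Keith's point is that a lone object behaves like an array of length one. C99 6.5.6p7 repeats the 6.5.8p4 wording for the additive operators. As a result, &x + 1 is a valid end pointer for a half-open range, even though it must never be dereferenced. Below is a minimal sketch; sum_range and sum_single are hypothetical helpers and do not come from the thread.

```c
/* Sum the ints in the half-open range [first, last). */
long sum_range(const int *first, const int *last)
{
    long total = 0;
    for (const int *p = first; p != last; p++)
        total += *p;
    return total;
}

/* Treat a single int as an array of length one: x + 1 is its
   one-past-the-end pointer, compared against but never dereferenced. */
long sum_single(const int *x)
{
    return sum_range(x, x + 1);
}
```

For int x = 42;, sum_single(&x) reads only x itself and returns 42.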
As for adjacency, see C99 6.5.9p6:
Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an
object and a subobject at its beginning) or function, both are
pointers to one past the last element of the same array object, or
one is a pointer to one past the end of one array object and the
other is a pointer to the start of a different array object that
happens to immediately follow the first array object in the
address space.
with a footnote:
Two objects may be adjacent in memory because they are adjacent
elements of a larger array or adjacent members of a structure with
no padding between them, or because the implementation chose to
place them so, even though they are unrelated. If prior invalid
pointer operations (such as accesses outside array bounds)
produced undefined behavior, subsequent comparisons also produce
undefined behavior.
Without this special-case permission, the standard would have had to
say that, given
int a;
int b;
&a + 1 *may not* be equal to &b (or vice versa), which would require
the implementation to insert at least one byte of padding between
objects that might otherwise be adjacent.
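The 6.5.9p6 wording quoted above makes the comparison itself well defined, even when the two pointers refer to unrelated objects. Only the result depends on layout. Below is a minimal sketch; follows_immediately is a hypothetical name.

```c
/* Hypothetical helper: nonzero if b points just past the int that a
   points to. Comparing a one-past-the-end pointer with a pointer to
   another object using == is well defined (C99 6.5.9p6); whether the
   result is 0 or 1 depends on how the objects were laid out. */
int follows_immediately(const int *a, const int *b)
{
    return a + 1 == b;
}
```

Adjacent elements of one array always compare as adjacent. For separately declared int a; int b;, the result may be 0 or 1.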
Sorry, I don't understand if it's yes or no. I said the following:
1) int a; int *p = &a + 1; is UB.
2) In "int a; int b;" if we say "a and b are adjacent", then it has
no meaning as far as the standard is concerned. We could talk
about implementation-specific memory layout, about what addresses
actually mean, but it's not standard.
Allowing objects to be adjacent is necessary for equality to be
defined consistently.
The standard neither allows nor disallows independent (as in not parts
of some aggregate) objects to be adjacent; it simply doesn't
say or care what that means.
It's for the sake of implementers' convenience;
no program should take advantage of this permission.
And it's really a very minor and obscure point; you just happened to
hit on it in your example.
Nope, I hit on an easy example of UB; I needed UB to demonstrate
that the quoted paragraph from 6.5 wasn't enough for that union thing.
As for this example, an implementation could easily make
&a == &b + 1; (is that what's called the direction in which the stack grows?).
...I'm not quite sure what this has to do with the question about unions,
though.
It was an example of a situation where the types are fine as far as
6.5p7 is concerned, but the expression was illegal nevertheless.
Be careful with the word "illegal". I think what you mean is that it
invokes undefined behavior.
No, strictly speaking I mean non-strictly-conforming code.
Same thing with that union: why is u.u access allowed, and why is the
value of u.u the same as if you actually set it, using u.u[0] = 8?
I'm afraid I don't understand what you're getting at here. u.u[0]
accesses the first byte of u.a; why would it not do so?
Because it's similar to saying
union U {int a; double b;};
union U u;
int a = 1;
u.a = a;
memcpy (someplace, &u.b, 1);
is allowed and copies the first byte of a. But we can't use u.b
here, or can we?
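The memcpy claim rests on the same paragraph Keith cites next. C99 6.7.2.1p14 also says that a pointer to a union object, suitably converted, points to each of its members. Because of that, &u.b and &u.a share the same starting address, and a byte copy from either one yields the same first byte. Below is a minimal sketch; the two helper names are hypothetical.

```c
#include <string.h>

/* The union from the post. */
union U { int a; double b; };

/* Copy one byte starting at the address of member b. Since every
   member starts at the union's address, this is the first byte of
   whatever int was stored in a; memcpy copies bytes and does not
   read b as a double. */
unsigned char first_byte_via_b(const union U *u)
{
    unsigned char byte;
    memcpy(&byte, &u->b, 1);
    return byte;
}

/* Same copy, starting at member a, for comparison. */
unsigned char first_byte_via_a(const union U *u)
{
    unsigned char byte;
    memcpy(&byte, &u->a, 1);
    return byte;
}
```

This sketch copies bytes only. It does not settle whether reading u.b as a double is allowed.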
We can't use *the value of* u.b because C99 6.7.2.1p14 says:
The value of at most one of the members can be stored in a union
object at any time.
But 6.5p7 gives special permission to access an object by an lvalue
expression of character types.
It doesn't give special permission to access a union member other
than the one that was previously set; at least, it's absolutely not
obvious that it does.
As the footnote there says:
The intent of this list is to specify those circumstances in which
an object may or may not be aliased.
Exactly, so it has nothing to do with this particular question:
whether you can freely access different union members.
One way to alias an object is to make it a member of a union. (Other
ways involve various pointer tricks.)
Now I'm not sure whether you can actually prove, from the wording of
the standard, that it's permitted to store a value in one member of a
union, then access a different member, as long as the other member has
character or array-of-character type. (By "permitted", I mean not
invoking undefined behavior.) But it's a reasonably common idiom,
It's also a common idiom to do this:
union U {void **ptr; int **iptr;};
void func (void **ptr);
....
union U u;
u.iptr = &ip; /* ip is some int pointer */
func (u.ptr);
to avoid gcc warnings about strict aliasing when
you just do func((void*)&ip);. A similar thing is used to pass
character data around (when a function takes unsigned char **
to store "any data" at a given address). Or the struct hack:
a common idiom, and nobody knows if it's legal.
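C99 (6.7.2.1) added flexible array members as the standardized form of the struct hack. With this form, the trailing array is declared with no size instead of a fake size such as [1]. Below is a minimal sketch assuming a C99 compiler; struct buffer and buffer_new are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Flexible array member: data must be the last member, and the
   struct needs at least one other named member. */
struct buffer {
    size_t len;
    char data[];
};

/* Allocate a buffer holding a copy of len bytes from src.
   Returns NULL if malloc fails; the caller releases it with free(). */
struct buffer *buffer_new(const char *src, size_t len)
{
    struct buffer *b = malloc(sizeof *b + len);
    if (b == NULL)
        return NULL;
    b->len = len;
    memcpy(b->data, src, len);
    return b;
}
```

The allocation asks for sizeof *b plus the payload. sizeof *b covers len and any padding, but it does not include the flexible array.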
and
I'm about 95% convinced that it's *intended* to be allowed. It's
difficult to imagine an implementation that meets the requirements of
the standard but disallows this particular kind of aliasing.
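The idiom Keith describes stores a value through one member and reads it back through a character-array member. Whether C99's wording actually guarantees this is exactly what the thread disputes. The sketch below therefore only illustrates the idiom and does not rule on it. The union int_view and first_byte_of are hypothetical names.

```c
/* Hypothetical union pairing an int with its bytes. */
union int_view {
    int value;
    unsigned char bytes[sizeof(int)];
};

/* Store v through the int member, then read the first byte back
   through the unsigned char array member. */
unsigned char first_byte_of(int v)
{
    union int_view view;
    view.value = v;
    return view.bytes[0];
}
```

On common compilers this returns the same byte as reading *(unsigned char *)&v. The pointer form is the one 6.5p7 plainly covers.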
I'd think that it's simple: either you can access union members freely
(i.e. the standard permits it), or you can't. In the latter case an
implementation could explode when you do it, like the famous
implementations that check array bounds (none does, and the struct hack works).
I think the question of whether the wording of the standard actually
supports this conclusion is getting into comp.std.c territory.
I believe the Rationale explains the intent of that 6.5 wording, and
the intent certainly wasn't to allow accessing character-array
union members. If it's allowed, then the permission must come from elsewhere.
Best regards,
Yevgen