Re: Pointing to high and low bytes of something

From: Dan Pop (Dan.Pop_at_cern.ch)
Date: 11/18/03


Date: 18 Nov 2003 18:13:28 GMT

In <1f24a1b2.0311180547.10785161@posting.google.com> ljlbox@tiscalinet.it (Lorenzo J. Lucchini) writes:

>My code contains this declaration:
>
>: typedef union {
>: word Word;
>: struct {
>: byte Low;
>: byte High;
>: } Bytes;
>: } reg;
>
>The colons are not part of the declaration.
>
>Assume that 'word' is always a 16-bit unsigned integral type, and that
>'byte' is always an 8-bit unsigned integral type ('unsigned short int'
>and 'unsigned char' respectively on my implementation).
>
>My understanding, after browsing through previous threads on this and
>other newsgroups, is that given a variable Var of type reg, accessing
>Var.Word after having assigned values to Var.Bytes.Low and
>Var.Bytes.High or, conversely, accessing Var.Bytes.Low and
>Var.Bytes.High after having assigned a value to Var.Word, results in
>implementation-defined behavior (or possibly undefined behavior).

Accessing Var.Bytes.High and Var.Bytes.Low (after initialising Var.Word)
will always provide implementation-defined results, with no possibility
of undefined behaviour. The reverse direction (reading Var.Word after
assigning the two bytes) carries no such guarantee.

>If it is indeed implementation-defined behavior, my question is: can
>the implementation only take the liberty to choose whether
>Var.Bytes.Low or Var.Bytes.High will contain the LSB of Var.Word, and
>whether Var.Bytes.High or Var.Bytes.Low will contain the MSB, or can
>the implementation take other liberties?
>
>Intuitively, I would say that there is more than this (specifically,
>that the compiler can insert padding after the first member of the
>Bytes struct), but some articles I've read seemed to imply otherwise.

Your intuition is correct: in theory, the compiler *can* do that.
In practice, padding bytes are inserted only when they serve a *good*
purpose. Inserting padding byte(s) between Low and High would be
downright perverse, since, *in the framework of your assumptions*, no
padding bytes are needed at all: you're merely aliasing a two-byte object
by two independent bytes.

>Anyway, it all comes down to: assume that I am willing to sacrifice
>portability by forcing the maintainer to exchange the positions of the
>two members of Bytes depending on the implementation; do I then have a
>guarantee that Var.Bytes.Low will always evaluate to the LSB of
>Var.Word, and that Var.Bytes.High will always evaluate to the MSB of
>Var.Word?

In practice, yes, assuming that your initial assumptions still hold.

>If not, then I would gladly accept suggestions on how to change my
>code.
>Keep in mind that I need to access:
>1) Var.Word (or its equivalent after the change) by address
>2) Var.Bytes.Low (or its equivalent) by address
>3) Var.Bytes.High (or its equivalent) by address
>to the effect that this code can be modified in a straight-forward way
>to work as intended:
>
>: #include <stdlib.h>
>: #include <stdio.h>
>:
>: int main() {
>: reg Var;
>: reg *VarWordP;
>: reg *VarLSBP;
>: reg *VarMSBP;
>: VarWordP=&(Var.Word);
>: VarLSBP=&(Var.Bytes.Low);
>: VarMSBP=&(Var.Bytes.High);
>: *VarWordP=0x1234;
>: printf("%x %x %x\n", *VarWordP, *VarLSBP, *VarMSBP);
>: return 0;
>: }
>
>Assume type reg has been defined as above. I should always get
>1234 34 12
>as the program's output, save any changes that could be needed in the
>printf() format specifiers.

You don't need the union at all for this purpose:

    word foo, *wp = &foo;
    byte *ph, *pl;
    pl = (byte *)wp; /* or the other way round, depending on the */
    ph = pl + 1; /* implementation */
    *wp = 0x1234;
    printf("%x %x %x\n", (unsigned)*wp, (unsigned)*pl, (unsigned)*ph);

Now, even the most perverse compiler cannot affect the behaviour of your
code: you're pointing at the two bytes of foo directly, without using
any structs and unions. The only (unavoidable) assumption (apart from the
ones explicitly stated at the beginning of your post) is about which of
the two bytes of a word is the LSB and which the MSB.

Also note the casts in the printf call: %x expects an unsigned value and
there is no guarantee that any of the three values will get promoted to
this type (signed int is far more probable). So, you must provide the
right type explicitly (again, the code will work without the casts as well
in practice, but you have nothing to gain by not doing the right thing).

Dan

-- 
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de

