Re: traumatized by pointer casting

From: Jack Klein (jackklein_at_spamcop.net)
Date: 07/10/04


Date: Sat, 10 Jul 2004 14:49:25 -0500

On Sat, 10 Jul 2004 00:06:30 -0700, tweak <xbwaichunasx@cox.net> wrote
in comp.lang.c:

> Jack Klein wrote:
> > On Fri, 09 Jul 2004 21:18:56 -0700, tweak <xbwaichunasx@cox.net> wrote
> > in comp.lang.c:
> >
> >
> >>Keith Thompson wrote:
> >>
> >>>Frank Cusack <fcusack@fcusack.com> writes:
> >>>
> >>>
> >>>>On 9 Jul 2004 17:45:54 -0700 j0mbolar@engineer.com (j0mbolar) wrote:
> >>>>
> >>>>
> >>>>>A pointer to char is not guaranteed to be
> >>>>>properly aligned for a pointer to struct. In fact
> >>>>>I don't see how this could even work period, even on the
> >>>>>most perverse of systems. I'm right, right? This really
> >>>>>shouldn't work?
> >>>>
> >>>>Right! How can this work?
> >>>>
> >>>>For the folks who didn't get it:
> >>>>
> >>>> char buf[100];
> >>>> struct foo {
> >>>> long a;
> >>>> long b;
> >>>> char c;
> >>>> int d;
> >>>> };
> >>>> struct foo *foo_p = (struct foo *) buf;
> >>>>
> >>>>has the problem that buf has byte-alignment, whereas struct foo has
> >>>>long alignment.
> >>>>
> >>>>foo_p->a is not guaranteed to be referenceable. On some arches it would
> >>>>cause an unaligned read (slow), on some it just wouldn't work at all.
> >>>
> >>>
> >>>It's certainly true that foo_p isn't guaranteed to be aligned to any
> >>>boundary bigger than a byte. On the other hand, a declared array
> >>>object is likely to be word-aligned. (If you count on this, of
> >>>course, your program will fail at the most inconvenient possible
> >>>moment.)
> >>>
> >>
> >>Now, you have confused me.
> >
> >
> > You are misinterpreting what Keith wrote.
> >
> >
> >>Why would an array be word-aligned (16 bits)? And a structure be
> >>byte-aligned (8 bits)?
> >
> >
> > The array might not be word-aligned. It might have an odd address.
> > When you cast the array by name to a pointer to struct, it would
> > retain that odd address.
>
> it == address of first element in the array?

Yes.

> Let me try to word how I am understanding what you are saying. The
> addresses within the struct mentioned above occupy memory addresses
> determined by their type. For example
>
> a b c d
> [4 bytes] [4 bytes] [1 byte] [2 bytes]

Actually there is a good chance that internal padding would be added
between the members of the structure, which C allows for just this
reason. To keep the discussion simple, assume a platform with 8-bit
bytes, 2-byte (16-bit) ints, and 4-byte (32-bit) longs. If it has
alignment requirements, the 2-byte int member d will need to be at an
even address. So assuming you define one of these structures and its
address is 0x1000, the compiler will probably allocate it like
this:

   0x1000 - 0x1003   member a   4 bytes
   0x1004 - 0x1007   member b   4 bytes
   0x1008 - 0x1008   member c   1 byte
   0x1009 - 0x1009   padding    1 byte
   0x100A - 0x100B   member d   2 bytes

The padding byte is required because d must start at an even address.
C allows the compiler to insert padding bytes between the members of a
structure, and after the last member, just not before the first
member. Padding is sometimes necessary at the end to make the size of
the entire structure a multiple of the alignment size.

Consider a processor that requires 4-byte (32-bit) longs to be located
at an address divisible by 4. Now consider this structure definition and
how the compiler will pad it:

struct silly {
   char c1; /* relative address 0x0000 */
               /* padding 3 bytes 0x0001 - 0x0003 */
   long l1; /* relative address 0x0004 - 0x0007 */
   char c2; /* relative address 0x0008 */
               /* padding 3 bytes 0x0009 - 0x000b */
};

> In theory, the address pointed to would be the first item in the
> structure, right? Thus, the first item in the structure will always
> have an address divisible by 4. And the first item in a structure
> determines the alignment of the structure?

No, the member of the structure with the strictest alignment
requirement determines the alignment requirement of the entire
structure, as I tried to show in struct silly above. The structure
must have an alignment at least as strict as its strictest member, and
must have padding at the end, if necessary, to make the size of the
entire structure a multiple of this alignment size.

> So to guarantee that the typecast will work, the buf (array) size has to
> have the same divisibility as the first item in the structure?

Yes, provided it is the address of the array, not its size, that is
divisible that way. That is why the memory allocation functions
malloc(), calloc(), and realloc() are guaranteed to return pointers to
memory blocks suitably aligned for any data type.

>
> Now, the array buf contains 100 bytes each of type char, so the location
> in memory of the first element in the array can have an odd address
> (address divisible by 1).
>
> So when buf is typecasted to struct foo *, the first item pointed to
> long a may not be aligned correctly since &buf[0] can have an odd
> address? Am I following what you are saying okay?

Yes.

> >>I reviewed the draft of C99. And I'm not sure what determines
> >>the boundaries.
> >
> >
> > The implementation determines the boundaries. They must conform to
> > what is physically possible with the underlying processor hardware,
> > but they might be more strict than the hardware requires. There are
> > quite a few modern RISC and DSP processors that have a general
> > requirement that types larger than one byte be aligned on addresses
> > that are a multiple of their size. They don't waste the extra
> > transistors on automatic circuitry to make slower access to unaligned
> > memory, they use them for more useful things and generate a fault or
> > just plain read or write the wrong memory instead.
>
> So char can have odd addresses, int can have addresses divisible by 2,
> long can have addresses divisible by 4 and so on?
>
> I will write a program tomorrow with intentional mis-alignment, so that
> I can debug it and see the mis-alignment. I hope I can do this with
> gdb.
>
> Brian
>
> P.S. The FAQ is brilliant. I wish I knew C as well as you.

What happens when you violate alignment requirements falls into the
category that the C standard calls undefined behavior. It is like
dividing by 0 in mathematics: you have broken the rules and there is
no correct answer.

In the actual hardware world, one of three things is likely to
happen, depending on the underlying processor hardware:

1. Example Intel Pentium. The processor will perform multiple memory
accesses, if necessary, and shift the bytes around between memory and
the processor core. The performance penalty can be quite severe.

2. Example ARM. The processor hardware generates a hardware
exception that causes the operating system (if there is one) to
terminate the misbehaving program.

3. Example Intel 8096. The misalignment is ignored, with incorrect
results. If you have a pointer of 0x21 and try to read or write a
16-bit int at address 0x21 and 0x22, the processor actually reads or
writes at the aligned address, bytes 0x20 and 0x21.

-- 
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html

