Re: when is typecasting (unsigned char*) to (char*) dangerous?

From: Eric Sosman (Eric.Sosman_at_sun.com)
Date: 06/21/04


Date: Mon, 21 Jun 2004 17:53:48 -0400

b83503104 wrote:
> When are they not consistent?

     (In the future, please make sure your entire question
appears in the body of your message. Use the Subject: header
as a synopsis of your question, but do not rely on it to
carry your message unaided.)

     Putting the Subject: and the body together, we have
this question:

> when is typecasting (unsigned char*) to (char*) dangerous?
> When are they not consistent?

     The conversion itself is not dangerous. However, it
could be dangerous to use the resulting `char*' to access
the pointed-to characters. In theory, at least, some bit
patterns that are valid as `unsigned char' might be invalid
when considered as `char' -- that is, there might be no
actual `char' value corresponding to the bit pattern.
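A minimal C sketch of that distinction, assuming an ordinary two's complement host: the pointer conversion round-trips without harm, and the care is needed only when a fetched `char' is used as a number. The function names are hypothetical, chosen for illustration.

```c
#include <stddef.h>

/* The conversion itself: converting unsigned char* to char* and back
   yields a pointer that compares equal to the original. */
int round_trip_preserves_pointer(unsigned char *up)
{
    char *cp = (char *)up;
    unsigned char *back = (unsigned char *)cp;
    return back == up;
}

/* The access: cp[i] is a plain char, which may be negative where char
   is signed (e.g. the byte 0xFF fetched as -1). On a two's complement
   implementation, converting the fetched value back to unsigned char
   recovers the original byte value in 0..UCHAR_MAX. */
unsigned int byte_via_char(const unsigned char *up, size_t i)
{
    const char *cp = (const char *)up;
    return (unsigned char)cp[i];
}
```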

     This concern is mostly theoretical, and would apply to
"exotic" architectures that use sign-magnitude or ones'
complement representation for negative numbers and choose to
treat `char' as a signed type. On such systems there are two
distinct representations of zero (100...0 in binary notation
for sign-magnitude, or 111...1 for ones' complement). When viewed as `unsigned char'
these bit patterns are easily distinguished from 000...0 --
but when viewed as `char', the "minus zero" is indistinguishable
from "positive zero." Even worse, the alternate forms could be
treated as "trap representations" and could cause your program
to misbehave.
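To make the "two zeros" concrete, here is a sketch that decodes an 8-bit pattern under each of the three signed representations using plain unsigned arithmetic, so it runs on any ordinary host. The helpers are illustrative only: they model the representations on paper rather than query the real hardware.

```c
/* Interpret the low 8 bits of `bits` as a signed 8-bit value. */
int decode_sign_magnitude(unsigned bits)
{
    int magnitude = (int)(bits & 0x7Fu);
    return (bits & 0x80u) ? -magnitude : magnitude;   /* 0x80 is "-0" */
}

int decode_ones_complement(unsigned bits)
{
    bits &= 0xFFu;
    /* Negative values are the bitwise complement of the magnitude;
       0xFF complements to 0 and is therefore "-0". */
    return (bits & 0x80u) ? -(int)(~bits & 0x7Fu) : (int)bits;
}

int decode_twos_complement(unsigned bits)
{
    bits &= 0xFFu;
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}

/* How many of the 256 bit patterns decode to the value zero? */
int count_zero_patterns(int (*decode)(unsigned))
{
    int n = 0;
    for (unsigned b = 0; b <= 0xFFu; b++)
        if (decode(b) == 0)
            n++;
    return n;
}
```

Sign-magnitude and ones' complement each map two distinct patterns onto zero (0x00 and 0x80, or 0x00 and 0xFF), while two's complement maps only 0x00 there. That is why a `char' view can lose information the `unsigned char' view keeps.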

     Such concerns do not arise on the two's complement machines
that are prevalent nowadays. Nothing bad will happen when you
convert the `unsigned char*' to `char*', and nothing bad will
happen when you use the `char*' to inspect the bytes. (If an
expert disputes this assertion and mentions "padding bits" or
"trap representations," pay him no heed until and unless he
can exhibit a system whose `char' representation has such things.
If he says the word "DeathStar" or the abbreviation "DS," he's
just trying to scare you.)
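If you want to confirm which kind of machine you are on, one widely used trick is to look at the low two bits of -1, which differ across the three representations. The sketch below applies it to `int`; the enum and function names are hypothetical, and it assumes (as is typical) that `signed char' shares the representation of `int'. C23 has since made two's complement mandatory, so on a current compiler the answer is fixed.

```c
/* Low two bits of -1 under each representation:
     two's complement:  ...1111 & 3 == 3
     ones' complement:  ...1110 & 3 == 2
     sign-magnitude:    1...0001 & 3 == 1 */
enum int_rep {
    REP_UNKNOWN = 0,
    REP_SIGN_MAGNITUDE = 1,
    REP_ONES_COMPLEMENT = 2,
    REP_TWOS_COMPLEMENT = 3
};

enum int_rep int_representation(void)
{
    switch (-1 & 3) {
    case 3:  return REP_TWOS_COMPLEMENT;
    case 2:  return REP_ONES_COMPLEMENT;
    case 1:  return REP_SIGN_MAGNITUDE;
    default: return REP_UNKNOWN;
    }
}
```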

     Nonetheless, you must still be vigilant: `char' is unsigned
on some two's complement machines and signed on others. If you
use the `char*' to fetch a `char' value and then index an array
with the fetched value, you may find yourself trying to access
`crc_table[-128]', and this is not likely to be good for your
program's prospects of forward progress. Fetch a `char' value
and start right-shifting it until all the 1-bits "fall off the
end," and you may find yourself in an infinite loop. Beware!
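Both pitfalls have the same cure: convert the fetched `char' to `unsigned char' before using it as a number. A sketch, with hypothetical names (a byte-frequency table stands in for the CRC table):

```c
#include <limits.h>
#include <stddef.h>

/* Is plain char a signed type on this implementation? */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Tally byte values. Indexing counts[s[i]] directly could compute
   counts[-128] where char is signed; the unsigned char conversion
   keeps every index in 0..UCHAR_MAX. */
void count_bytes(const char *s, size_t n, unsigned long counts[UCHAR_MAX + 1])
{
    for (size_t i = 0; i < n; i++)
        counts[(unsigned char)s[i]]++;
}

/* Count the 1-bits in a char. Right-shifting a negative signed value
   is implementation-defined and often copies the sign bit, so a loop
   waiting for zero may never end. An unsigned char copy always has
   zeros shifted in, so the loop terminates. */
int count_one_bits(char c)
{
    unsigned char u = (unsigned char)c;
    int bits = 0;
    while (u != 0) {
        bits += u & 1u;
        u >>= 1;
    }
    return bits;
}
```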

     There are some situations where type-punning is guaranteed
to be safe. Given a pointer to any data object, you can safely
convert that pointer to `unsigned char*' and then inspect the
individual bytes of the object. You can safely convert between
a struct pointer and a pointer to its first member, or between
a union pointer and a pointer to any of its members, or between
any data pointer at all and a `void*'. Sometimes, conversions
of this kind are essential -- but if you find yourself writing
"a lot" of them, it's probably a sign that your data structures
are not well-designed.
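A sketch of the guaranteed-safe conversions listed above. The types `struct point' and `union word' and all function names are hypothetical examples, not from any particular library.

```c
#include <stddef.h>

struct point { int x; int y; };
union word { unsigned int u; unsigned char bytes[sizeof(unsigned int)]; };

/* Inspect any object's bytes through unsigned char*: every bit pattern
   is a valid unsigned char value, so no trap representations arise. */
int any_nonzero_byte(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++)
        if (p[i] != 0)
            return 1;
    return 0;
}

/* A struct pointer, suitably converted, points to its first member. */
int *first_member(struct point *pt)
{
    return (int *)pt;
}

/* A union pointer, suitably converted, points to each of its members. */
unsigned char *bytes_member(union word *w)
{
    return (unsigned char *)w;
}

/* Any object pointer survives a trip through void* unchanged. */
struct point *via_void(struct point *pt)
{
    void *v = pt;
    return v;
}
```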

-- 
Eric.Sosman@sun.com

