Re: Bit-fields and integral promotion

From: Jack Klein (jackklein_at_spamcop.net)
Date: 01/29/05


Date: Sat, 29 Jan 2005 16:02:15 -0600

On Sat, 29 Jan 2005 11:26:02 GMT, CBFalconer <cbfalconer@yahoo.com>
wrote in comp.std.c:

> Jack Klein wrote:
> > CBFalconer <cbfalconer@yahoo.com>
> >> Kevin Bracey wrote:
> >>> "TTroy" <tinesan@gmail.com> wrote:
> >>>> CBFalconer wrote:
> >>>>> Wojtek Lerch wrote:
> >>>>>> CBFalconer wrote:
>
> >> Maybe the way to attack it is by how any sane code generator
> >> designer would go about it. The first thing to do is to get the
> >> memory block holding the thing in question into a register. The
> >> next is to shift that register so that the field in question is
> >> right justified. Now the question arises of what to do with the
> >> unspecified bits. They may either be masked off to 0 (i.e. the
> >> field was unsigned) or jammed to copies of the left hand bit of the
> >> original field (i.e. the field was signed, assuming 2's
> >> complement). For 1's complement things are the same here, and for
> >> sign-magnitude different (closer to unsigned treatment, but move
> >> the sign bit over to its proper place).
> >>
> >> After that we have an entity in a register, which is assumedly the
> >> most convenient size that is normally used for ints, signed or
> >> unsigned, and we can proceed from there. It seems to me that most
> >> designers would opt for the unsigned version, because it is simpler
> >> to process, and they are allowed to.
> >
> > Up to here, you're doing OK.
> >
> >> That means that the signed/unsigned characteristic of the bit field
> >> is propagated into any expressions using it.
> >
> > Now you've stumbled.
> >
> > Think about it, you have just copied a storage unit full of bits
> > into a register, and perhaps right shifted that register to place
> > the bit field in the least significant bits of that register.
> > Because the bit field is defined as unsigned, you fill all the
> > higher bits of the register with 0, most likely with a bitwise
> > AND. If the bit field had been signed and contained a positive
> > value, you would have done the same.
>
> No, you've missed the complications involved in assuming the bit
> field to be signed. That means the other bits have to be set to
> copies of the field's sign bit, in either 1's or 2's complement
> machines. For sign magnitude the appropriate bit has to be
> exchanged with the sign bit, after zeroing the extra bits. These
> manipulations are more complex than the unsigned version (which
> simply zeroes some bits), and thus to be avoided. Laziness is a
> virtue here.

The gyrations involved in sign extending negative signed bit fields
on non-2's complement platforms are irrelevant; they are no different
from those such a platform must go through to convert an ordinary
signed integer type with a negative value to a wider signed type. If
any such monstrosities still exist with current C compiler support,
they pay the price for their obsolete architecture.
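
A minimal C sketch of the two extraction strategies being argued about: shift and mask for an unsigned field, and shift, mask and sign-extend for a signed one. It assumes a 32-bit storage unit, a field width of 1 to 31 bits, and two's complement for the signed result. The helper names extract_unsigned and extract_signed are hypothetical, and a real compiler is free to choose a different instruction sequence.

```c
#include <stdint.h>

/* Hypothetical helper: extract an unsigned field of `width` bits (1..31)
   starting at bit `pos` of a 32-bit storage unit. Shift the field down to
   the least significant bits, then AND away everything above it. */
uint32_t extract_unsigned(uint32_t word, unsigned pos, unsigned width)
{
    uint32_t mask = (UINT32_C(1) << width) - 1;
    return (word >> pos) & mask;
}

/* Hypothetical helper: extract a signed field, assuming two's complement.
   Flipping the field's sign bit and then subtracting it again, (v ^ m) - m,
   copies that sign bit into all the higher bits. */
int32_t extract_signed(uint32_t word, unsigned pos, unsigned width)
{
    uint32_t v = extract_unsigned(word, pos, width);
    uint32_t m = UINT32_C(1) << (width - 1);   /* the field's sign bit */
    return (int32_t)(v ^ m) - (int32_t)m;      /* both operands fit in int32_t for width <= 31 */
}
```

For example, with word = 0xF0 the 4-bit field at bit 4 holds the pattern 1111: extract_unsigned(0xF0, 4, 4) yields 15, while extract_signed(0xF0, 4, 4) yields -1. The signed path does a little more work, but either way the result ends up as an ordinary value in an int-sized register.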

> In all cases we now have a bit pattern in a register, and external
> type knowledge saying whether that pattern describes a signed or
> unsigned integer. That external knowledge comes from the original
> declaration of the bit field. That knowledge also governed whether
> or not to go through the sign-extension gyrations described above.
>
> All further processing is done as if the reworked register content
> had been loaded in one fell swoop from somewhere, together with the
> un/signed type knowledge.

What does the "signed/unsigned" knowledge have to do with it? On most
architectures, chars have fewer bits than ints and UCHAR_MAX <
INT_MAX. So in an expression involving an unsigned char, you wind up
with the same thing, namely a narrower field of bits filling an
int-sized object, and the knowledge of whether the object it came from
was signed or unsigned. Despite the fact that the value originated in
an unsigned char, the int-sized object must be treated as a signed int.
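
To illustrate the unsigned char case, here is a small C sketch assuming CHAR_BIT == 8 and a two's complement signed char, so that 200 and -56 share the bit pattern 0xC8. The source type decides whether the upper bits are zero-filled or sign-filled, but in both cases the promoted operand is an int. The function names are hypothetical, and the _Generic check needs C11.

```c
#include <limits.h>

/* Hypothetical helpers. With CHAR_BIT == 8, both objects below hold the bit
   pattern 0xC8 (the signed one on a two's complement machine). */
int widen_from_uchar(void)
{
    unsigned char uc = 200;   /* 0xC8 */
    return uc + 0;            /* promoted to int, upper bits zero-filled: 200 */
}

int widen_from_schar(void)
{
    signed char sc = -56;     /* 0xC8 in two's complement */
    return sc + 0;            /* promoted to int, upper bits sign-filled: -56 */
}

/* Returns 1 if an unsigned char operand is promoted to (signed) int, which is
   the case whenever UCHAR_MAX <= INT_MAX; otherwise it becomes unsigned int. */
int uchar_promotes_to_int(void)
{
    unsigned char uc = 0;
    return _Generic(uc + 0, int: 1, default: 0);
}
```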

> Having gotten here with our sane code generator implementor, I
> maintain we now have the right clue about how to handle the usual
> arithmetic conversions on the bit field. We now base them on the
> original declaration as un/signed, because that minimizes the
> work. This is the final clue as to what the standard should say,
> were it to say anything, which so far it does not AFAICT. We do
> not base it on the range of values the bit field can hold.

Admittedly, it is unfortunate that the standard does not specifically
mention bit fields in describing the usual arithmetic conversions, and
hopefully that can be rectified in a Technical Corrigendum (TC) or a
later version.

But since the standard selected what they call "value preserving" over
"unsigned preserving" operation, it would be seriously inconsistent and
a real source of problems if an 8-bit unsigned char promoted to a
signed int but an 8-bit unsigned bit field promoted to an unsigned int.
That would be rather absurd, don't you think?
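
The consistency argument can be checked with a short C sketch. It assumes CHAR_BIT == 8, so that the 8-bit unsigned bit-field and the unsigned char member of the hypothetical struct sample hold exactly the same range of values, and a compiler that applies value-preserving promotion to bit-fields (GCC and Clang do in C mode). Under that reading, subtracting 1 from a zero-valued member gives -1 in both cases rather than UINT_MAX.

```c
/* Hypothetical struct: both members can hold 0..255 when CHAR_BIT == 8. */
struct sample {
    unsigned int field : 8;
    unsigned char uc;
};

/* Each helper returns 1 if "member - 1" is negative, meaning the member was
   promoted to int; it returns 0 if the member was promoted to unsigned int,
   in which case the subtraction wraps around to UINT_MAX. */
int bitfield_minus_one_is_negative(void)
{
    struct sample s = { 0, 0 };
    return (s.field - 1) < 0;
}

int uchar_member_minus_one_is_negative(void)
{
    struct sample s = { 0, 0 };
    return (s.uc - 1) < 0;
}
```

If the two helpers disagreed, s.field - 1 < 0 would mean something different from the identical-looking s.uc - 1 < 0, which is exactly the inconsistency described above.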

> ... snip ...
> >
> > The programmer should use unsigned bit fields when only positive
> > values need to be stored, and signed bit fields when both positive
> > and negative values are used.
>
> We are not trying to constrain the programmer, we are trying to
> interpret what s/he actually wrote.

Ah, you snipped your particular statement that my comment addressed,
so I am putting it back in:

> It also means that wherever given the choice, the designer will
> make a bit field unsigned because it means less processing and less
> chance of overflows and consequent UB.

I misinterpreted your meaning, so my comment doesn't apply. I was
thrown off by what I think is some incompleteness in your wording. I
think what you meant to say by "make a bit field unsigned" would be
better conveyed by the words "make an unsigned bit field promote to
unsigned int".

But despite the omission from the standard, it seems silly to think
that the compiler designer is given a choice here. Since all other
promotions and conversions are rather scrupulously defined, I find it
hard to believe that the intent was to leave the results of using an
unsigned bit field in an expression implementation defined. In fact,
nothing is implementation-defined unless the standard specifically
states that it is implementation-defined.

Indeed, given the lack of mention, using the value of a bit field in
an expression technically produces undefined behavior under the
standard's current wording.

-- 
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html

