Re: Is there any GENERIC MACROS in c for INTEGERS,CHARACTERS ?

From: Joe Wright (joewwright_at_comcast.net)
Date: 12/19/04


Date: Sun, 19 Dec 2004 14:08:38 -0500

Eric Sosman wrote:
> Joe Wright wrote:
>
>> Eric Sosman wrote:
>>
>>> Joe Wright wrote:
>>>
>>>> while ((c = *s++)) if (!isspace(c)) p = s;
>>>
>>>
>>> while ((c = *s++)) if (!isspace((unsigned char)c)) p = s;
>>>
>>> ... or else you can be in deep trouble if `char' is signed.
>>
>>
>> From N869 ..
>>
>> #include <ctype.h>
>> int isspace(int c);
>>
>> Description
>>
>> [#2] The isspace function tests for any character that is a
>> standard white-space character or is one of a locale-
>> specific set of characters for which isalnum is false. The
>> standard white-space characters are the following: space
>> (' '), form feed ('\f'), new-line ('\n'), carriage return
>> ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In
>> the "C" locale, isspace returns true only for the standard
>> white-space characters.
>>
>> The descriptions of the ctype functions all take int values. I know
>> that char is converted to int in this case and that if char is signed
>> and negative, the result is probably a negative int.
>
>
> ... but they don't take "just any" int values; the
> argument must be in a restricted range. 7.4, paragraph 1
> (I don't have N869 so this is from ISO/IEC 9899:1999,
> which is very nearly as good):
>
> "In all cases the argument is an int, the value of
> which shall be representable as an unsigned char or
> shall equal the value of the macro EOF. If the
> argument has any other value, the behavior is
> undefined."
>
>> So what? Clearly -50 is not space or form feed, tab, etc. and the
>> expression (isspace(-50) == 0) is true.
>
>
> isspace(-50) produces undefined behavior unless EOF==-50.
>

What is EOF for in this context? I'm not overly afraid of 'Undefined
Behavior'. isspace(c) is required to return 0 if c (now converted to
int) is not among the 'space' characters. Clearly EOF is not among
the 'space' characters and so 0 must be the result. Right?
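
For reference, the loop under discussion can be written out as a complete function with Eric's cast applied. This is only a sketch: the name trim_trailing and the final truncation step are assumptions, since the thread quotes just the while loop.

```c
#include <ctype.h>

/* Hypothetical helper (name and truncation step are assumptions):
   cut the string off after its last non-white-space character. */
char *trim_trailing(char *s)
{
    char *start = s;
    char *p = s;    /* one past the last non-space character seen so far */
    int c;

    while ((c = *s++))
        /* The cast maps a negative plain char into 0..UCHAR_MAX,
           the range that 7.4p1 requires for the argument. */
        if (!isspace((unsigned char)c))
            p = s;

    *p = '\0';      /* drop everything after the last non-space character */
    return start;
}
```

With the cast in place the argument is always representable as an unsigned char, so the question of what isspace(-50) does never comes up.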

>> What is the case for casting this otherwise negative int to unsigned
>> char? What 'deep trouble' could happen if I didn't? Why wouldn't the
>> function be written so as to take any int as advertised?
>
>
> Well, "deep trouble" may have been an overstatement on my
> part. Undefined behavior, by its very undefinedness, can be
> beneficial rather than harmful. Who knows? The experience of
> having demons fly out of your nose may be pleasant. ;-)
>
> As to why the functions require a restricted range, I can
> think of two likely reasons:
>
> - For speed, the functions are frequently implemented as
> macros that do simple array references. isspace() and its
> kin just take the argument value, subtract EOF, and use the
> difference as an index to an array containing the precomputed
> answer. If the argument range were unrestricted, you'd need
> an array with INT_MAX-INT_MIN+1 elements, which even with
> today's enormous memories would be excessive. A range check
> could be introduced, but this is difficult to do in a macro.
>

No, you don't. EOF is a non-event (must return 0) and (c & 0xff)
will give you the index into a 256-byte array of answers to the
questions.
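
A minimal sketch of that idea, assuming 8-bit bytes, two's complement ints and the "C" locale. The names my_space_table and my_isspace are made up for illustration; this is not how any particular library implements isspace.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical 256-entry answer table: nonzero means "white space"
   in the "C" locale. Entries not listed are zero. */
static const unsigned char my_space_table[256] = {
    ['\t'] = 1, ['\n'] = 1, ['\v'] = 1,
    ['\f'] = 1, ['\r'] = 1, [' ']  = 1
};

/* Hypothetical classifier that accepts any int value. */
int my_isspace(int c)
{
    if (c == EOF)
        return 0;                       /* EOF is never white space */
    return my_space_table[c & 0xff];    /* bitwise mask gives 0..255 */
}
```

Under those assumptions my_isspace(-50) indexes entry 206 and returns 0. Whether a conforming library does the same for its own isspace is exactly what the thread is arguing about.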

> - Even with a different implementation strategy you face an
> ambiguity when the argument equals EOF: Is it end-of-file or
> a legitimate character (e.g., 0xFF on many systems)? Given
> the value alone there is no way to tell. The Standard requires
> that the legitimate characters be passed as non-negative values
> so they can be distinguished from the negative value EOF.
>

The Standard requirements for non-negative notwithstanding, having
checked the value for EOF and finding that it is not, mask the value
with 0xff and carry on. Surely.
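
On the calling side, the same check-then-mask step can be wrapped around the library function. Again only a sketch: masked_isspace is a hypothetical name, and the mask assumes 8-bit bytes.

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */

/* Hypothetical wrapper: test for EOF first, then mask the value into
   0..255 so the standard isspace only sees arguments it defines. */
int masked_isspace(int c)
{
    if (c == EOF)
        return 0;
    return isspace(c & 0xff) != 0;   /* normalise the result to 0 or 1 */
}
```

One caveat: where plain char is signed and EOF is -1, a character with value 0xFF arrives here as -1 and is taken for EOF. That is the ambiguity Eric describes above, and casting to unsigned char at the call site is what avoids it.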

> IMHO this is one of those unpleasant little corners in the
> language. It seems to me things would have been simpler had
> `char' been synonymous with `unsigned char' right from the
> start. However, machines disagree on just what should happen
> when a byte is fetched from memory into a wider CPU register
> for further manipulation: Some machines widen by sign-extending,
> some by zero-extending, and some by leaving the pre-existing
> high-order register contents unchanged. Requiring `unsigned char'
> on all these types of machines (and on others I haven't thought
> of) would have imposed a burden of extra instructions on at
> least some of them.
>

The Standard's mention of 'unsigned char' in this context is
unfortunate. We are talking about values of an int.

> And even a universal `unsigned char' would be no panacea.
> I have heard tell of machines with 32-bit characters and 32-bit
> integers, and I imagine the proper choice of an EOF value on such
> machines must involve ugly compromises.
>

I think it's a question of domains within a range. For 32-bit
unsigned integers, the range of values is 0..4,294,967,295. NULL
defined as 0 is within the domain of pointers and EOF as -1 is
outside the domain of characters. Good choices.

-- 
Joe Wright                            mailto:joewwright@comcast.net
"Everything should be made as simple as possible, but not simpler."
                     --- Albert Einstein ---

