Re: Is there any GENRIC MACROS in c for INTEGERS,CHARACTERS ?
From: Joe Wright (joewwright_at_comcast.net)
Date: 12/19/04
- Next message: Martin Ambuhl: "Re: C call of a C# dll"
- Previous message: Leon Brodskiy: "Re: pointer and array"
- In reply to: Eric Sosman: "Re: Is there any GENRIC MACROS in c for INTEGERS,CHARACTERS ?"
- Next in thread: Eric Sosman: "Re: Is there any GENRIC MACROS in c for INTEGERS,CHARACTERS ?"
- Reply: Eric Sosman: "Re: Is there any GENRIC MACROS in c for INTEGERS,CHARACTERS ?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sun, 19 Dec 2004 14:08:38 -0500
Eric Sosman wrote:
> Joe Wright wrote:
>
>> Eric Sosman wrote:
>>
>>> Joe Wright wrote:
>>>
>>>> while ((c = *s++)) if (!isspace(c)) p = s;
>>>
>>>
>>> while ((c = *s++)) if (!isspace((unsigned char)c)) p = s;
>>>
>>> ... or else you can be in deep trouble if `char' is signed.
>>
>>
>> From N869 ..
>>
>> #include <ctype.h>
>> int isspace(int c);
>>
>> Description
>>
>> [#2] The isspace function tests for any character that is a
>> standard white-space character or is one of a locale-
>> specific set of characters for which isalnum is false. The
>> standard white-space characters are the following: space
>> (' '), form feed ('\f'), new-line ('\n'), carriage return
>> ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In
>> the "C" locale, isspace returns true only for the standard
>> white-space characters.
>>
>> The descriptions of the ctype functions all take int values. I know
>> that char is converted to int in this case and that if char is signed
>> and negative, the result is probably a negative int.
>
>
> ... but they don't take "just any" int values; the
> argument must be in a restricted range. 7.4, paragraph 1
> (I don't have N869 so this is from ISO/IEC 9899:1999,
> which is very nearly as good):
>
> "In all cases the argument is an int, the value of
> which shall be representable as an unsigned char or
> shall equal the value of the macro EOF. If the
> argument has any other value, the behavior is
> undefined."
>
>> So what? Clearly -50 is not space or form feed, tab, etc. and the
>> expression (isspace(-50) == 0) is true.
>
>
> isspace(-50) produces undefined behavior unless EOF==-50.
>
What is EOF for in this context? I'm not overly afraid of 'Undefined
Behavior'. isspace(c) is required to return 0 if c (now converted to
int) is not among the 'space' characters. Clearly EOF is not among
the 'space' characters and so 0 must be the result. Right?

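
For reference, the loop under discussion, with the cast Eric proposes, can be wrapped in a small function. This is only a sketch: the name rtrim and the final store through p are assumptions, since the thread shows nothing but the loop itself.

```c
#include <ctype.h>

/* Hypothetical helper (not from the thread): cut s off after its last
   non-whitespace character and return s. */
char *rtrim(char *s)
{
    char *p = s;   /* one past the last non-space character seen so far */
    char *q = s;   /* scanning pointer */
    int c;

    while ((c = *q++) != '\0') {
        /* Convert to unsigned char first so the argument is within the
           range 7.4 paragraph 1 allows, even where plain char is signed. */
        if (!isspace((unsigned char)c))
            p = q;
    }
    *p = '\0';
    return s;
}
```

Without the cast the function compiles just the same; the difference only shows for characters whose value is negative as a plain char.
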
>> What is the case for casting this otherwise negative int to unsigned
>> char? What 'deep trouble' could happen if I didn't? Why wouldn't the
>> function be written so as to take any int as advertised?
>
>
> Well, "deep trouble" may have been an overstatement on my
> part. Undefined behavior, by its very undefinedness, can be
> beneficial rather than harmful. Who knows? The experience of
> having demons fly out of your nose may be pleasant. ;-)
>
> As to why the functions require a restricted range, I can
> think of two likely reasons:
>
> - For speed, the functions are frequently implemented as
> macros that do simple array references. isspace() and its
> kin just take the argument value, subtract EOF, and use the
> difference as an index to an array containing the precomputed
> answer. If the argument range were unrestricted, you'd need
> an array with INT_MAX-INT_MIN+1 elements, which even with
> today's enormous memories would be excessive. A range check
> could be introduced, but this is difficult to do in a macro.
>
No, you don't. EOF is a non-event (must return 0) and (c & 0xff)
will give you the index into a 256-byte array of answers to the
question.

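
The two table layouts being compared can be sketched side by side. This is an illustration only, assuming 8-bit bytes and the "C" locale; the names mask_table, masked_isspace, offset_table and OFFSET_ISSPACE are hypothetical and not taken from any real library.

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Mask approach: 256 answers, indexed by the low 8 bits of the argument.
   Entries not listed are zero. */
static const unsigned char mask_table[256] = {
    [' '] = 1, ['\f'] = 1, ['\n'] = 1, ['\r'] = 1, ['\t'] = 1, ['\v'] = 1
};

int masked_isspace(int c)
{
    if (c == EOF)
        return 0;
    return mask_table[c & 0xff];
}

/* Offset approach as Eric describes it: subtract EOF and index directly.
   The table covers EOF..UCHAR_MAX and nothing else, so any other
   argument indexes outside the array. */
static const unsigned char offset_table[UCHAR_MAX - EOF + 1] = {
    [' ' - EOF] = 1, ['\f' - EOF] = 1, ['\n' - EOF] = 1,
    ['\r' - EOF] = 1, ['\t' - EOF] = 1, ['\v' - EOF] = 1
};

#define OFFSET_ISSPACE(c) (offset_table[(c) - EOF])
```

The macro has no branch and evaluates its argument once, which is the speed argument; the function pays for one comparison and one mask.
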
> - Even with a different implementation strategy you face an
> ambiguity when the argument equals EOF: Is it end-of-file or
> a legitimate character (e.g., 0xFF on many systems)? Given
> the value alone there is no way to tell. The Standard requires
> that the legitimate characters be passed as non-negative values
> so they can be distinguished from the negative value EOF.
>
The Standard's requirement for non-negative values notwithstanding,
having checked the value against EOF and found that it is not EOF,
mask the value with 0xff and carry on. Surely.

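
The values involved can be made concrete with a short sketch. It models a signed plain char with signed char, and the function names are hypothetical.

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* What a ctype function receives when a signed character is passed as-is:
   the usual conversion to int keeps the sign. */
int arg_without_cast(signed char ch)
{
    return ch;
}

/* What it receives with the cast: conversion to unsigned char wraps
   negative values into 0..UCHAR_MAX before the promotion to int. */
int arg_with_cast(signed char ch)
{
    return (unsigned char)ch;
}
```

On an implementation where EOF is -1, arg_without_cast(-1) is equal to EOF, so a test against EOF made before masking cannot tell that character from end-of-file; arg_with_cast(-1) gives UCHAR_MAX, which can never compare equal to a negative EOF.
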
> IMHO this is one of those unpleasant little corners in the
> language. It seems to me things would have been simpler had
> `char' been synonymous with `unsigned char' right from the
> start. However, machines disagree on just what should happen
> when a byte is fetched from memory into a wider CPU register
> for further manipulation: Some machines widen by sign-extending,
> some by zero-extending, and some by leaving the pre-existing
> high-order register contents unchanged. Requiring `unsigned char'
> on all these types of machines (and on others I haven't thought
> of) would have imposed a burden of extra instructions on at
> least some of them.
>
The Standard's mention of 'unsigned char' in this context is
unfortunate. We are talking about values of an int.

> And even a universal `unsigned char' would be no panacea.
> I have heard tell of machines with 32-bit characters and 32-bit
> integers, and I imagine the proper choice of an EOF value on such
> machines must involve ugly compromises.
>
I think it's a question of domains within a range. For 32-bit
unsigned integers, the range of values is 0..4,294,967,295. NULL
defined as 0 is within the domain of pointers and EOF as -1 is
outside the domain of characters. Good choices.
--
Joe Wright mailto:joewwright@comcast.net
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---