Re: I want unsigned char * string literals
- From: Michael B Allen <ioplex@xxxxxxxxx>
- Date: Mon, 23 Jul 2007 12:53:33 -0400
On Mon, 23 Jul 2007 09:02:04 -0400
Eric Sosman <esosman@xxxxxxxxxxxxxxxxxxxx> wrote:
> Michael B Allen wrote:
>> [...]
>> Even though char is wrong, it's just another little legacy wart with
>> no serious technical impact other than the fact that to inspect bytes
>> within the text one should cast to unsigned char first. [...]
> It is unnecessary to cast anything in order to "inspect"
> a character in a string. *cptr == 'A' and *cptr == 'ß' work
> just fine (on systems that have a ß character), and there's
> no need to cast either *cptr or the constant.
Hi Eric,
The above code will not work with non-Latin-1 character encodings (most
importantly UTF-8). That will severely limit its portability from an i18n
perspective (e.g. no CJK). And even domestically you're going to run into
trouble soon. Standards related to Kerberos, LDAP, GSSAPI and many more
are basically saying they don't care about codepages anymore. Everything
is going to be UTF-8 (except on Windows which will of course continue
to use wchar_t).
> Perhaps you're unhappy about the casting that *is* needed
> for the <ctype.h> functions, and I share your unhappiness.
> But that's not really a consequence of the sign ambiguity of
> char; rather, it follows from the functions' having a domain
> consisting of all char values *plus* EOF. Were it not for the
> need to handle EOF -- a largely useless addition, IMHO -- there
> would be no need to cast when using <ctype.h>.
Forget casting; the ctype functions don't work at all if the high
bit is set. Ctype only works with ASCII.
> However, that's far from the worst infelicity in the C
> library. The original Standard tried (mostly) to codify
> C-as-it-was, not to replace it with C-remade-in-trendy-mode.
> The <ctype.h> functions -- and their treatment of EOF -- were
> already well-established before the first Standard was written,
> and the writers had little choice but to accept them.
Ok. A little history is nice. But I really think these discussions
should come with the admission that the C standard library is basically
useless at this point.
ctype - useless for i18n
errno - a classic non-standard standard
locale - no context object so it can't be safely used in libraries
setjmp - not portable
signal - no comment necessary
stdio - no context object to keep state separate (e.g. can't mix wide
and non-wide I/O)
stdlib - malloc has no context object
string - useless for i18n
If we're ever going to create a new "standard" library for C the first
step is to admit that the one we have now is useless for anything but
hello world programs.
Mike