Re: I want unsigned char * string literals



On Mon, 23 Jul 2007 09:02:04 -0400
Eric Sosman <esosman@xxxxxxxxxxxxxxxxxxxx> wrote:

Michael B Allen wrote:
[...]

Even though char is wrong, it's just another little legacy wart with
no serious technical impact other than the fact that to inspect bytes
within the text one should cast to unsigned char first. [...]

It is unnecessary to cast anything in order to "inspect"
a character in a string. *cptr == 'A' and *cptr == 'ß' work
just fine (on systems that have a ß character), and there's
no need to cast either *cptr or the constant.

Hi Eric,

The above code will not work with non-Latin-1 character encodings (most
importantly UTF-8). That will severely limit its portability from an i18n
perspective (e.g. no CJK). And even domestically you're going to run into
trouble soon. Standards related to Kerberos, LDAP, GSSAPI and many more
are basically saying they don't care about codepages anymore. Everything
is going to be UTF-8 (except on Windows, which will of course continue
to use wchar_t).
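
To illustrate the UTF-8 point: in UTF-8, ß (U+00DF) is encoded as the two
bytes 0xC3 0x9F, so a single *cptr can never compare equal to it. The sketch
below is only an illustration, assuming valid NUL-terminated UTF-8 input;
the function names utf8_length and starts_with_sharp_s are made up for this
example and come from no library.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
 * byte that is not a continuation byte (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8. Hypothetical helper name. */
size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Convert first: where char is signed, bytes >= 0x80 are negative. */
        unsigned char b = (unsigned char)*s;
        if ((b & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Return nonzero if s begins with the UTF-8 encoding of U+00DF (ß),
 * i.e. the two bytes 0xC3 0x9F. Hypothetical helper name. */
int starts_with_sharp_s(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    /* && short-circuits, so p[1] is only read when p[0] is not the NUL. */
    return p[0] == 0xC3 && p[1] == 0x9F;
}
```

Reading the bytes through unsigned char keeps the comparisons against 0xC3
and 0x9F meaningful whether plain char is signed or unsigned.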

Perhaps you're unhappy about the casting that *is* needed
for the <ctype.h> functions, and I share your unhappiness.
But that's not really a consequence of the sign ambiguity of
char; rather, it follows from the functions' having a domain
consisting of all char values *plus* EOF. Were it not for the
need to handle EOF -- a largely useless addition, IMHO -- there
would be no need to cast when using <ctype.h>.
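
For reference, the cast in question usually looks like the sketch below. The
<ctype.h> functions take an int whose value must be representable as
unsigned char or equal EOF, so a plain char holding a negative value must be
converted first. The name count_alpha is invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the bytes in s that isalpha() accepts in the current locale.
 * Hypothetical helper name, for illustration only. */
size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Without the cast, a negative char value is undefined behavior. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

In the default "C" locale, count_alpha("abc123") returns 3. Bytes with the
high bit set are safe to pass this way, though how they are classified
depends on the locale in effect.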

Forget casting; the ctype functions classify one byte at a time, so they
can't handle multibyte text at all once the high bit is on. For UTF-8,
ctype effectively only works with ASCII.

However, that's far from the worst infelicity in the C
library. The original Standard tried (mostly) to codify
C-as-it-was, not to replace it with C-remade-in-trendy-mode.
The <ctype.h> functions -- and their treatment of EOF -- were
already well-established before the first Standard was written,
and the writers had little choice but to accept them.

OK. A little history is nice. But I really think these discussions
should be punctuated by the admission that the C standard library is
basically useless at this point.

ctype - useless for i18n
errno - a classic non-standard standard
locale - no context object so it can't be safely used in libraries
setjmp - not portable
signal - no comment necessary
stdio - no context object to keep state separate (e.g. can't mix wide
and non-wide I/O)
stdlib - malloc has no context object
string - useless for i18n

If we're ever going to create a new "standard" library for C, the first
step is to admit that the one we have now is useless for anything but
hello world programs.

Mike