Re: Defacto standard string library
- From: "Bartc" <bartc@xxxxxxxxxx>
- Date: Sat, 03 Jan 2009 19:57:49 GMT
"Phil Carmody" <thefatphil_demunged@xxxxxxxxxxx> wrote in message
news:87eizk5fr0.fsf@xxxxxxxxxxxxxxxxxxxxxxx
> richard@xxxxxxxxxxxxxxx (Richard Tobin) writes:
>> In article <ANN7l.14634$Sp5.13522@xxxxxxxxxxxxxxxxxxxxxxxxx>,
>> Bartc <bartc@xxxxxxxxxx> wrote:
>>> strcmp() will only work on UTF-8 if you make use of the result as either
>>> 0 or not 0.
>>
>> No, it will give Unicode ordering.
>>
>>> And if you use strcmp() on mixed UTF-8 and ordinary strings, then
>>> the result might be meaningless (a string containing a single encoded
>>> Unicode Character could match a string of several ordinary chars).
>>
>> If you use strcmp() between strings in different encodings of course
>> the result is likely to be meaningless. However UTF-8 has the advantage
>> that it can be compared against ascii, since ascii is a subset of UTF-8.
>
> How does "\xEF\xBB\xBF\x40" compare against "\x41" using strcmp()?
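The EF BB BF prefix is the UTF-8 encoding of U+FEFF (the byte-order mark), so
this is a valid sequence: a BOM followed by x40. strcmp() sees 0xEF first and
ranks the whole string after "\x41", even though the only visible character,
'@', sorts before 'A'.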
People keep saying UTF-8 is compatible with all these string functions, but
I'm not too happy about it myself. The functions aren't used in isolation,
and a lot of user code (existing and future) needs to be aware of the pitfalls.
--
Bartc