Re: Defacto standard string library
- From: Phil Carmody <thefatphil_demunged@xxxxxxxxxxx>
- Date: Sat, 03 Jan 2009 23:04:02 +0200
"Bartc" <bartc@xxxxxxxxxx> writes:
"Phil Carmody" <thefatphil_demunged@xxxxxxxxxxx> wrote in message
news:87eizk5fr0.fsf@xxxxxxxxxxxxxxxxxxxxxxx
richard@xxxxxxxxxxxxxxx (Richard Tobin) writes:
In article <ANN7l.14634$Sp5.13522@xxxxxxxxxxxxxxxxxxxxxxxxx>,
Bartc <bartc@xxxxxxxxxx> wrote:
strcmp() will only work on UTF-8 if you make use of the result as either
0 or not 0.
No, it will give Unicode ordering.
And if you use strcmp() on mixed UTF-8 and ordinary strings, then
the result might be meaningless (a string containing a single encoded
Unicode Character could match a string of several ordinary chars).
If you use strcmp() between strings in different encodings of course
the result is likely to be meaningless. However UTF-8 has the advantage
that it can be compared against ascii, since ascii is a subset of UTF-8.
How does "\xEF\xBB\xBF\x40" compare against "\x41" using strcmp()?
Apparently the EF BB BF 40 sequence would be invalid UTF-8 (because it's not
the shortest way of encoding x40).
It's the first line I read from the UTF-8 encoded file that I just
fopen()ed. "\x41" was the first line I read from the ASCII encoded
file that I also just fopen()ed. How do these two lines compare?
You cannot demand that I unconditionally drop any "\xEF\xBB\xBF"
from the first line of a file before performing the comparison. Were
you to do so, you'd bugger any ISO 8859-15 file beginning "ï»¿".
People keep saying UTF-8 is compatible with all these string functions but
I'm not too happy about it myself. The functions aren't used in isolation
and a lot of user code (existing and future) needs to be aware of pitfalls.
UTF-8 strings, as sequences of Unicode characters, aren't arrays.
Anything which treats them as arrays can potentially have pitfalls.
So it's not just the str*()s that are the problem.
Phil
--
I tried the Vista speech recognition by running the tutorial. I was
amazed, it was awesome, recognised every word I said. Then I said the
wrong word ... and it typed the right one. It was actually just
detecting a sound and printing the expected word! -- pbhj on /.