Re: Defacto standard string library



In article <ANN7l.14634$Sp5.13522@xxxxxxxxxxxxxxxxxxxxxxxxx>,
Bartc <bartc@xxxxxxxxxx> wrote:

> strcmp() will only work on UTF-8 if you make use of the result as either 0
> or not 0.

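No, it will give Unicode code point ordering: strcmp() compares bytes as
unsigned char, and the byte-wise order of UTF-8 strings is the same as
their code point order.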
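To illustrate, here is a minimal sketch (not from the original post) that decodes UTF-8 into code points and compares them directly. The helper names utf8_next, codepoint_cmp and sign are hypothetical, and the decoder assumes well-formed UTF-8 and does no validation. For valid input, the sign of strcmp() should agree with codepoint_cmp(), because the C standard has strcmp() compare bytes as unsigned char and RFC 3629 notes that byte-wise sorting of UTF-8 gives the same order as sorting by character number.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence at *s and advance *s.
   Assumes well-formed UTF-8; it does no validation. */
static unsigned long utf8_next(const unsigned char **s)
{
    const unsigned char *p = *s;
    unsigned long cp;
    int extra;                      /* number of continuation bytes */

    if (p[0] < 0x80)      { cp = p[0];        extra = 0; }  /* 0xxxxxxx */
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; }  /* 110xxxxx */
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; }  /* 1110xxxx */
    else                  { cp = p[0] & 0x07; extra = 3; }  /* 11110xxx */

    for (int i = 1; i <= extra; i++)
        cp = (cp << 6) | (p[i] & 0x3Fu);  /* append 6 payload bits */

    *s = p + 1 + extra;
    return cp;
}

/* Hypothetical helper: compare two NUL-terminated UTF-8 strings
   code point by code point. Returns -1, 0 or 1. */
static int codepoint_cmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    assert(a != NULL && b != NULL);
    while (*pa != '\0' && *pb != '\0') {
        unsigned long ca = utf8_next(&pa);
        unsigned long cb = utf8_next(&pb);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* At least one string has ended; a proper prefix sorts first. */
    return (*pa != '\0') - (*pb != '\0');
}

/* Reduce a strcmp() result to -1, 0 or 1 so it can be compared. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}
```

For example, "z" (U+007A) against "\xC3\xA9" (U+00E9) and "\xEF\xBF\xBD" (U+FFFD) against "\xF0\x90\x80\x80" (U+10000) should give -1 both from sign(strcmp(...)) and from codepoint_cmp(...).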

> And if you use strcmp() on mixed UTF-8 and ordinary strings, then
> the result might be meaningless (a string containing a single encoded
> Unicode character could match a string of several ordinary chars).

If you use strcmp() between strings in different encodings, of course
the result is likely to be meaningless. However, UTF-8 has the advantage
that it can be compared against ASCII, since ASCII is a subset of UTF-8.
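A small sketch of the ASCII point, assuming "ASCII" means bytes 0x00 to 0x7F: such a string is already valid UTF-8 with identical bytes, so strcmp() between it and a UTF-8 string is meaningful. The helpers is_ascii and cmp_utf8_ascii are hypothetical. By contrast, the Latin-1 string "Ã©" is the bytes C3 A9, exactly the UTF-8 encoding of "é", so strcmp() reports the two as equal; that is the kind of mixed-encoding collision described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if every byte is in 0x00..0x7F (7-bit ASCII).
   Such a string is also valid UTF-8, byte for byte. */
static bool is_ascii(const char *s)
{
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if (*p > 0x7F)
            return false;
    return true;
}

/* Hypothetical helper: compare a UTF-8 string against an ASCII one.
   The strcmp() result is meaningful only because the ASCII side is
   also UTF-8, so that precondition is asserted first. */
static int cmp_utf8_ascii(const char *utf8, const char *ascii)
{
    assert(is_ascii(ascii));
    return strcmp(utf8, ascii);
}
```

The assert only documents the precondition in debug builds; a caller that cannot guarantee it would need to convert the other string to UTF-8 before comparing.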

> What I'm saying is that I think it's a bad idea to use C string functions on
> strings known to contain UTF-8.

It's a bad idea to use functions that interpret the characters in the
string, or functions that expect each character to be one byte. But
most of the str* functions don't have those problems.
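As a sketch of the distinction (helper names utf8_count and key_length are hypothetical): strlen() and strchr() work on bytes, which is safe for UTF-8 as long as you want a byte count or are searching for an ASCII byte, because every byte of a multi-byte UTF-8 sequence is 0x80 or above and so can never be mistaken for an ASCII character. Counting characters is different: it needs code that understands the encoding, such as utf8_count, which counts every byte that is not a continuation byte (10xxxxxx).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of code points in a well-formed UTF-8
   string, i.e. the number of bytes that are not continuation bytes. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)   /* skip 10xxxxxx continuation bytes */
            n++;
    return n;
}

/* Hypothetical helper: length in bytes of the key in "key:value".
   strchr() is safe here because ':' (0x3A) never occurs inside a
   multi-byte UTF-8 sequence. Returns strlen(s) if there is no ':'. */
static size_t key_length(const char *s)
{
    const char *colon = strchr(s, ':');
    return colon != NULL ? (size_t)(colon - s) : strlen(s);
}
```

For "caf\xC3\xA9" ("café"), strlen() returns 5 bytes while utf8_count() returns 4 characters; both answers are correct for their own question.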

-- Richard
--
Please remember to mention me / in tapes you leave behind.