Need help on string manipulation



Hello, I'm currently learning string manipulation. I'm curious about
the preferred way to manipulate strings in C, especially when the
strings contain non-ASCII characters. For example, if substrings need
to be replaced, or one character needs to be changed, what should I do?
Is it better to convert strings to UTF-32 (UCS-4) before manipulation?
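
To make the question concrete, here is a rough sketch of what I mean by
converting to 32-bit code points first. It assumes the input is a
NUL-terminated UTF-8 string; utf8_decode_one and utf8_to_utf32 are names
I made up for illustration, not functions from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s.
   Stores the code point in *cp and returns the number of bytes used,
   or 0 if the sequence is malformed, overlong, or a surrogate. */
static size_t utf8_decode_one(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;

    for (i = 1; i < len; i++) {
        /* A continuation byte must look like 10xxxxxx; this check also
           stops at the terminating NUL of a truncated sequence. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return len;
}

/* Hypothetical helper: decode a whole NUL-terminated UTF-8 string into
   out, which has room for cap code points. Returns the number of code
   points written, or (size_t)-1 on malformed input or if out is full. */
size_t utf8_to_utf32(const char *in, uint32_t *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        size_t used = utf8_decode_one(s, &cp);
        if (used == 0 || n == cap) return (size_t)-1;
        out[n++] = cp;
        s += used;
    }
    return n;
}
```

After this, changing one character would just be out[i] = new_code_point,
at the cost of encoding back to UTF-8 afterwards. Note that one code
point is still not always one user-visible character (combining marks,
for instance), so I'm not sure this alone solves the problem.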

But on Windows, wchar_t is 16 bits, which isn't enough for characters
whose code points don't fit in 16 bits (those above U+FFFF).
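
As far as I understand it, such characters take two 16-bit units (a
surrogate pair) in UTF-16. A minimal sketch of that encoding, where
utf16_encode is a name I made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16.
   Writes 1 or 2 units to out and returns how many were written,
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;      /* fits in a single 16-bit unit */
        return 1;
    }
    cp -= 0x10000;                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 comes out as the two units 0xD83D 0xDE00, so
indexing a 16-bit wchar_t array does not give one element per
character.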

On Linux, I hear wchar_t is 32 bits. Maybe on Linux, strings can
simply be converted to wchar_t and then handled without worrying? I'm
not sure.

What is a "good" way to handle all this mess? Are there any good
examples? I'll be very thankful for your help.
