Re: Need help on string manipulation



On Mon, 27 Mar 2006 22:29:09 -0800, WaterWalk wrote:

Characters represented by wchar_t must use one wchar_t per character,
unlike characters using char, which may use a multibyte encoding. The
actual size and encoding of wchar_t are implementation-defined; for
example, DragonFly BSD uses different encodings of wchar_t depending on
the encoding of char strings. If Windows uses a 16-bit wchar_t, you will
be unable to hold characters above U+FFFF (which includes many of the
newer Unicode characters) in a single wchar_t. If this is a problem for
you, then avoid wchar_t. You will not have this problem under Linux,
since glibc uses UCS-4, which is 31-bit.

Yes, this is my problem. If any Unicode character could be encoded in a
single wchar_t, life would be much easier. *BUT* on Windows, I can't
simply use wchar_t, which is only 16 bits wide, to represent all Unicode
characters. I hear that MS Word uses two wchar_t units to hold those
"extended characters". Then, if one character in a string needs to be
changed, the handy array index operation can't be used. What's more, the
whole string may need to change. This is really annoying. Any ideas?

For your information, the most common encoding in which multiple 16-bit
objects are used for some Unicode code points is called UTF-16. If you
want to use glibc's indexable UCS-4 encoding, you can try the GNU C tool
chain on Windows, but check its wchar_t width first, since Windows ports
of GCC generally keep the platform's 16-bit wchar_t. If not, you may get
better answers about this in an MS Windows programming group.

--
Ben.