Re: Need help on string manipulation
- From: Ben Bacarisse <ben.usenet@xxxxxxxxx>
- Date: Wed, 29 Mar 2006 13:03:03 +0100
On Mon, 27 Mar 2006 22:29:09 -0800, WaterWalk wrote:
Characters represented by wchar_t must use one wchar_t per character,
unlike characters using char, which may use a multibyte encoding. The
actual size and encoding of wchar_t are implementation-defined; e.g.
DragonFly BSD uses different encodings of wchar_t depending on the
encoding of char strings. If Windows uses a 16-bit wchar_t, a single
wchar_t cannot hold characters above U+FFFF, which includes some newer
Unicode characters. If this is a problem for you, then avoid wchar_t.
You will not have this problem under Linux, since glibc uses UCS-4, a
31-bit code space stored in a 32-bit wchar_t.
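Since the width of wchar_t varies between implementations, it may help to check it rather than assume it. The following is only a minimal sketch: the function name wchar_holds_all_unicode is made up for illustration, and the test relies on WCHAR_MAX from <wchar.h> and on the fact that the highest Unicode code point is U+10FFFF.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if a single wchar_t can hold every
 * Unicode code point (the highest one is U+10FFFF), 0 otherwise.
 * It says nothing about which encoding the implementation uses. */
int wchar_holds_all_unicode(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Print what this implementation provides. */
void report_wchar(void)
{
    printf("sizeof(wchar_t) = %zu bytes\n", sizeof(wchar_t));
    printf("one wchar_t per code point: %s\n",
           wchar_holds_all_unicode() ? "yes" : "no");
}
```

Calling report_wchar() from main shows what the current tool chain provides; with a 16-bit wchar_t the second line reads "no".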
Yes, this is my problem. If any Unicode character could be encoded in a
single wchar_t, life would be much easier. *BUT* on Windows, wchar_t is
only 16 bits wide, so I can't simply use it to represent all Unicode
characters. I hear that MS Word uses two wchar_t units to hold those
"extended characters". Then, if one character in a string needs to be
changed, the handy array index operation can't be used. What's more,
the whole string may need to change. This is really annoying. Any ideas?
For your information, the most common encoding in which multiple 16-bit
objects are used for some Unicode code points is called UTF-16. If you
want to use glibc's indexable UCS-4 encoding, you can use the GNU C tool
chain on Windows. If not, you may get better answers about this in an MS
Windows programming group.
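To make the UTF-16 point concrete, here is a rough sketch of how a code point above U+FFFF becomes two 16-bit units, and why finding the n-th character needs a scan instead of a plain array index. The names utf16_encode and utf16_index_of are invented for this example, and uint16_t stands in for a 16-bit wchar_t so that the code behaves the same everywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16. Returns the number of 16-bit units
 * written to out (1 or 2), or 0 if cp is not a valid code point. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                     /* out of range or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;        /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                    /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Return the index of the 16-bit unit at which the n-th code point
 * (0-based) starts, or len if the string has no such code point. */
size_t utf16_index_of(const uint16_t *s, size_t len, size_t n)
{
    size_t i = 0;
    while (i < len && n > 0) {
        /* A high surrogate followed by a low surrogate is one
         * code point occupying two units. */
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i += 2;
        else
            i += 1;
        n--;
    }
    return i;
}
```

Replacing a one-unit character with a two-unit one (or the reverse) changes the length of the string, so the units after it have to be moved. That is the "whole string may need to change" case; with a 32-bit encoding such as UCS-4 the replacement is a single array assignment.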
--
Ben.