Re: Need help on string manipulation
- From: "WaterWalk" <toolmaster@xxxxxxx>
- Date: 27 Mar 2006 22:29:09 -0800
liljencrantz@xxxxxxxxx wrote:
WaterWalk wrote:
Hello, I'm currently learning string manipulation. I'm curious about
the favored way to manipulate strings in C, especially when the
strings contain non-ASCII characters. For example, if substrings need
to be replaced, or one character needs to be changed, what should I do?
Is it better to convert strings to UCS-4 (UTF-32) before manipulation?
But on Windows, wchar_t is 16 bits, which isn't enough for characters
that can't be encoded in 16 bits.
On Linux, I hear wchar_t is 32 bits. Maybe on Linux, strings can
simply be converted to wchar_t and then handled without worrying? I'm
not sure.
Characters represented by wchar_t must use one wchar_t per character,
unlike characters using char, which may use a multibyte encoding. The
actual size and encoding of wchar_t are implementation-defined, and
e.g. Dragonfly BSD uses different encodings of wchar_t depending on the
encoding of char strings. If Windows uses a 16-bit wchar_t, you will be
unable to use some newer Unicode characters. If this is a problem for
you, then avoid wchar_t. You will not have this problem under Linux,
since glibc uses UCS-4, which is a 31-bit encoding.
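Since the width of wchar_t varies by platform, it can be checked instead of assumed. The following is only a minimal sketch: the helper name wchar_holds_any_code_point and the MAX_CODE_POINT macro are made up for illustration, and the check says nothing about which encoding wchar_t uses (the standard macro __STDC_ISO_10646__, when defined, is what indicates that wchar_t values are ISO 10646 code points).

```c
#include <stdbool.h>
#include <stdint.h>
#include <wchar.h>   /* WCHAR_MAX */

/* Highest Unicode code point, U+10FFFF. */
#define MAX_CODE_POINT 0x10FFFFL

/* Hypothetical helper: true if one wchar_t is wide enough to hold any
 * Unicode code point. This only checks the range of the type, not the
 * encoding the implementation actually uses. */
static bool wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= MAX_CODE_POINT;
}
```

With a 16-bit wchar_t this should report false, and with a 32-bit wchar_t it should report true.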
Yes, this is my problem. If any Unicode character could be encoded in a
single wchar_t, life would be much easier. *BUT* on Windows, I can't
simply use wchar_t, which is only 16 bits, to represent all Unicode
characters. I hear that MS Word uses two wchar_t units to hold those
"extended characters". Then, if one character in a string needs to be
changed, the handy array index operation can't be used. What's more, the
whole string may need to change. This is really annoying. Any ideas?
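The "two wchar_t" scheme mentioned above is the UTF-16 surrogate pair. As a rough sketch of why plain indexing stops working, the code below walks a UTF-16 buffer one code point at a time. The function names are hypothetical, uint16_t is used instead of wchar_t so the example does not depend on the platform, and passing unpaired surrogates through unchanged is an assumption made here, not a recommendation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from s (len >= 1 units).
 * Stores the code point in *out and returns the number of 16-bit units
 * consumed (1 or 2). Unpaired surrogates are passed through as-is. */
static size_t utf16_decode(const uint16_t *s, size_t len, uint32_t *out)
{
    uint16_t hi = s[0];
    if (hi >= 0xD800 && hi <= 0xDBFF && len >= 2) {
        uint16_t lo = s[1];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            /* Combine the 10 payload bits of each half. */
            *out = 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                               | (uint32_t)(lo - 0xDC00));
            return 2;
        }
    }
    *out = hi;
    return 1;
}

/* Hypothetical helper: count code points in a buffer of len units. */
static size_t utf16_count_code_points(const uint16_t *s, size_t len)
{
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t cp;
        i += utf16_decode(s + i, len - i, &cp);
        n++;
    }
    return n;
}
```

Finding the n-th character this way means scanning from the start of the string, and replacing a one-unit character with a two-unit one means shifting the rest of the buffer, which is the annoyance described above.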
Things like being able to use [] to access a character at a specific
index, being able to use ints to iterate over a string, and being able
to examine a specific character without worrying about whether it's a
multibyte character make life _much_ easier.
What is a "good" way to handle all this mess? Are there any good
examples? I'll be very thankful for your help.
I have written a non-trivial program called fish (it's a command-line
shell for Unix, kind of like bash or zsh) that uses wide character
strings internally. You can download it from
http://roo.no-ip.org/fish/.
For some reason, I can't visit this site. That makes me sad.
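For what it's worth, the convenience of [] indexing on wide strings mentioned above can be sketched in a few lines. This is not taken from fish. The helper name wcs_replace_char is made up, and the sketch assumes a platform where every character of interest fits in one wchar_t (for example a 32-bit wchar_t). Multibyte char strings would first be converted with the standard mbstowcs() and converted back with wcstombs(), both of which depend on the current locale.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: replace every occurrence of `from` with `to`,
 * in place, using plain array indexing. Returns the number of
 * replacements. Assumes one wchar_t per character. */
static size_t wcs_replace_char(wchar_t *s, wchar_t from, wchar_t to)
{
    size_t n = 0;
    for (size_t i = 0; s[i] != L'\0'; i++) {
        if (s[i] == from) {
            s[i] = to;
            n++;
        }
    }
    return n;
}
```

Because each element is a whole character under that assumption, the string length never changes and no bytes need to be shifted.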