Re: Any foreseeable disasters?

From: Jack Klein (jackklein_at_spamcop.net)
Date: 08/08/04


Date: Sun, 08 Aug 2004 13:57:09 -0500

On Sun, 08 Aug 2004 03:05:08 +0300, Ioannis Vranos
<ivr@guesswh.at.grad.com> wrote in comp.lang.c++:

> JKop wrote:
> > Let's say you want to store a character of the Unicode
> > character system. You want a 32-Bit unsigned integer for
> > this, but wchar_t isn't guaranteed to be 32-Bit.
> >
> > Are there any foreseeable disasters in putting this at the
> > beginning of your translation unit:
> >
> > #define wchar_t unsigned long
>
>
>
> Yes. wchar_t is a built-in type, so the above is asking for trouble. Not to
> mention that it is not needed in the first place, since on most systems
> wchar_t is large enough to store Unicode characters. After all, it
> was created for wide character sets.

Unicode was originally a 16-bit encoding, and quite a few
implementations provide a 16-bit wchar_t. This is most likely the
reason that Java's type 'char' was defined as 16 bits. But Unicode
has grown to more than 64K defined values, and can no longer fit into
individual 16-bit types without a multi-unit encoding such as UTF-16
surrogate pairs.
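
To make that concrete, here is a minimal sketch of the alternative to
redefining wchar_t: give the 32-bit type its own name, and split values
above 0xFFFF into two 16-bit units only where a 16-bit representation is
required. The names code_point, utf16_unit and to_utf16 are made up for
illustration, and the code assumes an implementation that provides
<cstdint>. It does not reject code points that fall inside the surrogate
range itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names: distinct typedefs instead of a macro that
// replaces the wchar_t keyword.
typedef std::uint_least32_t code_point;  // at least 32 bits wide
typedef std::uint_least16_t utf16_unit;  // at least 16 bits wide

// Encode one Unicode code point as one or two UTF-16 code units.
std::vector<utf16_unit> to_utf16(code_point cp)
{
    assert(cp <= 0x10FFFF);  // highest code point Unicode defines

    std::vector<utf16_unit> out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<utf16_unit>(cp));
    } else {
        // Needs a surrogate pair: 20 bits split into two 10-bit halves.
        cp -= 0x10000;
        out.push_back(static_cast<utf16_unit>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<utf16_unit>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

With this, to_utf16(0x41) yields the single unit 0x0041, while
to_utf16(0x1D11E) yields the pair 0xD834, 0xDD1E.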

Is there some reason why you suddenly feel the need to add so much
superfluous white space between the end of your text and your
signature line? Why don't you just learn to use a proper signature
delimiter, as specified by the appropriate RFCs? It is not hard at
all; I have been doing it for many years.

A proper signature delimiter consists of the four-character sequence:

'-', '-', ' ', '\n'

-- 
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html

