Re: Unicode Support
- From: Chewy509@xxxxxxxxxxxxxxxx
- Date: 20 Apr 2005 17:05:16 -0700
websn...@xxxxxxxxx wrote:
>
> For example, there are letters which can take multiple accents. And
> you can often specify them as base-char + accent1 + accent2. The
> question is, if you flip the order of the accents, does that represent
> a different character or not? The unicode normalization algorithms say
> no for some cases, and yes in others.
>
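For reference, the "yes in some cases, no in others" behaviour comes from each combining mark's canonical combining class. Canonical ordering sorts adjacent marks with different classes, so swapping them yields an equivalent string. Marks with the same class keep their order, so swapping them yields a different string. A minimal sketch using Python's standard `unicodedata` module (the helper name `canonically_equal` is just for illustration):

```python
import unicodedata

# U+0323 COMBINING DOT BELOW has combining class 220 (attaches below).
# U+0302 COMBINING CIRCUMFLEX ACCENT and U+0301 COMBINING ACUTE ACCENT
# both have class 230 (attach above).
dot_then_circ = "a\u0323\u0302"
circ_then_dot = "a\u0302\u0323"
acute_then_circ = "a\u0301\u0302"
circ_then_acute = "a\u0302\u0301"

def canonically_equal(x: str, y: str) -> bool:
    """Compare two strings after decomposing and canonically ordering them (NFD)."""
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)

print(unicodedata.combining("\u0323"), unicodedata.combining("\u0302"),
      unicodedata.combining("\u0301"))                     # 220 230 230
print(canonically_equal(dot_then_circ, circ_then_dot))     # True: different classes are reordered
print(canonically_equal(acute_then_circ, circ_then_acute)) # False: same class, order matters
```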
Hi Paul,
That appears to be the hardest bit of all. Now that I'm looking more
closely at the actual encodings and character maps, I've also found that
many characters, particularly from the Latin set, have direct
equivalents. For example, Latin Small Letter A With Tilde can be encoded
either as U+00E3 or as U+0061 + U+0303 (Latin Small Letter A followed by
Combining Tilde), which makes comparison even harder.
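To make the comparison problem concrete, here is a small sketch with Python's `unicodedata` module. A naive code-point comparison treats the two encodings of ã as different strings, while normalizing both sides to the same form (NFC composes, NFD decomposes) makes them compare equal:

```python
import unicodedata

precomposed = "\u00e3"       # LATIN SMALL LETTER A WITH TILDE
decomposed = "\u0061\u0303"  # LATIN SMALL LETTER A + COMBINING TILDE

print(precomposed == decomposed)          # False: raw code points differ
print(len(precomposed), len(decomposed))  # 1 2

# Normalizing either string to the other's form makes them identical.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```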
It's stated that any editor conforming to Unicode 4.x must produce the
short form (which would negate the issue I raised). However, when text
is copied from other sources, its exact encoding MUST be preserved. So
if I have an editor which only conforms to version 1.x and produces the
long form (v1.x, IIRC, doesn't state which form to produce to be
conformant), and I copy that text into another editor which produces
the short form, the second editor (to remain conformant) cannot and
should not convert the long form to the short form, even though the
short form is considered the correct one.
So basically: if I copy ã encoded as U+0061, U+0303 into a text editor
that conforms to Unicode v4.x, it MUST remain in the long form (i.e. two
code points), even though that particular encoding is not technically
correct (U+00E3 being the technically correct encoding).
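Whether or not that reading of the conformance rules is right, one way an editor could live with it is to store pasted text exactly as received and normalize only at comparison time. A minimal sketch under that assumption (the helper `find_normalized` is hypothetical, not part of any editor's API; `unicodedata.is_normalized` needs Python 3.8 or later):

```python
import unicodedata

def find_normalized(haystack: str, needle: str) -> bool:
    """Hypothetical search helper: the stored text is never modified;
    both sides are converted to NFC only for the comparison."""
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)

buffer = "ma\u0303o"  # pasted in the long form: m, a, COMBINING TILDE, o

print(len(buffer))                            # 4: kept exactly as pasted
print(unicodedata.is_normalized("NFC", buffer))  # False: not in short form
print("m\u00e3o" in buffer)                   # False: raw search misses it
print(find_normalized(buffer, "m\u00e3o"))    # True
```

The trade-off is that every search or comparison pays for normalization, but the document's original encoding is never altered.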
That's if I'm reading Chapter 3 correctly. (If I'm wrong, please let me
know.)
Now my head hurts...
And now I can see the resistance to allowing full Unicode support for
labels/identifiers.
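If labels/identifiers were allowed to contain such characters, an assembler would at least need to normalize names before looking them up, so that both spellings of the same label match. Python itself does something similar for its own identifiers, normalizing them with NFKC (PEP 3131). The sketch below is a hypothetical symbol table that keys labels by their NFC form; the class name, methods, and the choice of NFC over NFKC are all assumptions for illustration:

```python
import unicodedata

class SymbolTable:
    """Hypothetical assembler label table. Labels are keyed by their NFC
    form, so canonically equivalent spellings refer to the same label."""

    def __init__(self) -> None:
        self._labels: dict[str, int] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Assumption: NFC is used; NFKC would also fold compatibility forms.
        return unicodedata.normalize("NFC", name)

    def define(self, name: str, address: int) -> None:
        key = self._key(name)
        if key in self._labels:
            raise ValueError(f"label {name!r} already defined")
        self._labels[key] = address

    def lookup(self, name: str) -> int:
        return self._labels[self._key(name)]

table = SymbolTable()
table.define("p\u00e3o", 0x401000)      # defined with precomposed ã
print(hex(table.lookup("pa\u0303o")))   # 0x401000: decomposed spelling finds it
```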
I would just like to thank everyone who has replied and voiced a
constructive opinion on this topic.
I have to admit, supporting Unicode is a bit more work than I first
thought! :(
--
Darran (aka Chewy509) brought to you by Google Groups.