Re: If you could add anything you want




"Chris Uppal" <chris.uppal@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:44749f09$0$648$bed64819@xxxxxxxxxxxxxxxxxxxx

The Chinese/Japanese/Korean ideographs are unified, so there are "only" about
70 thousand characters between them (in 4.0, more still in later versions)

[...]

I get the impression that this unification has required rather a lot of
scholarship. And presumably lashings of diplomacy too.

Yes. The Japanese don't write their characters exactly the same way as the Chinese do, and vice versa. Some people aren't too happy that the example glyphs are drawn the "wrong" way. To me, that's more of a font issue than a Unicode issue (in theory, a given codepoint could be rendered the "Chinese way" if the rendering software detected that the locale was China, the "Japanese way" in the Japan locale, etc.), but others argue that these are distinct characters and should have separate codepoints altogether. See http://en.wikipedia.org/wiki/Han_unification#Controversy


Sort of ... more accurately, I think, is that
Unicode is all about glyphs and providing numeric
codes to correspond to those glyphs, but doesn't
care _in detail_ about the glyph shape, just about
the user universe's understanding that a script
glyph for "a" and an OCR glyph for "a" and a "block
outline font" glyph for "a" and a "Courier New"
glyph for "a" are all somehow instances of the
_same_ (abstract) glyph "a".

I agree. One proof of this is that the standard includes sample concrete
glyphs. If the actual shapes (at some level of abstraction) were not important
(indeed central) then
a) There'd be no point in including pictures in the standard.
b) There'd be no way to express /what/ the standard was standardising.

I think (b) is more important than (a) here. I see the example glyphs in the standard as being there to facilitate understanding, NOT to mandate a particular shape for the glyph. For example, the example glyph for "open parenthesis" is drawn as a curve that bulges towards the left. However, the standard itself specifies that in right-to-left text the software should normally render it as a curve bulging towards the right instead. I.e. what you actually see is NOT necessarily the example glyph given in the standard.
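
For anyone who wants to poke at this, here is a minimal sketch in Python (my choice of language, nothing in the thread depends on it). It assumes the standard library's unicodedata module, which exposes the "mirrored" property from the Unicode character database. The helper name is_mirrored is just something I made up for illustration. The property only says that the character is a candidate for mirroring in right-to-left text; the actual glyph swap is done by the rendering software.

```python
import unicodedata


def is_mirrored(ch: str) -> bool:
    """Return True if the character carries the Unicode Bidi_Mirrored property."""
    # unicodedata.mirrored() returns 1 for mirrored characters, 0 otherwise.
    return unicodedata.mirrored(ch) == 1


if __name__ == "__main__":
    # Parentheses and brackets are flagged as mirrored; a plain letter is not.
    for ch in "()[]a":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: mirrored={is_mirrored(ch)}")
```

So the codepoint for "open parenthesis" stays the same in either direction; only the shape drawn for it changes.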



> * One interesting (to me anyway, and in the
> context of this discussion) character is U+2062.
[...]
(Aside: is the same character used for functional
application, or is that a second kind of semantically significant
juxtaposition ?)

Function application is U+2061. U+2063 is "invisible separator", presumably to separate a sequence of items in a list.
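
If you want to check the names yourself, a quick sketch in Python follows (again assuming the standard unicodedata module; the function name invisible_operator_names is hypothetical, just for this example). It looks up the official character names and general categories for the three codepoints mentioned in this thread.

```python
import unicodedata


def invisible_operator_names() -> dict:
    """Map U+2061..U+2063 to their official Unicode character names."""
    return {cp: unicodedata.name(chr(cp)) for cp in (0x2061, 0x2062, 0x2063)}


if __name__ == "__main__":
    for cp, name in invisible_operator_names().items():
        # All three are format characters with no visible glyph of their own.
        print(f"U+{cp:04X} {name} (category {unicodedata.category(chr(cp))})")
```

All three are invisible format characters, so a string containing them looks the same on screen as one without them but compares as different.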

- Oliver
