Re: Reading "normal" text files with Wide_Text_IO in GNAT



Adam Beneschan wrote:
Björn Persson wrote:
...
I'd still like to know where UCS-1 is defined, and by whom.
http://www.iana.org/assignments/character-sets lists ISO-10646-UCS-2,
ISO-10646-UCS-4 and ISO-10646-UCS-Basic, but no UCS-1.
http://www.unicode.org/glossary/#U also has entries for UCS-2 and UCS-4,
but no UCS-1.
...
UCS-2 and UCS-4 are representations in which if an integer N maps to a
character, then that character is represented simply by a 2- or 4-byte
binary representation of N (byte ordering is an issue, though). So it
would seem logical that UCS-1 would simply refer to a 1-byte binary
representation of a number. That's how it seemed to me, and I did find
other references to this term, so I figured it was the correct term.
But maybe it isn't official.

Well, it seems that there are no official names for simple, direct encodings (ones not tied to a given character set). In fact, UCS-2 and UCS-4 are names specific to Unicode (UCS means Universal Character Set).

Character encoding concepts are precisely defined in:

http://en.wikipedia.org/wiki/Character_encoding

As you can see, the encoding issue is composed of two separate ideas: the CEF (character encoding form) and the CES (character encoding scheme). Some of the latter have explicit names. But the direct CEFs are so simple that they don't need explicit names (just the size of the code value).

If we take UCS-2 and UCS-4 out of the Unicode world and use them as general names for direct CEFs with 16-bit and 32-bit code values, then UCS-1 becomes the natural name for the direct CEF with 8-bit code values, whether or not it is official.
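
To make the idea of a "direct" encoding concrete, here is a minimal Python sketch (Python only for brevity; it is not GNAT code). The helper names encode_direct and decode_direct are hypothetical, made up for this illustration. Each code value N is simply written as a fixed-width binary integer, and the byte order is an explicit parameter, since that part belongs to the CES rather than the CEF.

```python
def encode_direct(text, width, byteorder="big"):
    """Write each code value N as a `width`-byte binary integer.

    width=1, 2, 4 correspond to what this thread calls UCS-1, UCS-2, UCS-4.
    Hypothetical helper for illustration only.
    """
    limit = 1 << (8 * width)  # first code value that no longer fits
    out = bytearray()
    for ch in text:
        n = ord(ch)
        if n >= limit:
            raise ValueError(f"U+{n:04X} does not fit in {width} byte(s)")
        out += n.to_bytes(width, byteorder)
    return bytes(out)


def decode_direct(data, width, byteorder="big"):
    """Inverse of encode_direct: read fixed-width integers back as characters."""
    if len(data) % width != 0:
        raise ValueError("data length is not a multiple of the code value width")
    return "".join(
        chr(int.from_bytes(data[i:i + width], byteorder))
        for i in range(0, len(data), width)
    )


if __name__ == "__main__":
    name = "Björn"
    for width in (1, 2, 4):
        print(width, encode_direct(name, width).hex(" "))
```

For text whose code values all fit, the 1-byte form coincides with what Python calls latin-1, the 2-byte form with utf-16-be (as long as no surrogates are involved), and the 4-byte form with utf-32-be. A character such as "€" (U+20AC) is rejected by the 1-byte form, which is exactly the limitation a "UCS-1" would have.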

Regards.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado