Re: Unicode LISP??
From: Ray Dillinger (bear_at_sonic.net)
Date: 09/06/04
- Next message: Morten Reistad: "Re: Xah Lee's Unixism"
- Previous message: Ray Dillinger: "Re: Lisp on windows"
- In reply to: Marcin 'Qrczak' Kowalczyk: "Re: Unicode LISP??"
- Next in thread: Bruno Haible: "Re: Unicode LISP??"
Date: Mon, 06 Sep 2004 05:47:50 GMT
Marcin 'Qrczak' Kowalczyk wrote:
> Ray Dillinger <bear@sonic.net> writes:
>>1) Combining codepoints in isolation are members of the
>> character datatype, but, like control characters and
>> characters with buckybits in CLTL2, they aren't
>> string-characters; you can't put them into strings as
>> independent characters.
>
>
> If strings are not isomorphic to sequences of characters (whatever
> exactly "characters" mean), I predict confusion and breakage. In about
> any language which has characters as a distinct type from strings,
> strings are sequences of characters.
Well, the consideration in CLTL was that the "character"
datatype actually represented two different things: characters
and keystrokes. Alt-J is a keystroke. Uppercase J is a
character. It's entirely reasonable to collect characters in
strings, but it's not reasonable to have "strings" of
keystrokes.
So CLTL had this distinction: "characters" as a datatype
included keystrokes, but only true characters (not keystrokes)
were supposed to be string-characters. CLTL2 contained a
reference to this, but the committee decision was to allow
buckybits, font bits, and other stuff that could make something
into a non string-character to exist as "implementation defined
attributes" and strike the specification of that behavior from
the standard.
In a grapheme-based system, a combining codepoint by itself,
similarly, is an entity you might have to work with at times,
but it isn't a true character; it doesn't make sense to stick
it into strings by itself without a base character to modify.
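
To make the grapheme idea concrete, here is a rough sketch in Python
(not Lisp, and not anything from an actual implementation). It treats a
"character" as a base codepoint plus any combining marks that follow it,
and refuses a combining mark that has no base. The names is_combining
and to_graphemes are made up for illustration, and the rule used here
(general category Mn, Mc or Me) is a simplification of real grapheme
cluster segmentation.

```python
import unicodedata
from typing import List


def is_combining(cp: str) -> bool:
    # Assumption: "combining codepoint" means general category
    # Mn, Mc or Me. Real grapheme segmentation has more rules.
    return unicodedata.category(cp) in ("Mn", "Mc", "Me")


def to_graphemes(text: str) -> List[str]:
    """Group codepoints into base + combining-mark clusters."""
    clusters: List[str] = []
    for cp in text:
        if is_combining(cp):
            if not clusters:
                # A lone combining mark is not a string-character here.
                raise ValueError("combining codepoint with no base character")
            clusters[-1] += cp  # attach the mark to the preceding base
        else:
            clusters.append(cp)  # start a new character
    return clusters
```

Under these assumptions, "e" followed by U+0301 comes out as one
character, while a string that starts with U+0301 is rejected.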
Anyway, it was just one of many ideas. I actually think I
prefer the system where the language _primitives_ allow one or
more codepoints per character and enforce absolutely nothing
about which codepoints they may be. All that would come out
in library code for the UNICODE Character Set, and with
different libraries, you could work with the UNI-21 character
set where there is one codepoint per character, or the UNI-16
character set where there is one codepoint per character and
it's restricted to sixteen bits, or the LATIN-1 character set
where there's one codepoint per character and it's restricted
to 8 bits.
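
A rough sketch of that split, in Python rather than Lisp and purely
hypothetical: the primitive character type is just a tuple of one or
more codepoints and enforces nothing, while each "library" supplies a
predicate saying which characters belong to its character set. The
helper make_charset and the limits chosen (in particular 0x10FFFF as
the ceiling for UNI-21 and UNICODE) are assumptions for illustration.

```python
from typing import Callable, Optional, Tuple

# Primitive character: one or more codepoints, otherwise unrestricted.
Character = Tuple[int, ...]


def make_charset(max_codepoints: Optional[int],
                 max_value: int) -> Callable[[Character], bool]:
    """Build a membership test for a hypothetical character set."""
    def valid(ch: Character) -> bool:
        if len(ch) == 0:
            return False  # a character needs at least one codepoint
        if max_codepoints is not None and len(ch) > max_codepoints:
            return False
        return all(0 <= cp <= max_value for cp in ch)
    return valid


# Library-level character sets layered over the same primitive.
UNICODE = make_charset(None, 0x10FFFF)  # any number of codepoints
UNI_21 = make_charset(1, 0x10FFFF)      # one codepoint per character
UNI_16 = make_charset(1, 0xFFFF)        # one codepoint, sixteen bits
LATIN_1 = make_charset(1, 0xFF)         # one codepoint, eight bits
```

With these definitions, the character (0x65, 0x301) is accepted by
UNICODE but not by UNI_21, and (0xE9,) is accepted by all four.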
Bear