Re: CLisp case sensitivity
From: Pascal Bourguignon (spam_at_mouse-potato.com)
Date: 12/16/04
Date: 16 Dec 2004 01:50:22 +0100
Adam Warner <usenet@consulting.net.nz> writes:
> Hi Duane Rettig,
>
> >> I made a simple claim Barry: Since ANSI Common Lisp doesn't define the
> >> size of a character the length of an arbitrary string will be
> >> implementation specific.
> >
> > This claim is false, by definition, since length is specified in terms
> > of a count, and not in terms of widths in some other units of measure.
>
> Here is an arbitrary string encoded in UTF-8: "" [You
> may generate it in CLISP using (string (code-char #x10000))]. It
> consists of a single code point.
No. You have to specify an external format; you cannot generate it just
with (string (code-char #x10000)). For example, in my case it gives
this error:
Oops, that was with -E utf-16...
Rather try:
    (with-open-file (out "test.utf-8" :direction :output
                                      :if-does-not-exist :create
                                      :if-exists :supersede
                                      :external-format charset:utf-8)
      (princ (string (code-char #x10000)) out))
> I expect (cl:length "") will NOT return 1 in a 16-bit
> character Allegro yet it will return 1 in CLISP and SBCL. I expect:
Not exactly. In all encodings with >= 8 bits in clisp, this string:
""
has a length of 4 characters.
In encodings with < 8 bits, it contains invalid characters:
$ /usr/local/bin/clisp -ansi -norc -q -E ascii
[1]> ""
*** - invalid byte #xF0 in CHARSET:ASCII conversion
Break 1 [2]>
Now, even when you're using a 7-bit encoding as the default external
format for files, terminal, etc., a string containing the Unicode
character with code #x10000 is always a string of one character:
[3]> (length (string (code-char #x10000)))
1
[4]> (string (code-char #x10000))
*** - Character #\u00010000 cannot be represented in the character set CHARSET:ASCII
Break 1 [5]>
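To make the distinction concrete, here is a minimal sketch of how the encoded size of a code point differs from the character count. The two helper names are made up for this illustration; they are not part of CLISP or of the standard, and they work on integer code points only, so they do not depend on the implementation's char-code-limit.

```lisp
;; Hypothetical helpers, for illustration only.
(defun utf-8-octet-count (code-point)
  "Number of octets needed to encode CODE-POINT in UTF-8."
  (cond ((< code-point #x80) 1)
        ((< code-point #x800) 2)
        ((< code-point #x10000) 3)
        (t 4)))

(defun utf-16-unit-count (code-point)
  "Number of 16-bit code units needed to encode CODE-POINT in UTF-16."
  (if (< code-point #x10000) 1 2))

(utf-8-octet-count #x10000)   ; => 4
(utf-16-unit-count #x10000)   ; => 2
```

The character count stays 1 in both cases; only the size of the external representation changes with the encoding.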
> (let ((s (copy-seq "")))
>   (setf (char s 0) #\A)
>   s)
You are abusing strings, using them to store _codes_ instead of
characters. This cannot be portable Common Lisp.
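If what you want to manipulate is the encoded form, one portable way to hold it is a vector of octets rather than a string. The following is only a sketch: the variable name *octets* is chosen for this example, and the four octets are the UTF-8 encoding of code point #x10000.

```lisp
;; A vector of octets, not a string: its length counts octets.
(defparameter *octets*
  (make-array 4 :element-type '(unsigned-byte 8)
                :initial-contents '(#xF0 #x90 #x80 #x80)))

(length *octets*)               ; => 4

;; Overwriting one octet is well defined here, although the result
;; is no longer a valid UTF-8 sequence.
(setf (aref *octets* 0) #x41)
```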
This whole subject is silly: it's like asking that (length "SGVsbG8=")
return 5 because (to-base64 "Hello") returns "SGVsbG8=".
-- 
__Pascal Bourguignon__ http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"