Re: Invariant with DIGIT-CHAR-P and the reader.
- From: Sam Steingold <sds@xxxxxxx>
- Date: Tue, 17 May 2005 13:34:04 -0400
> * josephoswaldgg@xxxxxxxxxxx <wbfrcubfjnyq@xxxxxxxxx> [2005-05-17 09:56:14 -0700]:
>
>> what should
>>
>> (read-from-string
>> (concatenate 'string
>> (string #\ARABIC-INDIC_DIGIT_ONE)
>> (string #\DEVANAGARI_DIGIT_TWO)
>> (string #\BENGALI_DIGIT_THREE)
>> (string #\GUJARATI_DIGIT_FOUR)
>> (string #\TAMIL_DIGIT_FIVE)))
>>
>> return?
>> Do you seriously expect such a string to mean 12345?
>>
>
> Why not? How else would the producer of such a string mean it to be
> interpreted? Is it not just as unreasonable for the producer to
> deliberately intend such a Lisp symbol? You've deliberately chosen an
> edge case, so who would reasonably *rely* on behavior either way?
The CL reader parses as numbers those tokens that "look like a number"
in CL syntax. No one will look at the string above and say "yeah,
that's a number".
> "What is a word" and "What is a number" is an application- or
> domain-specific question, which cannot be answered in a language spec
> or by an implementation.
This is precisely why the CL reader should _not_ interpret the above
string as a number: the CL reader operates in the CL domain, where the
above is not a number according to the CL syntax.
OTOH, (DIGIT-CHAR-P #\DEVANAGARI_DIGIT_TWO) returning 2 is useful,
because it reflects a domain-neutral fact: the nature of the Unicode
character in question.
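To make the distinction concrete: a program that wants CL-syntax digits
rather than Unicode digits can filter on STANDARD-CHAR-P before calling
DIGIT-CHAR-P. This is a minimal sketch; STANDARD-DIGIT-WEIGHT is a
hypothetical helper, not part of CL or CLISP. What DIGIT-CHAR-P itself
returns for a non-ASCII digit is implementation-dependent (CLISP returns
2 for DEVANAGARI DIGIT TWO, as discussed above).

```lisp
;; Hypothetical helper (not part of CL or CLISP): like DIGIT-CHAR-P,
;; but only accepts standard characters, i.e. the ASCII digits and
;; letters that CL number syntax is built from.
(defun standard-digit-weight (char &optional (radix 10))
  "Return the weight of CHAR in RADIX if CHAR is a standard character, else NIL."
  (and (standard-char-p char)
       (digit-char-p char radix)))

;; (standard-digit-weight #\7)     => 7
;; (standard-digit-weight #\f 16)  => 15
;; In a Unicode-enabled build, (code-char #x0968) is DEVANAGARI DIGIT TWO;
;; it is not a standard character, so STANDARD-DIGIT-WEIGHT returns NIL
;; for it, whatever DIGIT-CHAR-P says.
```

The Unicode-level property stays available through DIGIT-CHAR-P; the
filter is applied only where the caller really means "CL syntax".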
> Anyway, in the original context, once digit-char-p starts declaring
> things "numeric" there is always a danger such characters will get
> treated in ways that a naive Lisp program might be trying to mimic
> another program, and that other program may use the Lisp reader or
> something similar.
The Lisp reader is for Lisp data (including Lisp code).
[Yes, it is extensible, but only to incorporate "Lisp-like" data, not
"every natural syntax you can imagine": it is relatively easy to make
READ parse XML (CLOCC/CLLIB/xml.lisp), but not C.]
There is no way to tell the CL printer to print 2 as
(string #\DEVANAGARI_DIGIT_TWO),
so there is no reason to read 2 from (string #\DEVANAGARI_DIGIT_TWO).
I hope we all agree on this.
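The print/read argument can be checked from the printer's side: under
standard I/O syntax, PRIN1 writes an integer using only standard
characters, and READ gets the same integer back from that output. A
minimal sketch; both function names are hypothetical, chosen only for
illustration.

```lisp
;; Hypothetical helpers illustrating the print/read invariant for integers.
(defun printed-integer-standard-p (n)
  "True if the printed representation of integer N uses standard characters only."
  (with-standard-io-syntax
    (every #'standard-char-p (prin1-to-string n))))

(defun integer-round-trips-p (n)
  "True if READ-FROM-STRING returns N when given PRIN1's output for N."
  (with-standard-io-syntax
    (= n (read-from-string (prin1-to-string n)))))

;; (with-standard-io-syntax (prin1-to-string 2)) => "2"
```

Since the printer never emits a Devanagari digit, the reader does not
need to accept one to keep this invariant.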
> Instead of requiring every user of digit-char-p to sterilize his data,
What do you mean?
If your data contains Unicode characters, you should know about Unicode.
In Unicode, #\DEVANAGARI_DIGIT_TWO is a digit, and its weight is 2.
[It's not like CLISP is searching for substrings "TWO" in character
names :-)]
This is a statement at the same level as "in CL, (CAR NIL) returns NIL".
If you do not like what the Unicode international standard says, don't
use Unicode, use ASCII (yes, you can build CLISP in ASCII mode).
If you do not like (CAR NIL) ==> NIL, don't use CL, use Scheme.
--
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.iris.org.il> <http://www.dhimmi.com/> <http://www.camera.org>
<http://www.memri.org/> <http://www.palestinefacts.org/>
Those who value Life above Freedom are destined to lose both.