Re: Invariant with DIGIT-CHAR-P and the reader.



Pascal Bourguignon wrote:
> "josephoswaldgg@xxxxxxxxxxx" <josephoswald@xxxxxxxxx> writes:
> > Ah, I see now your position more clearly; I had thought you were
> > claiming that standard Lisp's digit-char-p was inconsistent with the
> > behavior of the reader in gathering numbers. I withdraw my suggestion
> > that a "Frankenstein"-number, combining Unicode digits from multiple
> > scripts, should appear to the Lisp reader as a number. More careful
> > reading of the standard makes me believe the Lisp reader should read
> > your "Frankenstein"-number as a symbol, but because I think those
> > characters should return NIL for Lisp's digit-char-p.
>
> What about this frankenstein number:
>
> (let ((*read-base* 16)) (read-from-string "DeadFace"))
>
> The current standard accepts higher base digits with two forms
> (upcase and downcase).
> Why a new CL could not accept more forms for unicode digits?
>

The "DeadFace" (assuming its letters are from the 52 "standard" A..Z,
a..z characters chosen by the implementation, and not from some higher
code page like "U+FF24 Fullwidth Latin Capital Letter D", distinct from
the likely standard "U+0044 Latin Capital Letter D") clearly falls
within the standard's definition of "digits in a radix", but there can
only be "D" and "d", not many different "D"-like characters.

I.e., I'm assuming that for character #\D from your example

(digit-char-p #\D 16) ==> 13, as it *should* for the "standard" "D"/"d"
but *not* for the possibly implementation-defined "non-standard D"
characters.
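
To make that concrete, here is a small sketch of what I would expect from
any conforming implementation, using only standard characters. The values
in the comments follow from the standard's definitions of digit-char-p and
*read-base*; nothing here touches implementation-defined characters.

```lisp
;; Standard characters only.
(digit-char-p #\D 16)   ; => 13
(digit-char-p #\d 16)   ; => 13, case does not matter for digits above 9
(digit-char-p #\D)      ; => NIL, since the default radix is 10
(digit-char-p #\7 8)    ; => 7

;; The reader agrees with DIGIT-CHAR-P about these characters:
;; with *READ-BASE* bound to 16, the token is read as an integer.
(let ((*read-base* 16))
  (read-from-string "DeadFace"))   ; => 3735943886 (i.e. #xDEADFACE)
```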

My reading of the standard is that the implementation gets to choose
which 96 characters it is using as members of the "standard" set
(i.e., it could be EBCDIC or another coding, or some wild choice from
among the whole Unicode spectrum), and anything else is either mapped
indistinguishably to one of those 96, or is an implementation-defined
character which cannot satisfy digit-char-p for *any* radix.

A new CL that accepted more forms as digits would be just
that--"new"--as in "distinct from the current ANSI standard."

>
> > I think we can agree
> >
> > 1) the standard definition of digit-char-p only applies to the
> > "standard" digits.
>
> Good, then in clisp (digit-char-p (character "9")) should return nil.

On my browser, I can't tell the difference between that and the usual
ASCII "9", but if you are talking about something like U+FF19, I would
agree. (Unless clisp decides U+FF19 is the "standard nine" and U+0039
is an implementation-defined numeric character, but this would be quite
ASCII-hostile.)
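
If anyone wants to check what their own implementation does, something
like the following sketch should do. REPORT-DIGIT is a name I made up for
illustration, and the whole thing assumes that char-code values are
Unicode code points, which the standard does not promise. I am
deliberately not stating what the U+FF19 call returns, since that is
exactly the point in dispute.

```lisp
;; Hypothetical helper, not part of any standard or library.
;; Assumes CHAR-CODE values coincide with Unicode code points.
(defun report-digit (code &optional (radix 10))
  "Return (name weight) for the character with code CODE,
or :NO-SUCH-CHARACTER if the implementation has no such character."
  (let ((ch (and (< code char-code-limit) (code-char code))))
    (if ch
        (list (char-name ch) (digit-char-p ch radix))
        :no-such-character)))

(report-digit #x39)     ; the usual "9": the weight should be 9
(report-digit #xFF19)   ; fullwidth nine: implementation-dependent
```

By my reading of the standard, the second call ought to report a weight
of NIL wherever U+FF19 is a distinct, implementation-defined character.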

>
>
> > 3) even if it were plausible, using non-Latin digits routinely as
> > standard digits is bound to get into trouble
>
> We should distinguish:
>
> - non-Latin digits used in strange systems (Roman-like, Babylonian,
> Hebrew, etc),
>
> - non-Latin digits used with the normal decimal system (Arabic, etc),
>
> - various forms of Latin digits (FULLWIDTH_DIGIT_*,
>   MATHEMATIC_*_DIGIT_*),
>
> - the standard DIGIT_*.
>
> I'd say that all but the first category should be readable as numbers
> in base 10.

Once we agree on predicates for the various Unicode properties,
decisions about "readability" like this would involve *additional*
extensions to the reader and parse-integer, also beyond the current
standard. That's a much more complicated discussion than the one about
the Lisp functions which classify characters, because then we have to
worry, just for starters, about Lisp code that might contain these
characters, and about compatibility with old ANSI-conformant
implementations that have no idea what to make of FULLWIDTH_DIGIT_* in
what we wish to be an integer constant, or in reader macros like
#2A((1 2) (3 4)). (I guess that's the main answer to my "Why not
[accept Frankenstein numbers in the reader]?")
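
One way to get the effect without touching the reader or parse-integer
at all is to normalize in user code first. The sketch below is only an
illustration: FULLWIDTH-TO-STANDARD is a hypothetical name, it handles
only the fullwidth digits U+FF10..U+FF19, and it again assumes that
char-code values are Unicode code points.

```lisp
;; Hypothetical helper, not part of any standard or library.
;; Assumes CHAR-CODE values coincide with Unicode code points.
(defun fullwidth-to-standard (string)
  "Return a copy of STRING with fullwidth digits (U+FF10..U+FF19)
replaced by the corresponding standard digit characters."
  (map 'string
       (lambda (ch)
         (let ((code (char-code ch)))
           (if (<= #xFF10 code #xFF19)
               (digit-char (- code #xFF10)) ; weight 0..9 -> #\0..#\9
               ch)))
       string))

;; Standard digits pass through unchanged:
(parse-integer (fullwidth-to-standard "1234"))   ; => 1234

;; With fullwidth input (needs an implementation that has these characters):
;; (parse-integer
;;  (fullwidth-to-standard (map 'string #'code-char '(#xFF11 #xFF12))))
```

This keeps digit-char-p, parse-integer and the reader exactly as ANSI
specifies them, so old conforming implementations are not affected; the
price is that source code and reader macros still see only the standard
digits.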
