Re: Invariant with DIGIT-CHAR-P and the reader.



Sam Steingold <sds@xxxxxxx> writes:

> These are Unicode characters that have the "digit" Unicode attribute.

The "Unicode digit" attribute!


> CLTS:
>
> digit n. (in a radix) a character that is among the possible
> digits (0 to 9, A to Z, and a to z) and that is defined to have an
> associated numeric weight as a digit in that radix. See Section
> 13.1.4.6 (Digits in a Radix).
>
> <http://www.lisp.org/HyperSpec/Body/sec_13-1-4-6.html> appears to be
> fairly specific: only the standard ASCII characters are potential
> digits. Therefore the Unicode characters with the digit attribute are
> numeric characters but not digits.

And comparing with the specification of DIGIT-CHAR-P:

Description:

Tests whether char is a digit in the specified radix (i.e., with a
weight less than radix). If it is a digit in that radix, its
weight is returned as an integer; otherwise nil is returned.

DIGIT-CHAR-P doesn't mention numeric characters, only digits in a radix.

If we take section "13.1.4.6 Digits in a Radix" as an axiom, then
DIGIT-CHAR-P should not return true for any character other than these
standard characters. It sounds like (DIGIT-CHAR-P #\FULLWIDTH_DIGIT_ONE)
should be false, and implementations should rather add a distinct
UNICODE:DIGIT-CHAR-P function where
(UNICODE:DIGIT-CHAR-P #\FULLWIDTH_DIGIT_ONE) would be 1.
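
A minimal sketch of that strict reading, assuming only the characters
listed in section 13.1.4.6 count as digits. The name
STANDARD-DIGIT-CHAR-P is hypothetical, not part of any implementation;
the character is given by code point because character names are
implementation-dependent.

```lisp
;; Hypothetical helper: accepts only 0-9, A-Z and a-z as potential digits.
(defun standard-digit-char-p (char &optional (radix 10))
  "Return the weight of CHAR in RADIX if CHAR is a standard digit, else NIL."
  (let ((weight (or (position char "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
                    ;; Lowercase letters carry the same weights as uppercase.
                    (let ((p (position char "abcdefghijklmnopqrstuvwxyz")))
                      (and p (+ p 10))))))
    (and weight (< weight radix) weight)))

(standard-digit-char-p #\7)                 ; => 7
(standard-digit-char-p #\f 16)              ; => 15
(standard-digit-char-p #\f 10)              ; => NIL
(standard-digit-char-p (code-char #xFF11))  ; => NIL (FULLWIDTH DIGIT ONE)
```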


> Suppose we try to keep the Pascal's invariant in all radixes.
>
> #\MATHEMATICAL_BOLD_DIGIT_NINE is 9 in radix 16.
> How about #\MATHEMATICAL_BOLD_CAPITAL_A in radix 16?
> It would seem reasonable to expect it to be 10.

Indeed, if we extended section "13.1.4.6 Digits in a Radix".

> (yes, we can use the "character decomposition" to map
> #\MATHEMATICAL_BOLD_CAPITAL_A to #\A, so, it is possible to arrange that
> (let ((*read-base* 16))
>   (read-from-string
>    (concatenate 'string
>                 (string #\MATHEMATICAL_BOLD_CAPITAL_A)
>                 (string #\MATHEMATICAL_BOLD_DIGIT_NINE))))
> returns #xA9).
>
> Now, how about the first letter of the ETHIOPIC alphabet?
> How about alphabets with fewer than 26 letters?
> More than 26 letters? - why not extend the notion of a radix?
>
> Then, how about #\ETHIOPIC_NUMBER_TEN?
> #\ETHIOPIC_NUMBER_THIRTY?
> #\ETHIOPIC_NUMBER_HUNDRED?
> What should
> (read-from-string
>  (concatenate 'string
>               (string #\ETHIOPIC_NUMBER_HUNDRED)
>               (string #\ETHIOPIC_NUMBER_TEN)
>               (string #\ETHIOPIC_DIGIT_NINE)))
> return? 119? 100109? (the Ethiopic system is not positional).
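
The two candidate answers come from two different readings of the same
three weights (100, 10, 9). A small sketch, with the hypothetical
helpers ADDITIVE-READING and POSITIONAL-READING, makes the arithmetic
explicit; it works on the numeric values only and does not touch the
reader.

```lisp
;; Hypothetical helpers: two ways to combine a sequence of numeral values.
(defun additive-reading (weights)
  "Sum the values, as in a non-positional system."
  (reduce #'+ weights))

(defun positional-reading (weights)
  "Concatenate the decimal spellings of the values and read the result."
  (parse-integer (format nil "~{~D~}" weights)))

(additive-reading '(100 10 9))    ; => 119
(positional-reading '(100 10 9))  ; => 100109
```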

I've always found limiting the radix to 36 too artificial.
Indeed, there are notations with higher radices.


> Or even funnier:
> should
>
> (read-from-string
>  (concatenate 'string
>               (string #\ARABIC-INDIC_DIGIT_ONE)
>               (string #\DEVANAGARI_DIGIT_TWO)
>               (string #\BENGALI_DIGIT_THREE)
>               (string #\GUJARATI_DIGIT_FOUR)
>               (string #\TAMIL_DIGIT_FIVE)))
>
> return 12345?
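
Whether this yields 12345 is a policy decision, which a separate
function can make explicit. The sketch below assumes an implementation
whose characters cover Unicode, and uses a small hand-picked table of
DIGIT ZERO code points (far from complete). All names here are
hypothetical.

```lisp
;; Hand-picked, incomplete table: code point of DIGIT ZERO in a few scripts.
(defparameter *decimal-digit-zeros*
  '(#x0030    ; ASCII
    #x0660    ; ARABIC-INDIC
    #x0966    ; DEVANAGARI
    #x09E6    ; BENGALI
    #x0AE6    ; GUJARATI
    #x0BE6    ; TAMIL
    #xFF10))  ; FULLWIDTH

(defun unicode-decimal-weight (char)
  "Return the decimal weight of CHAR and its script's zero code point, or NIL."
  (let* ((code (char-code char))
         (zero (find-if (lambda (z) (<= z code (+ z 9)))
                        *decimal-digit-zeros*)))
    (when zero
      (values (- code zero) zero))))

(defun parse-unicode-decimal (string &key (allow-mixed-scripts t))
  "Parse STRING as a decimal integer. Return NIL for an empty string, for a
non-digit, or for mixed scripts when ALLOW-MIXED-SCRIPTS is false."
  (when (zerop (length string))
    (return-from parse-unicode-decimal nil))
  (let ((result 0)
        (script nil))
    (loop for char across string
          do (multiple-value-bind (weight zero) (unicode-decimal-weight char)
               (cond ((null weight)
                      (return-from parse-unicode-decimal nil))
                     ((null script)
                      (setf script zero))
                     ((and (/= script zero) (not allow-mixed-scripts))
                      (return-from parse-unicode-decimal nil)))
               (setf result (+ (* 10 result) weight))))
    result))

;; The five digits from the example above, given by code point.
(let ((mixed (map 'string #'code-char
                  '(#x0661 #x0968 #x09E9 #x0AEA #x0BEB))))
  (list (parse-unicode-decimal mixed)                           ; => 12345
        (parse-unicode-decimal mixed :allow-mixed-scripts nil))) ; => NIL
```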

> My points are:
>
> 1. Pascal's invariant is not required by the CLTS.

Ok.

> 2. Requiring Pascal's invariant would produce weird results in
> implementations that use Unicode.

Ok, Unicode (writing systems) is complex.


> 3. The current situation in CLISP allows users to parse Unicode text
> (possibly interpreting numbers &c) by using Unicode attributes
> because DIGIT-CHAR-P returns useful values for Unicode digits.

Then I think that DIGIT-CHAR-P should be NIL for any character that is
not a potential digit as defined in section "13.1.4.6 Digits in a
Radix", and that implementations should provide a _distinct_ function
UNICODE:DIGIT-CHAR-P.


--
__Pascal Bourguignon__ http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay