Re: Invariant with DIGIT-CHAR-P and the reader.
- From: Pascal Bourguignon <pjb@xxxxxxxxxxxxxxxxx>
- Date: Tue, 17 May 2005 23:25:36 +0200
Sam Steingold <sds@xxxxxxx> writes:
> These are Unicode characters that have the "digit" Unicode attribute.
The "Unicode digit" attribute!
> CLTS:
>
> digit n. (in a radix) a character that is among the possible
> digits (0 to 9, A to Z, and a to z) and that is defined to have an
> associated numeric weight as a digit in that radix. See Section
> 13.1.4.6 (Digits in a Radix).
>
> <http://www.lisp.org/HyperSpec/Body/sec_13-1-4-6.html> appears to be
> fairly specific: only the standard ASCII characters are potential
> digits. Therefore the Unicode characters with the digit attribute are
> numeric characters but not digits.
And comparing with the specification of DIGIT-CHAR-P:
Description:
Tests whether char is a digit in the specified radix (i.e., with a
weight less than radix). If it is a digit in that radix, its
weight is returned as an integer; otherwise nil is returned.
The description of DIGIT-CHAR-P doesn't mention numeric characters, only digits in a radix.
If we take section "13.1.4.6 Digits in a Radix" as an axiom, then
DIGIT-CHAR-P should not return true for any character other than these
standard characters. It sounds like (DIGIT-CHAR-P #\FULLWIDTH_DIGIT_ONE)
should be false, and implementations should rather add a distinct
UNICODE:DIGIT-CHAR-P function where
(UNICODE:DIGIT-CHAR-P #\FULLWIDTH_DIGIT_ONE) would be 1.
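As a rough sketch of what the strict reading would mean, here is a
predicate that accepts only the characters listed in section 13.1.4.6.
The name STANDARD-DIGIT-CHAR-P is hypothetical and only for illustration;
no implementation is known to provide it, and this is not how any
implementation defines DIGIT-CHAR-P.

```lisp
;; Hypothetical helper (an assumption for illustration, not a standard
;; or implementation-provided function): a DIGIT-CHAR-P restricted to
;; the potential digits of section 13.1.4.6 (0-9, A-Z, a-z).
(defun standard-digit-char-p (char &optional (radix 10))
  "Return the weight of CHAR in RADIX if CHAR is one of 0-9, A-Z, a-z
and its weight is less than RADIX; otherwise return NIL."
  (check-type radix (integer 2 36))
  (let ((weight (or (position char "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
                    ;; Lowercase letters carry the same weights as
                    ;; their uppercase counterparts (a = 10, ..., z = 35).
                    (let ((p (position char "abcdefghijklmnopqrstuvwxyz")))
                      (and p (+ p 10))))))
    ;; A character is a digit in RADIX only if its weight is below RADIX.
    (and weight (< weight radix) weight)))
```

Because the lookup compares characters with EQL against two literal
strings, any character outside those 62 (a fullwidth digit, for
instance) simply yields NIL, whatever Unicode attributes it has.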
> Suppose we try to keep Pascal's invariant in all radixes.
>
> #\MATHEMATICAL_BOLD_DIGIT_NINE is 9 in radix 16.
> How about #\MATHEMATICAL_BOLD_CAPITAL_A in radix 16?
> It would seem reasonable to expect it to be 10.
Indeed, if we extended section "13.1.4.6 Digits in a Radix".
> (yes, we can use the "character decomposition" to map
> #\MATHEMATICAL_BOLD_CAPITAL_A to #\A, so, it is possible to arrange that
> (let ((*read-base* 16))
> (read-from-string
> (concatenate 'string
> (string #\MATHEMATICAL_BOLD_CAPITAL_A)
> (string #\MATHEMATICAL_BOLD_DIGIT_NINE))))
> returns #xA9).
>
> Now, how about the first letter of the ETHIOPIC alphabet?
> How about alphabets with fewer than 26 letters?
> More than 26 letters? - why not extend the notion of a radix?
>
> Then, how about #\ETHIOPIC_NUMBER_TEN?
> #\ETHIOPIC_NUMBER_THIRTY?
> #\ETHIOPIC_NUMBER_HUNDRED?
> What should
> (read-from-string
> (concatenate 'string
> (string #\ETHIOPIC_NUMBER_HUNDRED)
> (string #\ETHIOPIC_NUMBER_TEN)
> (string #\ETHIOPIC_DIGIT_NINE)))
> return? 119? 100109? (the Ethiopic system is not positional).
I've always found limiting the radix to 36 too artificial.
Indeed there are notations with higher radices.
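Positional notation itself does not need the limit. A minimal sketch,
assuming the caller supplies the digit alphabet and the radix is simply
its length (PARSE-WITH-ALPHABET is a hypothetical name; the standard
PARSE-INTEGER only accepts radices from 2 to 36):

```lisp
;; Hypothetical helper (an assumption for illustration): positional
;; parsing with a caller-supplied digit alphabet of any length.
(defun parse-with-alphabet (string alphabet)
  "Parse STRING as a non-negative integer in radix (LENGTH ALPHABET).
The weight of each digit is its position in ALPHABET."
  (let ((radix (length alphabet)))
    (reduce (lambda (value char)
              (let ((weight (position char alphabet)))
                (unless weight
                  (error "~S is not a digit in the alphabet ~S"
                         char alphabet))
                ;; Shift the accumulated value one place and add the digit.
                (+ (* value radix) weight)))
            string
            :initial-value 0)))
```

With a 62-character alphabet (digits, then uppercase, then lowercase
letters), (parse-with-alphabet "10" alphabet) is 62, and with
"0123456789ABCDEF" the string "A9" parses to #xA9.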
> Or even funnier:
> should
>
> (read-from-string
> (concatenate 'string
> (string #\ARABIC-INDIC_DIGIT_ONE)
> (string #\DEVANAGARI_DIGIT_TWO)
> (string #\BENGALI_DIGIT_THREE)
> (string #\GUJARATI_DIGIT_FOUR)
> (string #\TAMIL_DIGIT_FIVE)))
>
> return 12345?
> My points are:
>
> 1. Pascal's invariant is not required by the CLTS.
Ok.
> 2. Requiring Pascal's invariant would produce weird results in
> implementations that use Unicode.
Ok, Unicode (writing systems) is complex.
> 3. The current situation in CLISP allows users to parse Unicode text
> (possibly interpreting numbers &c) by using Unicode attributes
> because DIGIT-CHAR-P returns useful values for Unicode digits.
Then I think that DIGIT-CHAR-P should be NIL for any character that is
not a potential digit as defined in section "13.1.4.6 Digits in a
Radix", and that implementations should provide a _distinct_ function
UNICODE:DIGIT-CHAR-P.
--
__Pascal Bourguignon__ http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay