Re: PEP 3131: Supporting Non-ASCII Identifiers



Stefan Behnel <stefan.behnel-n05pAM@xxxxxx> wrote:

Just to confirm that: IronPython does accept non-ascii identifiers.
From "Differences between IronPython and CPython":

IronPython will compile files whose identifiers use non-ASCII
characters if the file has an encoding comment such as "# -*-
coding: utf-8 -*-". CPython will not compile such a file in any
case.

Sounds like CPython would better follow IronPython here.

I cannot find any documentation which says exactly which non-ASCII
characters IronPython will accept.
I would guess that it probably follows C# in general, but it doesn't
follow C# identifier syntax exactly (in particular the leading @ to
quote keywords is not supported).

The C# identifier syntax, from
http://msdn2.microsoft.com/en-us/library/aa664670(VS.71).aspx, is quoted
below. I think it differs from the PEP only in also allowing the Cf class
of characters:

identifier:
    available-identifier
    @ identifier-or-keyword
available-identifier:
    An identifier-or-keyword that is not a keyword
identifier-or-keyword:
    identifier-start-character identifier-part-characters(opt)
identifier-start-character:
    letter-character
    _ (the underscore character U+005F)
identifier-part-characters:
    identifier-part-character
    identifier-part-characters identifier-part-character
identifier-part-character:
    letter-character
    decimal-digit-character
    connecting-character
    combining-character
    formatting-character
letter-character:
    A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
    A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl
combining-character:
    A Unicode character of classes Mn or Mc
    A unicode-escape-sequence representing a character of classes Mn or Mc
decimal-digit-character:
    A Unicode character of the class Nd
    A unicode-escape-sequence representing a character of the class Nd
connecting-character:
    A Unicode character of the class Pc
    A unicode-escape-sequence representing a character of the class Pc
formatting-character:
    A Unicode character of the class Cf
    A unicode-escape-sequence representing a character of the class Cf

For information on the Unicode character classes mentioned above, see
The Unicode Standard, Version 3.0, section 4.5.
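
To make the grammar above concrete, here is a rough Python sketch that tests a string against the identifier-or-keyword production using the general categories reported by the standard unicodedata module. The function name is_csharp_style_identifier is made up for illustration. The sketch ignores the leading @ form, unicode-escape-sequences and the keyword check, and it uses whatever Unicode version the running Python ships with rather than Unicode 3.0, so treat it as an approximation and not as what IronPython or C# actually does.

```python
import unicodedata

# General categories copied from the grammar quoted above.
# letter-character: Lu, Ll, Lt, Lm, Lo, Nl
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# identifier-part-character adds combining (Mn, Mc), decimal-digit (Nd),
# connecting (Pc) and formatting (Cf) characters.
PART_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc", "Cf"}


def is_csharp_style_identifier(name):
    """Hypothetical helper: does name match identifier-or-keyword?

    Assumption: no '@' prefix, no unicode-escape-sequences, and no
    keyword filtering; only the character classes are checked.
    """
    if not name:
        return False
    first, rest = name[0], name[1:]
    # identifier-start-character: a letter-character or the underscore.
    if first != "_" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    # Every remaining character must be an identifier-part-character.
    return all(unicodedata.category(ch) in PART_CATEGORIES for ch in rest)
```

For example, "na\u00efve" and "_x1" pass, "1x" fails because a digit (Nd) cannot start an identifier, and "a\u200db" passes only because U+200D ZERO WIDTH JOINER is in class Cf, which is the class the C# grammar allows in addition to those I believe the PEP permits.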