Re: Mandis Quotes (aka retiring """ and ''')

From: Bengt Richter (bokr_at_oz.net)
Date: 10/05/04

    Date: 5 Oct 2004 00:59:01 GMT
    
    

    On 4 Oct 2004 07:45:54 -0700, nelson@crynwr.com (Russell Nelson) wrote:

    >Jef Raskin (namedropping) has pointed me at a neat scheme for quoting
    >arbitrary textual matter called "Mandis quotes". Since google is
    >ignorant of the phrase, I presume that Jef made it up. It is
    >disgustingly simple, and very Pythonesque. Here's how it works: If
    >you have a string that doesn't have any single quotes in it, you
    >surround the string by a pair of doubled single quotes. ''Like
    >this''. No backslash interpolation. If you want a character in
    >there, you put it in there (yes, I know, stand down your armies).
    >Clearly, then, any character except a single quote can go into one of
    >these strings. If you need to put a single quote in, then you put
    >an arbitrary string in-between the single quotes which does NOT
    >appear in the string. For example, "Bill's house" becomes
    >'x'Bill's house'x'.
    >
    >More formally, a mandis quote is a pair of tokens surrounding a
    >completely arbitrary sequence of bytes. These tokens are comprised of
    >a possibly null sequence of characters preceded by and followed by a
    >single quote.
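
    The scheme quoted above can be sketched in a few lines of Python. This is
    only an illustration under stated assumptions: the function names
    parse_mandis and mandis_quote are hypothetical, and the policy for picking
    a tag ("x", "xx", ...) is not part of the proposal, which only requires
    that the token not occur in the quoted text.

```python
from itertools import count


def parse_mandis(source, start=0):
    """Parse one Mandis-quoted string beginning at source[start].

    Returns (body, index just past the closing token).
    """
    if source[start:start + 1] != "'":
        raise ValueError("expected an opening single quote")
    close = source.find("'", start + 1)
    if close == -1:
        raise ValueError("unterminated opening token")
    token = source[start:close + 1]          # e.g. "''" or "'x'"
    body_start = close + 1
    # The body runs up to the first later occurrence of the same token.
    body_end = source.find(token, body_start)
    if body_end == -1:
        raise ValueError("closing token not found")
    return source[body_start:body_end], body_end + len(token)


def mandis_quote(text):
    """Quote text, choosing a tag so that the result parses back to text."""
    # Assumed tag policy (not from the proposal): an empty tag when the text
    # has no single quotes, otherwise "x", "xx", ... until the round trip works.
    tags = [""] if "'" not in text else ("x" * n for n in count(1))
    for tag in tags:
        quoted = f"'{tag}'{text}'{tag}'"
        if parse_mandis(quoted) == (text, len(quoted)):
            return quoted
    raise ValueError("no usable tag found")
```

    With this sketch, mandis_quote("Bill's house") yields 'x'Bill's house'x',
    matching the example in the quoted text. The round-trip check matters
    because a token can otherwise match across the boundary between the body
    and the closing token (for instance when the body ends in 'x).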

    I once started a thread with the same (quoting arbitrary text) goal, but
    I made it a special case of Python string syntax, using a q or Q prefix:

       q'x'Bill's housex

    I thought about re-quoting the 'x' at the tail, but thought more typical usage
    would use a special character for single-character delimiters, e.g.,
       q'|'Bill's house|

    See

    http://groups.google.com/groups?group=comp.lang.python.*&selm=a5srm2%24254%240%40216.39.172.122&rnum=2

    and click on "view complete thread" to see all 36 posts ;-)
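
    For comparison, here is a rough sketch of how the q-prefixed form shown
    above might be read. It is inferred only from the two examples in this
    post (q'x'Bill's housex and q'|'Bill's house|); the function name
    parse_q_literal is hypothetical, and the Q variant discussed in the cited
    thread is not covered.

```python
def parse_q_literal(source):
    """Parse a literal of the form q'DELIM'body...DELIM.

    Returns (body, index just past the closing delimiter).
    Assumption: the delimiter is the non-empty text between the first pair
    of single quotes after the q prefix, and the body ends at the first
    later occurrence of that bare delimiter.
    """
    if not source.startswith("q'"):
        raise ValueError("expected a q' prefix")
    close = source.find("'", 2)
    if close == -1:
        raise ValueError("unterminated delimiter specification")
    delim = source[2:close]
    if not delim:
        raise ValueError("empty delimiter")
    body_start = close + 1
    body_end = source.find(delim, body_start)
    if body_end == -1:
        raise ValueError("closing delimiter not found")
    return source[body_start:body_end], body_end + len(delim)
```

    Unlike the Mandis form, the closing delimiter here is not re-quoted, so
    the delimiter itself (not the quoted token) must be absent from the body.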

    >
    >To save time, here's why this pre-PEP proposal sucks in decreasing
    >order of severity:
    >
    >o Python source is typically represented, not as an arbitrary string
    > of ASCII or Unicode characters, but instead as a sequence of lines
    > separated by the native line terminator (e.g. CRLF, LF, or CR).

    See the Q'... variant in the thread cited above.

    >
    >o Editors are not all up to the task of inserting arbitrary
    > characters into strings (although they SHOULD).
    >
    >o Email cannot withstand arbitrary strings of characters (although
    > quoted-printable suffices).
    >
    >o Some distinct Unicode characters are represented using the same
    > glyph, so that information is lost when text gets printed (but
    > that's more of a Unicode stupidism.)
    >
    >Obviously, the justification for it is that it eliminates ", ', r",
    >r', """, and ''' from the syntax, replacing them by a single 'x' that
    >suffices for everything. Makes the code easier to read (only one
    >visual element), easier to parse, and easier to write, because you
    >don't need to decide which literal method to use.

    IMO a special use case does not justify complicating ordinary usage,
    but can be justified as a special syntax variant if it stays out of the way
    and provides otherwise unavailable capability.

    As others have pointed out, you couldn't just switch to Mandis Quotes as
    a complete replacement, since it would break existing programs. But you
    could add a prefix, e.g. an 'm', for a special syntax a lot like mine ;-)

        m'x'Bill's House'x'

    Quoting "arbitrary" text also involves the issue of encoding, which is something
    I hadn't thought through when I proposed my syntax. E.g., what happens when you
    paste arbitrary text in a possibly different encoding between some delimiters?

    Do you depend on the editor's ability to request an encoding transformation from
    clipboard content to its current encoding (assuming you are using an editor, and not
    programmatically concatenating text from various sources)? Does that lose information
    if the current encoding is not Unicode? It's a long discussion, involving what byte
    sequences really mean in the various representations involved (in source files, memory,
    screen presentations, etc.), and which are transient escaped byte representations and
    which are abstract text entities. Another time ... ;-)
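
    A small illustration of the information-loss question, using Python's
    built-in codecs. The sample text and the choice of Latin-1 as the
    "current encoding" are assumptions made for the example only.

```python
# Sample text containing e-acute, an en dash, and i-diaeresis.
text = "caf\u00e9 \u2013 na\u00efve"

# A Unicode encoding round-trips the text without loss.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text

# Latin-1 has no en dash, so converting to it has to drop information;
# with errors="replace" the en dash becomes "?".
latin1_bytes = text.encode("latin-1", errors="replace")
lossy = latin1_bytes.decode("latin-1")

# The same bytes read under the wrong encoding give different text,
# although here the original is still recoverable from the misreading.
misread = utf8_bytes.decode("latin-1")
recovered = misread.encode("latin-1").decode("utf-8")
```

    Here lossy differs from text in exactly one character, while misread is
    wrong text that still carries all of the original bytes.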

    Regards,
    Bengt Richter

