Mandis Quotes (aka retiring """ and ''')

Bengt Richter bokr at oz.net
Mon Oct 4 20:59:01 EDT 2004


On 4 Oct 2004 07:45:54 -0700, nelson at crynwr.com (Russell Nelson) wrote:

>Jef Raskin (namedropping) has pointed me at a neat scheme for quoting
>arbitrary textual matter called "Mandis quotes".  Since google is
>ignorant of the phrase, I presume that Jef made it up.  It is
>disgustingly simple, and very Pythonesque.  Here's how it works: If
>you have a string that doesn't have any single quotes in it, you
>surround the string by a pair of doubled single quotes.  ''Like
>this''.  No backslash interpolation.  If you want a character in
>there, you put it in there (yes, I know, stand down your armies).
>Clearly, then, any character except a single quote can go into one of
>these strings.  If you need to put a single quote in, then you put
>an arbitrary string in-between the single quotes which does NOT
>appear in the string.  For example, "Bill's house" becomes
>'x'Bill's house'x'.
>
>More formally, a mandis quote is a pair of tokens surrounding a
>completely arbitrary sequence of bytes.  These tokens are comprised of
>a possibly null sequence of characters preceded by and followed by a
>single quote.

I once started a thread with the same (quoting arbitrary text) goal, but
I made it a special case of Python string syntax, using a q or Q prefix:

   q'x'Bill's housex

I thought about re-quoting the 'x' at the tail, but thought more typical usage
would use a special character for single-character delimiters, e.g.,
   q'|'Bill's house|

See

http://groups.google.com/groups?group=comp.lang.python.*&selm=a5srm2%24254%240%40216.39.172.122&rnum=2

And click on "view complete thread" to see all 36 posts ;-)
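
To make the q'x' form above concrete, here is a minimal sketch of how such a literal could be decoded. The helper name parse_q_literal is hypothetical, and the rule that the first occurrence of the delimiter ends the text is an assumption on my part, not necessarily what the cited thread settled on.

```python
def parse_q_literal(source: str) -> str:
    """Hypothetical helper: decode q'DELIM'TEXT followed by a bare DELIM."""
    if not source.startswith("q'"):
        raise ValueError("expected a q' prefix")
    # The second single quote closes the delimiter specification.
    close = source.find("'", 2)
    if close == -1:
        raise ValueError("unterminated delimiter specification")
    delim = source[2:close]
    if not delim:
        raise ValueError("empty delimiter")
    body_start = close + 1
    # Assumption: the delimiter never occurs inside the text, so its
    # first occurrence after the specification ends the literal.
    end = source.find(delim, body_start)
    if end == -1:
        raise ValueError("closing delimiter not found")
    if end + len(delim) != len(source):
        raise ValueError("unexpected text after closing delimiter")
    return source[body_start:end]


first = parse_q_literal("q'x'Bill's housex")
second = parse_q_literal("q'|'Bill's house|")
```

Both calls yield the same text; the only difference is which delimiter had to be kept out of the quoted material.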


>
>To save time, here's why this pre-PEP proposal sucks in decreasing
>order of severity:
>
>o Python source is typically represented, not as an arbitrary string
>  of ASCII or Unicode characters, but instead as a sequence of lines
>  separated by the native line terminator (e.g. CRLF, LF, or CR).
See Q'... in the above-cited thread.

>
>o Editors are not all up to the task of inserting arbitrary
>  characters into strings (although they SHOULD).
>
>o Email cannot withstand arbitrary strings of characters (although
>  quoted-printable suffices).
>
>o Some distinct Unicode characters are represented using the same
>  glyph, so that information is lost when text gets printed (but
>  that's more of a Unicode stupidism.)
>
>Obviously, the justification for it is that it eliminates ", ', r",
>r', """, and ''' from the syntax, replacing them by a single 'x' that
>suffices for everything.  Makes the code easier to read (only one
>visual element), easier to parse, and easier to write, because you
>don't need to decide which literal method to use.

IMO a special use case does not justify complicating ordinary usage,
but can be justified as a special syntax variant if it stays out of the way
and provides otherwise unavailable capability.

As others have pointed out, you couldn't just switch to Mandis Quotes as
a complete replacement, since it would break existing programs. But you
could add a prefix, e.g., an 'm', for a special syntax a lot like mine ;-)

    m'x'Bill's House'x'
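
A rough sketch of how the m-prefixed form could be produced and read back, assuming the rules quoted above: the token is a single quote, a possibly empty tag, and another single quote, and the same token closes the string. The names mandis_quote and mandis_parse and the choice of 'x' as the tag character are hypothetical.

```python
def mandis_quote(text: str, prefix: str = "m") -> str:
    """Hypothetical helper: wrap text in an m-prefixed Mandis quote."""
    # Per the quoted rule, the empty tag is only used when the text
    # contains no single quote at all.
    n = 0 if "'" not in text else 1
    while True:
        token = "'" + "x" * n + "'"
        # The token must first appear exactly where the text ends;
        # otherwise a reader would stop too early.
        if (text + token).find(token) == len(text):
            return prefix + token + text + token
        n += 1


def mandis_parse(source: str, prefix: str = "m") -> str:
    """Hypothetical helper: recover the text from an m-prefixed Mandis quote."""
    if not source.startswith(prefix + "'"):
        raise ValueError("expected prefix followed by a single quote")
    start = len(prefix)
    close = source.find("'", start + 1)   # second quote of the opening token
    if close == -1:
        raise ValueError("unterminated opening token")
    token = source[start:close + 1]       # e.g. "''" or "'x'"
    end = source.find(token, close + 1)
    if end == -1:
        raise ValueError("closing token not found")
    return source[close + 1:end]


quoted = mandis_quote("Bill's House")
plain = mandis_quote("Like this")
```

Note that there is no escape processing anywhere; the writer simply lengthens the tag until the token no longer collides with the text.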

Quoting "arbitrary" text also involves the issue of encoding, which is something
I hadn't thought through when I proposed my syntax. E.g., what happens when you
paste arbitrary text of possibly different encoding between some delimiters?

Do you depend on the editor's (if you are using an editor, not programmatically
concatenating text from various sources) ability to call for encoding transformations
from clipboard content to its current encoding? Does that lose information if the
current encoding is not Unicode? It's a long discussion, involving what byte sequences
really mean in the various representations involved (in source files, memory, screen
presentations, etc.), and which are transient escaped byte representations and which
are abstract text entities. Another time ... ;-)
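
As a small illustration of the information-loss question only, here is a sketch in present-day Python 3 (an assumption; it postdates this discussion). The function name round_trip and the sample text are made up for the example.

```python
def round_trip(text: str, encoding: str) -> str:
    """Hypothetical helper: encode text, replacing what the encoding
    cannot represent, then decode it again with the same encoding."""
    return text.encode(encoding, errors="replace").decode(encoding)


sample = "na\u00efve \u20ac5"             # i-diaeresis and a euro sign
via_utf8 = round_trip(sample, "utf-8")      # UTF-8 can represent both
via_latin1 = round_trip(sample, "latin-1")  # Latin-1 has no euro sign
```

Here via_utf8 compares equal to sample, while via_latin1 does not: the character the target encoding lacks has been replaced, and nothing in the resulting bytes says what it used to be.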

Regards,
Bengt Richter


