Coddling Emacs (long)

Barry A. Warsaw barry at digicool.com
Mon May 28 12:58:22 EDT 2001


>>>>> "JF" == Jonathan Feinberg <jdf at pobox.com> writes:

    JF> ObPython: I also found this line in a Python program while
    JF> searching for Perl examples.  Do you all have similar emacs
    JF> strategies?

    JF>    """, re.X) # "emacs!

I'll relate my story about why I ultimately ditched Perl forever in
favor of Python.  When Lucid Emacs (now called XEmacs) came out, it
was the first such beast to support syntax highlighting.  I relegated
it to novelty for a while because of the garish default color choices,
but soon came to recognize syntax highlighting (in the form of
font-lock mode) for what it was: speedy extra clues as to the
syntactic correctness of your program.  So I soon became a convert...

...only to be completely frustrated with yet another aspect of Perl
4's incomprehensible syntax: the use of a single tick -- ' -- as a
package separator.   Old Perl actually allowed this character to be
used both balanced, as a string delimiter, and unbalanced, much like
C++'s :: scoping operator.  Emacs font-lock turds such as you describe
became commonplace, and it sucked.

Along comes Python with its sane syntax, and among other things, its
consistent use of quoting.  Yippee!  No more Emacs turds.  And Guido
exercised his usual brilliant design sensibilities when he introduced
multiline strings by using triple-quotes (which to the Emacs syntax
parser looks like three adjacent strings -- think about it!).

Well, there's /almost/ no reason for Emacs turds with any modern
Emacsen.  Two things can bite you, one which would be hard to fix and
one which would be easy to fix (speaking as someone who spent a /lot/
of time in the bowels of Emacs's syntax tables many years ago).

First, because a Python multiline string looks like just three
adjacent strings to the Emacs parser, you can easily trick it by
exploiting the English language's balanced/unbalanced quoting
inconsistency:

'''This isn't a good thing to do.'''

Note that this example uses SQTQ (single-quote triple-quoted strings)
with an embedded apostrophe.  That confuses Emacs because it looks
like [empty-string, the string "This isn", some random code, another
empty string, a trailing unbalanced single quote].

Working around this is simple though; use DQTQ (double-quote
triple-quoted strings), such as:

"""This isn't a good thing to do."""

No problem because Emacs's syntax tables require balanced string
quotes to be the same character.  But what if you have this:

"""This isn't a good "thing" to do."""

That's a little uglier, but no problem for Emacs.  The string "thing"
will not be font-locked to look like a string, but everything else
around it is font-locked just fine.  Fortunately, English doesn't also
overload the unbalanced double quote. :)
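
The three cases above can be mimicked with a toy scanner.  This is only
a rough sketch, assuming a matcher that (like the syntax tables
described here) pairs each quote character with the next occurrence of
the same character.  It is not how Emacs is actually implemented, it
ignores backslash escapes, and the name naive_string_scan is made up
for illustration.

```python
def naive_string_scan(text, quote_chars="'\""):
    """Split text the way a single-character quote matcher would.

    Returns (kind, content) pairs; kind is "string", "code", or
    "unterminated" (an opening quote with no matching close).
    """
    pieces = []
    i = 0
    code_start = 0
    while i < len(text):
        ch = text[i]
        if ch in quote_chars:
            # Anything between the last string and this quote is "code".
            if i > code_start:
                pieces.append(("code", text[code_start:i]))
            # A string only ends at the next occurrence of the SAME quote.
            end = text.find(ch, i + 1)
            if end == -1:
                pieces.append(("unterminated", text[i + 1:]))
                return pieces
            pieces.append(("string", text[i + 1:end]))
            i = end + 1
            code_start = i
        else:
            i += 1
    if code_start < len(text):
        pieces.append(("code", text[code_start:]))
    return pieces


sqtq = "'''This isn't a good thing to do.'''"
dqtq = '"""This isn\'t a good thing to do."""'
dqtq_nested = '"""This isn\'t a good "thing" to do."""'

for sample in (sqtq, dqtq, dqtq_nested):
    print(naive_string_scan(sample))
```

The SQTQ sample ends in an "unterminated" piece, which is the trailing
unbalanced quote.  The plain DQTQ sample comes out as three clean
strings, and in the last sample only the word thing lands outside a
string.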

Teaching Emacs's syntax tables about Python's triple quoted strings
would not be easy, and I certainly have no urge to do it in this
life.  I burned out too many of those cells teaching it about C++'s
dual comment syntaxes.

The second place where you sometimes need Emacs turds, or more
appropriately, XEmacs turds, is when you've inadvertently put an open
parenthesis in column zero inside a triple quoted string:

"""This is a docstring
(a supposedly informational
string that describes a
Python thing.)
This module is useful.
"""

That open paren in line 2 trips a hardcoded optimization, useful for
C-ish languages and Lisp, but utterly useless for Python.  It turns
out to be fairly expensive to search backwards in a buffer for the
beginning of a function definition, so Emacs has a shortcut that says
any open-parenthesis character (and that includes at least `(' and
`{') in column zero must indicate the start of a function definition.
This shortcut makes no exceptions for open-parens inside a string,
since calculating that could potentially be at least as expensive as
not having the shortcut at all.
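
The shortcut described above can be modeled in a few lines.  This is a
sketch of the behavior as described in this post, not the actual Emacs
C code; guess_defun_start and its parameters are hypothetical names,
and the opener set is limited to the two characters mentioned above.

```python
def guess_defun_start(lines, from_line, openers=("(", "{")):
    """Walk backwards from from_line and return the index of the first
    line that has an open-paren character in column zero, or None.

    Note what is missing: no check of whether the line sits inside a
    string.  That omission is the whole point of the shortcut, and the
    source of the problem.
    """
    for idx in range(from_line, -1, -1):
        if lines[idx].startswith(openers):
            return idx
    return None


example = [
    '"""This is a docstring',
    '(a supposedly informational',
    'string that describes a',
    'Python thing.)',
    'This module is useful.',
    '"""',
    'def f():',
    '    pass',
]

# Starting from the body of f(), the heuristic lands inside the docstring.
print(guess_defun_start(example, 7))  # prints 1
```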

This rule obviously makes no sense for Python, but it turns out that
in older Emacsen this rule was hardcoded in the C primitives and
couldn't be turned off or made language mode sensitive.  I believe
that most recent FSF Emacsen do a more sane thing here, but XEmacs
as of 21.1.14 still suffers from this bogosity.  It's worked around by
moving that paren out of column zero (i.e. adding a single space right
before it on the line), or adding a turd comment right after the
string:

"""This is a docstring
(a supposedly informational
string that describes a
Python thing.)
This module is useful.
""" # " Emacs turd

I've never had the time to fix XEmacs to make this shortcut
configurable, but I have asked the XEmacs development team to add such
a thing.  I haven't checked the latest XEmacs 21.4's to know whether
this has been fixed or not.
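
Until then, offending lines can be found mechanically.  The following
is a minimal sketch that uses Python's standard tokenize module to
report open parens sitting in column zero inside a multiline string;
the function name column_zero_parens_in_strings is invented for this
example, and it assumes the source tokenizes cleanly.

```python
import io
import tokenize


def column_zero_parens_in_strings(source, openers=("(", "{")):
    """Return 1-based line numbers where a continuation line of a
    string literal starts with an open-paren character."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        first_row = tok.start[0]
        # tok.string is the literal exactly as written in the source, so
        # every line after the first begins in column zero of the file.
        for offset, line in enumerate(tok.string.split("\n")):
            if offset > 0 and line.startswith(openers):
                hits.append(first_row + offset)
    return hits


bad = (
    '"""This is a docstring\n'
    '(a supposedly informational\n'
    'string that describes a\n'
    'Python thing.)\n'
    'This module is useful.\n'
    '"""\n'
)
print(column_zero_parens_in_strings(bad))  # prints [2]
```

Moving the paren out of column zero, as in the first workaround, makes
the same check come back empty.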

more-than-you-ever-wanted-to-know-i'm-sure-ly y'rs,
-Barry



