[Python-3000] PEP 3131 accepted

Josiah Carlson jcarlson at uci.edu
Wed May 23 20:21:53 CEST 2007


Removing those words that some found offensive, perhaps I will get a
response to the point of my post: "your tools aren't very good" and
"Emacs does it right" are not valid responses to the concerns brought up
regarding unicode.

"Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> Josiah Carlson writes:
> 
>  > From identical character glyph issues (which have been discussed
>  > off and on for at least a year),
> 
> In my experience, this is not a show-stopping problem.

I never claimed that this, by itself, was a showstopper.

And my post should not be read as "these are all the problems that I
have seen with PEP 3131".  Those are merely the issues that have been
discussed over and over, and about which I (and seemingly others) am
still concerned, regardless of the hundreds of posts here and in
comp.lang.python seeking to convince us that "they are not a problem".

> Emacs/MULE has
> had it for 20 years because of the (horrible) design decision to
> attach charset information to each character in the representation of
> text.  Thus, MULE distinguishes between NO-BREAK SPACE and NO-BREAK
> SPACE (the same!) depending on whether the containing text "is" ISO
> 8859-15 or "is" ISO 8859-1.  (Semantically this is different from the
> identical glyph, different character problem, since according to ISO
> 8859 those characters are identical.  However, as a practical matter,
> the problem of detecting and dealing with the situation is the same as
> in MULE the character codes are different.)
> 
> How does Emacs deal with this?  Simple.  We provide facilities to
> identify identical characters (not relevant to PEP 3131, probably), to
> highlight suspicious characters (proposed, not actually implemented
> AFAIK, since identification does what almost all users want), and to
> provide information on characters in the editing buffer.  The
> remaining problems with coding confusion are due to deficient
> implementation (mea maxima culpa).
> 
> I consider this to be an editor/presentation problem, not a language
> definition issue.

This particular excuse angers me the most.  "If you can't differentiate,
then your font or editor is garbage."  Thank you for passing judgement
on my choice of font or editor, but Ka-Ping already stated why this
argument isn't valid: there does not currently exist a font where one
*can* differentiate all the glyphs, and further, even if one could
visually differentiate similar glyphs, *remembering* the 64,000+ glyphs
available in just the primary unicode plane well enough to tell them
apart is a herculean task.
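
To make the glyph concern concrete, here is a minimal sketch, assuming
a Python 3 interpreter in which PEP 3131 identifiers are accepted.  The
names `latin_name`, `mixed_name`, and `describe` are hypothetical and
chosen only for illustration.  The two strings render nearly
identically in many fonts, yet they are different identifiers.

```python
import unicodedata

def describe(name):
    """Return (code point, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in name]

latin_name = "scope"        # all ASCII letters
mixed_name = "s\u0441ope"   # second letter is a Cyrillic look-alike of "c"

# Both are legal identifiers, but they do not compare equal.
print(mixed_name.isidentifier())
print(latin_name == mixed_name)

# Only the code points reveal the difference.
for code_point, char_name in describe(mixed_name):
    print(code_point, char_name)
```

Nothing on screen distinguishes the two names; only a character-level
dump like the one above does.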

Never mind that people use dozens, perhaps hundreds, of different
editors to write and maintain Python code, which makes the 'Emacs
works' argument poor at best.  Heck, Thomas Bushnell made the same
argument when I spoke with him 2 1/2 years ago (though he also included
Vim as an alternative to Emacs); it smelled like garbage then, and it
smells like garbage now.


> Note that Ka-Ping's worry about the infinite extensibility of Unicode
> relative to any human being's capacity is technically not a problem.
> You simply have your editor substitute machine-generated identifiers
> for each identifier that contains characters outside of the user's
> preferred set (eg, using hex codes to restrict to ASCII), then review
> the code.  When you discover what an identifier's semantics are, you
> give it a mnemonic name according to the local style guide.
> Expensive, yes.  But cost is a management problem, not the kind of
> conceptual problem Ka-Ping claims is presented by multilingual
> identifiers.  Python is still, in this sense, a finitely generated
> language.

That's a poor argument, and you know it.  "Just use hex escapes"? Modulo
unicode comments and strings, all Python programs are easily read in
default fonts available on every platform on the planet today.  But with
3131, people accepting 3rd party code need to break 15+ years of "what
you see is what is actually there" by verifying the character content of
every identifier?  That's a silly and unnecessary addition to the
workload of anyone who wants to accept patches from 3rd parties, and it
relies on the same "your tools are poor" argument to invalidate
concerns over unicode glyph similarity.
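
For a sense of what "verifying the character content of every
identifier" would involve, here is a minimal sketch using the standard
library `tokenize` module, assuming Python 3.7 or later (for
`str.isascii`).  The function name `non_ascii_identifiers` and the
sample source are hypothetical.  It lists every identifier containing
non-ASCII characters, together with an ASCII-escaped spelling of the
kind the quoted proposal suggests reviewing.

```python
import io
import tokenize

def non_ascii_identifiers(source):
    """Return (line number, identifier, escaped form) for each name token
    in `source` that contains at least one non-ASCII character."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            # Replace every non-ASCII character with a \uXXXX style escape.
            escaped = tok.string.encode("ascii", "backslashreplace").decode("ascii")
            found.append((tok.start[0], tok.string, escaped))
    return found

# Hypothetical patch: the second name contains a Cyrillic look-alike letter.
sample = "total = 1\ns\u0441ope = total + 1\n"
for line, name, escaped in non_ascii_identifiers(sample):
    print(f"line {line}: {name!r} -> {escaped}")
```

A check like this only reports that such identifiers exist; deciding
whether each one is legitimate is still left to the reviewer.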

Speaking of which, do you know of a fixed-width font that is able to
allow for the visual distinction of all unicode glyphs in the primary
plane, or even the portion that Martin is proposing we support?  This
also "is not a show-stopper", but it certainly reduces audience
satisfaction by a large margin.


>  > to editing issues (being that I write and maintain a Python editor)
> 
> Multilingual editing (except for non-LTR scripts) is pretty much a
> solved problem, in theory, although adding it to any given
> implementation can be painful.  However, since there are many
> programmer's editors that can handle multilingual text already, that
> is not a strong argument against PEP 3131.

Another "your tools aren't very good" argument.  While my editor has
been able to handle unicode content for a couple years now (supporting
all encodings available to Python), properly supporting the entry of
unicode text in any locale will necessitate the creation of
charmap-like interfaces in basically every editor.

But really, I'm glad that Emacs works for you and has solved this
problem for you.  I honestly tried to use it 4 years ago, spent a couple
weeks with it.  But it didn't work for me, and I've spent the last 4
years writing an editor because it and the other 35 editors I tried at
the time didn't work for me (as have the dozens of others for the exact
same reason). But of course, our tools suck, and because we can't use
Emacs, we are already placed in a 2nd tier ghettoized part of the Python
community of "people with tools that aren't Emacs".

Thank you for hitting home that unless people use Emacs, their tools
aren't sufficient for Python development. I still don't believe that my
concerns have been addressed. And I certainly don't believe that those
Ka-Ping brought up (which are better than mine) have been addressed.
But hey, my tools aren't Emacs, so obviously my concerns regarding using
my tools to edit Python in the future don't matter.  Thank you for the
vote of confidence.


 - Josiah


