PEP 3131: Supporting Non-ASCII Identifiers

Mon May 14 07:49:41 EDT 2007

Marco Colombo wrote:
> I suggest we keep focused on the main issue here, which is "should
> non-ascii identifiers be allowed, given that we already allow
> non-ascii string literals and comments?"
> 
> Most arguments against this proposal really fall into the category
> "ascii-only source files". If you want to promote code sharing, then
> you should enforce quite restrictive policies:
> - 7-bit only source files, so that everyone is able to correctly
> display and _print_ them (somehow I feel that printing foreign glyphs
> can be harder than displaying them);
> - English-only, readable comments _and_ identifiers (if you think
> about it, it's really the same issue, readability; I know of no coding
> style that requires good commenting but allows meaningless identifiers).
> 
> Now, why should one be allowed to violate those policies in the first
> place? One reason is freedom. Let me write my code the way I like it,
> and don't force me to write it the way you like it (unless it's
> supposed to be part of _your_ project, then have me follow _your_
> style).
> 
> Another reason is that readability is quite a relative term.
> Comments that make no sense in a real-world program may be
> appropriate in an example from a "getting started" guide:
> 
> # this is another way to increment variable 'a'
> a += 1
> 
> We know a comment like that is totally useless (and thus harmful) to
> any experienced programmer (it makes me think "thanks, but I knew that
> already"), but it's perfectly appropriate if you're introducing the +=
> operator to a newbie for the first time.
> 
> You could even say that most string literals are best made English-
> only:
> 
> print "Ciao Mondo!"
> 
> is better written as:
> 
> print _("Hello World!")
> 
> or with any other means of internationalizing the output. The Italian
> version should be provided through a .po file or similar.
> 
> Yet we support non-ascii encodings for source files, in order to give
> authors more freedom. And freedom comes at a price, of course:
> non-ascii string literals, comments and identifiers are all harmful
> to some extent and in some contexts.
> 
> What I fail to see is a context in which it makes sense to allow non-
> ascii literals and non-ascii comments but _not_ non-ascii identifiers.
> Or a context in which it makes sense to rule out non-ascii identifiers
> but not string literals and comments. For example, would you accept a patch
> with comments you don't understand (or even that you are not able to
> display correctly)? How can you make sure the patch is correct, if you
> can't read and understand the string literals it adds?
> 
> My point being that most public open source projects already have
> plenty of good reasons to enforce an English-only, ascii-only policy
> on source files. I don't think that allowing non-ascii identifiers at
> the language level would hinder their ability to enforce such a policy
> any more than allowing non-ascii comments or literals did.
> 
> OTOH, I won't be able to contribute much to a project that already
> uses, say, Chinese for comments and strings. Even if I manage to
> display the source code correctly here, still I won't understand much
> of it. So I'm not losing much by allowing them to use Chinese for
> identifiers too.
> And whether it was a mistake on their part not to choose an
> "English-only, ascii-only" policy is their call, not ours, IMHO.

Very well written.

+1

Stefan