PEP 3131: Supporting Non-ASCII Identifiers

Mon May 14 11:14:09 EDT 2007

On Mon, 14 May 2007 12:17:36 +0200, Stefan Behnel  
<stefan.behnel-n05pAM at web.de> wrote:
> Eric Brunel wrote:
>> On Mon, 14 May 2007 11:00:29 +0200, Stefan Behnel
>>> Any chance there are still kanji-enabled programmers around that were
>>> not hit by the bomb in this scenario? They might still be able to help
>>> you get the code "public".
>>
>> Contrary to what one might think seeing the great achievements of
>> open-source software, people willing to maintain public code and/or make
>> it evolve seem to be quite rare. If you add burdens on such people,
>> such as being able to read and write the language of the original code
>> writer, or forcing them to request a translation or transliteration from
>> someone else, the chances are that they will become even rarer...
>
> Ok, but then maybe that code just will not become Open Source. There's a
> million reasons code cannot be made Open Source, licensing being one,
> lack of resources being another, bad implementation and lack of
> documentation being important also.
>
> But that won't change by keeping Unicode characters out of source code.

Maybe; maybe not. This is one more reason that could prevent code from
becoming open-source. IMHO, there are already plenty of such reasons, and
I don't think we need a new one...

> Now that we're at it, badly named English identifiers chosen by
> non-native English speakers, for example, are a sure way to keep people
> from understanding the code and thus from being able to contribute
> resources.

I wish we could have an option forbidding these too ;-) But then, maybe
some of my own code would no longer run with it turned on...

> I'm far from saying that all code should start using non-ASCII
> characters. There are *very* good reasons why a lot of projects are well
> off with ASCII and should obey the good advice of sticking to plain
> ASCII. But those are mainly projects that are developed in English and
> use English documentation, so there is not much of a risk to stumble
> into problems anyway.
>
> I'm only saying that this shouldn't be a language restriction, as there
> definitely *are* projects (I know some for my part) that can benefit
> from the clarity of native language identifiers (just like English
> speaking projects benefit from the English language). And yes, this
> includes spelling native language identifiers in the native way to make
> them easy to read and fast to grasp for those who maintain the code.

My point is only that I don't think you can tell right from the start that  
a project you're working on will stay private forever. See Java for  
instance: Sun said for quite a long time that it wasn't a good idea to  
release Java as open-source and that it was highly unlikely to happen. But  
it finally did...

You could say that the rule should be: if the project has the slightest
chance of becoming open-source, or of being shared with people who do not
speak the same language as the original coders, one should not use
non-ASCII identifiers. I'm personally convinced that *any* industrial
project falls into this category. So accepting non-ASCII identifiers is
just introducing a disaster waiting to happen.

But then, I have the same feeling about non-ASCII strings, and I, as a
project leader, won't ever accept a source file which has a "-*- coding
-*-" line specifying anything other than ASCII... So even if I usually
don't buy the "we're already half-dirty, so why can't we be the dirtiest
possible" argument, I'd understand if this feature went into the language.
But I personally won't ever use it, and I will forbid others from using
it whenever I can.
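
For what it's worth, a project-level rule like this can be checked
mechanically, without looking at the coding line at all: a file is
acceptable only if every byte in it is plain ASCII. The following is just a
minimal sketch, assuming Python 3; the function name is_pure_ascii is made
up for illustration and is not part of any tool.

```python
def is_pure_ascii(source: bytes) -> bool:
    """Return True if every byte of the source file is 7-bit ASCII.

    Hypothetical helper: it ignores the coding line and simply tries
    to decode the raw bytes as ASCII.
    """
    try:
        source.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# A file with only ASCII content passes; one with an accented
# character in a string literal does not.
print(is_pure_ascii(b"# -*- coding: ascii -*-\nname = 'Eric'\n"))
print(is_pure_ascii("name = 'St\u00e9phane'\n".encode("utf-8")))
```

Such a check is stricter than an identifier-only rule, since it also
rejects non-ASCII strings and comments.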

> It should at least be an available option to use this feature.

If it's actually an option to the interpreter, I guess I'll just have to  
alias python to 'python --ascii-only-please'...
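
Failing such an interpreter option (the flag above is of course invented),
the same effect could be approximated outside the interpreter. Here is a
rough sketch, assuming Python 3 and its standard tokenize module; the
function name non_ascii_identifiers is hypothetical. It reports only
identifiers, so non-ASCII strings and comments are let through.

```python
import io
import tokenize


def non_ascii_identifiers(source: str) -> list:
    """Return (line number, name) pairs for identifiers that are not ASCII.

    Hypothetical helper: walks the token stream and looks only at NAME
    tokens, so string literals and comments are not reported.
    """
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            found.append((tok.start[0], tok.string))
    return found


sample = "gr\u00f6\u00dfe = 1\nsize = 2\nlabel = 'gr\u00f6\u00dfe'\n"
for lineno, name in non_ascii_identifiers(sample):
    print("line %d: non-ASCII identifier %r" % (lineno, name))
```

Run from a commit hook or a test suite, a check of this kind gives a
project the ASCII-only policy without requiring anything from the language
itself.
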
-- 
python -c "print ''.join([chr(154 - ord(c)) for c in  
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"