Multibyte Character Surport for Python

Martin v. Loewis martin at v.loewis.de
Wed May 8 16:05:18 EDT 2002


pinard at iro.umontreal.ca (François Pinard) writes:

> Existing code is not going to break, as long as English identifiers stay
> a subset of nationally written identifiers.  Which is usually the case
> for most character sets, Unicode among them, allowing ASCII letters as a
> subset of all letters.

For Python, existing code, like inspect.py, *will* break: if
introspective code is suddenly confronted with non-ASCII identifiers,
it might break, e.g. if Unicode objects show up as keys in __dict__.

> Having the capability of writing identifiers with national letters is
> not going to break other people's code, this assertion looks a bit like
> gratuitous FUD to me.  Unless you are referring to probable transient
> implementation bugs which are normal part of any release cycle?  

No. The implementation strategy would be to allow Unicode identifiers
at run-time, and all introspective code - either within the Python
code base, or third-party, would need revision.

> Python has undergone changes which were much deeper and much more
> drastic than this one would be, and the fear of transient bugs has
> not been a stopper.

PEP 263 will introduce the notion of source encodings - without this,
it wouldn't even be possible to parse the source code, anymore. The
PEP, over months, had a question in it asking whether non-ASCII
identifiers should be allowed (the follow-up question would then be:
which ones?), and nobody ever spoke up requesting such a feature.

It is a real surprise for me that suddenly people want this.

Regards,
Martin



More information about the Python-list mailing list