Multibyte Character Surport for Python
Martin v. Loewis
martin at v.loewis.de
Wed May 8 16:05:18 EDT 2002
pinard at iro.umontreal.ca (François Pinard) writes:
> Existing code is not going to break, as long as English identifiers stay
> a subset of nationally written identifiers. Which is usually the case
> for most character sets, Unicode among them, allowing ASCII letters as a
> subset of all letters.
For Python, existing code, like inspect.py, *will* break: if
introspective code is suddenly confronted with non-ASCII identifiers,
it might break, e.g. if Unicode objects show up as keys in __dict__.
> Having the capability of writing identifiers with national letters is
> not going to break other people's code, this assertion looks a bit like
> gratuitous FUD to me. Unless you are referring to probable transient
> implementation bugs which are normal part of any release cycle?
No. The implementation strategy would be to allow Unicode identifiers
at run-time, and all introspective code - either within the Python
code base, or third-party, would need revision.
> Python has undergone changes which were much deeper and much more
> drastic than this one would be, and the fear of transient bugs has
> not been a stopper.
PEP 263 will introduce the notion of source encodings - without this,
it wouldn't even be possible to parse the source code, anymore. The
PEP, over months, had a question in it asking whether non-ASCII
identifiers should be allowed (the follow-up question would then be:
which ones?), and nobody ever spoke up requesting such a feature.
It is a real surprise for me that suddenly people want this.
Regards,
Martin
More information about the Python-list
mailing list