[Python-Dev] forwarded message from Stephen J. Turnbull

Guido van Rossum guido@python.org
Mon, 04 Mar 2002 09:39:50 -0500


[Stephen J. Turnbull]
> [...]  I feel that it is possible to support the users who want
> to use national encodings AND define the language in terms of a single
> coded character set, as long as that set is Unicode.  The usual
> considerations of file system safety and standard C library
> compatibility dictate that the transformation format be UTF-8.  (Below
> I will just write "UTF-8" as is commonly done.)
> 
> My belief is that the proposal below has the same effect on most users
> most of the time as PEP 263, while not committing Python to indefinite
> support of a subsystem that will certainly be obsolete for new code in
> 5 years, and most likely within 2 (at least for people using open
> source and major vendor tools; I don't know what legacy editors people
> may be using on "big iron" and whatnot).

If your concern is that PEP 263 will bind us to indefinite support of
the encoding cookie feature, I propose to add a "sunset provision" to
the PEP, just as is commonly done to U.S. laws so that they expire
after a certain date.

I think it's a good idea to consider your hook proposal as an
implementation strategy for the PEP, but I believe it would be wise if
this were adopted as a standard feature rather than something users
need to configure explicitly.

You bring up one important point that AFAIK isn't addressed by the
PEP: when text is presented to the parser in the form of an 8-bit
string object, should an encoding cookie be honored if present?  I'd
say yes.  When a Unicode string is presented, encoding cookies should
be ignored, of course.
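
The rule in the paragraph above can be illustrated with a minimal sketch in
present-day Python. It is not code from the PEP or from this thread. The
function name source_to_text, the simplified cookie pattern, and the UTF-8
fallback are all assumptions made for the example; "bytes" stands in for
the 8-bit string object and "str" for the Unicode string.

```python
import re

# Simplified cookie pattern (an assumption for this sketch): a comment
# containing "coding:" or "coding=" followed by an encoding name.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def source_to_text(source, default="utf-8"):
    """Return source code as text, honoring a cookie only for byte input.

    The name and the default encoding are hypothetical.
    """
    if isinstance(source, str):
        # Already Unicode: a cookie, if present, is just a comment.
        return source
    # 8-bit input: look for a cookie on the first two lines only.
    for line in source.splitlines()[:2]:
        match = COOKIE_RE.match(line)
        if match:
            return source.decode(match.group(1).decode("ascii"))
    return source.decode(default)
```

With this sketch, a Latin-1 byte string that starts with a
"# -*- coding: latin-1 -*-" line is decoded as Latin-1, while the same text
passed in as a Unicode string is returned unchanged.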

--Guido van Rossum (home page: http://www.python.org/~guido/)