[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

Stephen J. Turnbull stephen at xemacs.org
Fri Oct 10 06:38:25 CEST 2008


Terry Reedy writes:

 > If FOOTR is using PUA chars, then I believe that users should not
 > be providing such a stream as it would have no defined meaning
 > coming from them.

But that's precisely what "private use" means: the users provide their
own definitions!  The Unicode standard provides that if a process
doesn't know what those characters mean, it *must* pass them through
*unchanged*, on the assumption that they will eventually reach a user
who knows what they mean.

So if Python uses PUA characters internally, then (to conform to
Unicode) every Python program must track every filename to be sure
that no internal-use PUA characters make it to the "outside world",
where conforming processes will propagate them indefinitely.  This
is a substantial burden.

This is precisely the advantage of UTF-8b: the first conforming
process that catches any escapees will scream bloody murder and turn
them over to the Spanish Inquisition, who will torture them on the
rack until they confess that Python did it.<wink>
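
A minimal sketch of the contrast, assuming a Python new enough to have
the "surrogateescape" error handler (added in 3.1 by PEP 383, which
follows the UTF-8b idea).  The file name and the PUA mapping of byte
0xE9 to U+E0E9 are hypothetical, chosen only for illustration:

```python
# Hypothetical file name in Latin-1; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# UTF-8b style: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")

# The original bytes can be recovered exactly with the same handler.
round_trip = name.encode("utf-8", errors="surrogateescape")

# A strict encoder refuses the escapee instead of passing it along.
try:
    name.encode("utf-8")
    strict_rejected = False
except UnicodeEncodeError:
    strict_rejected = True

# Hypothetical PUA scheme: the same byte mapped to U+E0E9 instead.
pua_name = "caf\ue0e9.txt"
pua_bytes = pua_name.encode("utf-8")  # encodes without complaint
pua_passed = pua_bytes.decode("utf-8") == pua_name
```

Here the strict encode raises UnicodeEncodeError for the lone
surrogate, while the PUA character round-trips through UTF-8 silently,
so nothing downstream would notice it had leaked.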


