[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Stephen J. Turnbull stephen at xemacs.org
Fri Apr 24 20:40:12 CEST 2009


Antoine Pitrou writes:
 > Stephen J. Turnbull <stephen <at> xemacs.org> writes:
 > > 
 > > Well, the problem is that both parts are false.  If you didn't start
 > > with a valid string in a known encoding, you shouldn't treat it as
 > > characters because it's not.  Hand it to a careful API, and you'll get
 > > an Exception raised in your face.
 > 
 > Which "careful API" are you talking about?
 >
 > > OTOH, at least some of those who feel lucky and use it
 > > naively are going to turn out to be wrong.
 > 
 > Why will they turn out to be wrong?

To quote the PEP:

"""
While providing a uniform API to non-decodable bytes, this interface
has the limitation that chosen representation only "works" if the data
get converted back to bytes with the python-escape error handler
also. Encoding the data with the locale's encoding and the (default)
strict error handler will raise an exception, encoding them with UTF-8
will produce non-sensical data.

For most applications, we assume that they eventually pass data
received from a system interface back into the same system
interfaces.
"""
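A minimal sketch of the limitation the PEP describes, assuming a UTF-8 locale. It uses "surrogateescape", the name the handler shipped under in Python 3.1; "python-escape" was the draft name. One hedge: the modern UTF-8 codec refuses lone surrogates under the strict handler, so the "non-sensical data" case only appears here when the surrogate is forced through with "surrogatepass".

```python
# Latin-1 bytes for "café.txt"; 0xE9 followed by "." is not valid UTF-8.
raw = b"caf\xe9.txt"

# Decode the way a PEP 383 system interface would (UTF-8 locale assumed).
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # the bad byte 0xE9 became the lone surrogate U+DCE9

# "Works": encoding back with the same handler restores the original bytes.
assert name.encode("utf-8", "surrogateescape") == raw

# Fails: the default strict handler refuses the lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print("strict encode succeeded:", strict_ok)

# Garbage: forcing the surrogate through yields bytes no decoder expects.
forced = name.encode("utf-8", "surrogatepass")
print(forced)  # b'caf\xed\xb3\xa9.txt', not the original b'caf\xe9.txt'
```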

But you can't know that.  These are now "just strings", which could
end up in pickles and other persistent objects, be passed across
network interfaces (remote copy, for example), etc., etc., and there is
no way to guarantee that the recipient will understand the rules,
unless the application encapsulates them in some kind of
representation that says "I look like a Unicode but I'm really just
encoded bytes."  But the whole point is to turn them into plain old
strings so people *don't have to bother* keeping track.
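To make the concern concrete, here is a sketch of such a string leaving the process that decoded it. JSON stands in for "pickles ... network interfaces", and to_wire() is a hypothetical recipient that encodes text the ordinary way; neither is taken from the PEP.

```python
import json

# Sending host: a filename decoded with surrogateescape under a UTF-8 locale.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")

# It is "just a string", so it serializes without complaint.
payload = json.dumps({"path": name})
print(payload)  # {"path": "caf\udce9.txt"}

# Receiving host: nothing in the payload says the string holds escaped bytes.
received = json.loads(payload)["path"]

def to_wire(s: str) -> bytes:
    """Hypothetical recipient that encodes text with the default strict handler."""
    return s.encode("utf-8")

try:
    to_wire(received)
    recipient_ok = True
except UnicodeEncodeError:
    recipient_ok = False
print("naive recipient succeeded:", recipient_ok)

# Only a recipient that already knows the convention recovers the bytes.
restored = received.encode("utf-8", "surrogateescape")
print(restored)  # b'caf\xe9.txt'
```

The payload itself carries no marker saying "I look like a Unicode but I'm really just encoded bytes"; that knowledge has to travel out of band.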

As I already said, this is no worse than the current situation, but it
gives the impression that Python has a standard "solution".  (Yes, I
know Martin doesn't claim it's a solution to any of those problems.
The point is user perception.)

I have to wonder whether having a standard way of not solving any
problems is better than having no standard way of not solving any
problems.  It may be, and it probably can't hurt, which is why I'm +0.


