[Python-Dev] Bytes path support

Stephen J. Turnbull stephen at xemacs.org
Wed Aug 20 08:38:01 CEST 2014


Guido van Rossum writes:
 > On Tuesday, August 19, 2014, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > > Greg Ewing writes:

 > >  > So maybe the way to make bytes paths go away is to always
 > >  > use surrogateescape for paths on unix?
 > >
 > > Backward compatibility rules that out, I think.  I certainly would
 > > recommend that for new code, but even for new code there are many
 > > users who vehemently object to using Unicode as an intermediate
 > > representation of things they think of as binary blobs.  Not worth the
 > > hassle to even seriously propose removing those APIs IMO.
 > 
 > But maybe we don't have to add new ones?

IMO, we should avoid it.

There may be some use cases.  Sergiy mentions two bug reports.

http://bugs.python.org/issue19997 imghdr.what doesn't accept bytes paths
http://bugs.python.org/issue20797 zipfile.extractall should accept bytes path as parameter

I'm very unsympathetic to these.  In both cases the bytes are coming
from outside of module in question.  Why are they in bytes?  That
question should scare you, because from the point of view of end users
there are no good answers: they all mean that the end user is going to
end up with uninterpretable bytes in their directories, for the
convenience of the programmer.

In the case of issue20797, I'd be a *little* sympathetic if the RFE
were for the *members* argument.  zipfiles evidently have no way to
specify the encodings of the name(s) of their members (and the zipfile
module doesn't have APIs for it!), so the programmer is kind of stuck,
especially if the requirement is that the extraction require no user
intervention.  But again, this is rarely what the user wants.

I would be sympathetic to an internal, bytes-based, "kids these stunts
are performed by trained professionals do NOT try this at home" API,
with a sane user-oriented str-based API for ordinary use for this
module.  I suppose it might be useful for such a multi-type API to be
polymorphic, but it would have to be a "if there are bytes anywhere,
everything must be bytes and return values will be bytes" and
similarly for str kind of polymorphism.  No mixing bytes and strings,
period.





More information about the Python-Dev mailing list