[Python-Dev] Defining a path protocol

Stephen J. Turnbull stephen at xemacs.org
Sun Apr 10 12:29:00 EDT 2016


Ethan Furman writes:

 > It means the stuff in place won't change, but the stuff we're
 > adding now to integrate with Path will only support str (which is
 > one reason why os.path isn't going to die).

I don't think this is a reason for keeping os.path.  (Backward
compatibility with existing code is sufficient, of course.)  Support
of str for all file names is provided by PEP 383.  ISTM there's no big
loss to using PEP 383's 'surrogateescape' handler to allow un-decode-
able filenames in pathlib.Path: they're very rare.  AFAIK pathlib
doesn't care about surrogates -- after all, they're entirely
"consenting adults" stuff.  Of course that detracts a bit from the
attractiveness of pathlib.Path vs. os.path or bytes methods, but only
for a use case most people won't encounter in practice.
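Here is a minimal sketch of the PEP 383 round trip. It assumes a hypothetical Latin-1 encoded filename (not valid UTF-8) under a made-up /archive directory. On POSIX, os.fsdecode() and os.fsencode() apply the same 'surrogateescape' handler with the filesystem encoding. The sketch names UTF-8 explicitly so the result does not depend on the locale.

```python
from pathlib import PurePosixPath

# Hypothetical filename: Latin-1 bytes, so not decodable as UTF-8.
raw = b"caf\xe9.txt"

# PEP 383: each undecodable byte 0xXX becomes the lone surrogate U+DCXX.
name = raw.decode("utf-8", errors="surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# pathlib manipulates the str without caring about the surrogate.
p = PurePosixPath("/archive") / name  # "/archive" is an illustrative path
print(p.suffix)  # .txt

# Encoding with the same handler recovers the original bytes exactly.
restored = p.name.encode("utf-8", errors="surrogateescape")
assert restored == raw
```

The surrogate only matters at the boundary where the str is turned back into bytes. Code that merely joins, splits or inspects suffixes never notices it.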

We continue to support bytes at the os/io/open level for the same
reasons you added formatting back to bytes: there are times when it's
at least as natural to work with bytes as str (eg, when the path is
passed around without manipulation) and more convenient (eg, you don't
have to deal with encodings and UnicodeError handling).
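The following sketch shows the bytes-level interfaces in use. When os functions are given a bytes path they return bytes, so a name can be passed around without ever being decoded. The temporary directory and the file name data.bin are illustrative assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Work with a bytes path throughout: there is no decoding step,
    # hence no encoding choice and no UnicodeError to handle.
    top = os.fsencode(tmp)
    target = os.path.join(top, b"data.bin")  # os.path.join accepts bytes

    # open() accepts a bytes path directly.
    with open(target, "wb") as f:
        f.write(b"payload")

    # Given a bytes path, os.listdir() returns bytes names.
    names = os.listdir(top)
    print(names)  # [b'data.bin']
```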

 > After all, the idea is to make these things work with the stdlib, and 
 > the stdlib accepts bytes for path strings.

I don't see a problem.  In dealing with legacy data (archives that
include paths, such as .zips and .isos) we may find un-decode-able
paths, or paths that are decode-able but by undetermined encoding, for
a while to come (decades).  For those, the bytes interfaces are
preferable to unlovely expedients like decoding as 'iso8859-1'.  But
those are specialized use cases.
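This sketch contrasts the 'iso8859-1' expedient with keeping bytes. It assumes a hypothetical archive member name stored in an unrecorded legacy encoding. cp1251 (Windows Cyrillic) is chosen purely for illustration.

```python
# Hypothetical legacy member name; the archive does not record its encoding.
original = "отчёт.txt"
raw = original.encode("cp1251")

# Decoding strictly with UTF-8 fails outright on these bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The 'iso8859-1' expedient never raises, since every byte value maps to
# some code point, but the resulting str is mojibake, not the real name.
guessed = raw.decode("iso8859-1")

# It is at least lossless: re-encoding recovers the exact original bytes.
assert guessed.encode("iso8859-1") == raw

# Keeping `raw` as bytes avoids the guess; it can go straight to os/io
# functions that accept bytes paths.
print(strict_ok, guessed == original)  # False False
```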

Sane people dealing with current file systems won't need bytes in
pathlib, and most "out of bounds" uses for pathlib I can think of in
my own experience will be able to use surrogateescape.


