[Python-Dev] Bytes path support

R. David Murray rdmurray at bitdance.com
Sat Aug 23 04:20:55 CEST 2014


On Sat, 23 Aug 2014 00:21:18 +0200, Oleg Broytman <phd at phdru.name> wrote:
>    I'm involved in developing and maintaining a few big commercial
> projects that will hardly be ported to Python3. So I'm stuck with
> Python2 for many years and I haven't tried Python3. May be I should try
> a small personal project, but certainly not this year. May be the next
> one...

Yes, you should try it.  Really, it's not the monster you are
constructing in your mind.  The functions that read filenames and return
them as text use the surrogateescape error handler to preserve the
bytes, and the functions that accept filenames use surrogateescape to
recover those bytes before passing them back to the OS.  So POSIX binary
filenames just work, as long as the only thing you depend on is being
able to split and join them on the / character (and possibly the .
character) and otherwise treat the names as black boxes...which is
exactly the same situation you are in with Python 2.
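
A rough sketch of that round trip, for illustration only: the filename
is made up, and UTF-8 is assumed as the filesystem encoding.  The decode
and encode calls here stand in for what the os functions do internally.

```python
import os

# Hypothetical filename containing a byte (0xff) that is not valid UTF-8.
raw_name = b"report-\xff.txt"

# Decode the way the text-returning APIs do on POSIX: an undecodable byte
# becomes a lone surrogate instead of raising UnicodeDecodeError.
text_name = raw_name.decode("utf-8", "surrogateescape")

# Splitting and joining on "/" and "." works on the str form without
# knowing anything about the real encoding of the name.
stem, ext = os.path.splitext(text_name)
full_path = os.path.join("some_dir", text_name)

# Encoding with the same handler recovers the original bytes exactly.
recovered = text_name.encode("utf-8", "surrogateescape")
```

On POSIX, os.fsdecode() and os.fsencode() perform the same conversion
using the actual filesystem encoding rather than a hard-coded one.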

If you need to read filenames out of a file, you'll need to specify the
surrogateescape error handler when decoding it, so that the bytes will
be there to be recovered when you pass the names to the file system
functions, but it will work.
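
Something along these lines should do for the file case.  This is a
sketch under assumptions: the list file and its contents are invented,
and the file is assumed to use the same encoding as the filesystem
(UTF-8 here).

```python
import os
import tempfile

# Hypothetical list file: one filename per line.  The second name holds
# a byte (0xe9) that is not valid UTF-8 in this position.
names = [b"plain.txt", b"caf\xe9.txt"]

with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "names.lst")
    with open(list_path, "wb") as f:
        f.write(b"\n".join(names) + b"\n")

    # errors="surrogateescape" keeps undecodable bytes as lone surrogates
    # rather than failing with UnicodeDecodeError.
    with open(list_path, encoding="utf-8", errors="surrogateescape") as f:
        read_names = [line.rstrip("\n") for line in f]

# The original bytes can still be recovered from each str.
round_tripped = [n.encode("utf-8", "surrogateescape") for n in read_names]
```

If the file and the filesystem use different encodings, names with
validly decoded non-ASCII characters would be re-encoded differently, so
that case needs more care than this sketch shows.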

Or, as discussed, you can treat them as binary and use the os-level
functions that accept binary input (which are exactly the ones you are
used to using in Python 2).  This includes os.path.split and
os.path.join, which as noted are the only things you can depend on
working correctly when you don't know the encoding of the filenames.
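
A minimal sketch of the all-bytes approach, with made-up directory and
file names: pass bytes in and you get bytes back, with no decoding step
at all.

```python
import os
import tempfile

# Hypothetical name with a byte (0xff) that is not valid UTF-8.
raw_name = b"data-\xff.bin"

# os.path functions accept bytes and return bytes.
raw_path = os.path.join(b"/srv/files", raw_name)
head, tail = os.path.split(raw_path)

# Functions such as os.listdir return bytes names when given a bytes path.
with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    with open(os.path.join(tmp_bytes, b"a.txt"), "wb"):
        pass  # create an empty file through a bytes path
    listing = os.listdir(tmp_bytes)
```

Note that bytes and str components cannot be mixed in a single
os.path.join call; doing so raises TypeError.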

So, the way to look at this is that Python 3 is no worse[1] than Python 2
for handling POSIX binary filenames, and also provides additional
features if you *do* know the correct encoding of the filenames.

--David

[1] modulo any remaining API bugs, which is exactly where this thread
started: trying to figure out which APIs need to be able to handle
binary paths and/or surrogate escaped paths so that POSIX filenames
consistently work as well in Python 3 as they did in Python 2.

