[Python-Dev] Bytes path support

Steven D'Aprano steve at pearwood.info
Sat Aug 23 13:08:29 CEST 2014


On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker wrote:

> The point is that if you are reading a file name from the system, and then
> passing it back to the system, then you can treat it as just bytes -- who
> cares? And if you add the byte value of 47 thing, then you can even do
> basic path manipulations. But once you want to do other things with your
> file name, then you need to know the encoding. And it is very, very common
> for users to need to do other things with filenames, and they almost always
> want them as text that they can read and understand.
> 
> Python3 supports this case very well. But it does indeed make it hard to
> work with filenames when you don't know the encoding they are in.

Just "not knowing" the encoding is not sufficient to cause a failure. In
that case, you'll likely get a Unicode string containing mojibake:

# I write a file name using UTF-8 on my system:
filename = 'music by Наӥв.txt'.encode('utf-8')
# You try to use it assuming ISO-8859-7 (Greek)
filename.decode('iso-8859-7')
=> 'music by Π\x9dΠ°Σ₯Π².txt'

which, even though it looks wrong, still lets you refer to the file
(provided you then encode back to bytes with ISO-8859-7 again). This
won't always be the case: sometimes the encoding you guess will not be
able to decode the bytes at all.
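
A minimal sketch of both outcomes, assuming Python 3 and its built-in
codecs. The ISO-8859-7 guess is the one from the example above; the ASCII
guess is chosen here purely for illustration and is not part of the
original example:

```python
# The file name as bytes, written by a system that uses UTF-8.
original = 'music by Наӥв.txt'.encode('utf-8')

# Wrong guess that still decodes: the result is mojibake, but encoding
# it back with the same (wrong) codec recovers the original bytes.
garbled = original.decode('iso-8859-7')
roundtrip = garbled.encode('iso-8859-7')

# Wrong guess that cannot decode at all: ASCII has no bytes above 0x7F.
try:
    original.decode('ascii')
    decodable_as_ascii = True
except UnicodeDecodeError:
    decodable_as_ascii = False
```

Here roundtrip equals original even though garbled is unreadable, while
the ASCII guess fails before you ever get a string to work with.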

When I started this email, I originally began to say that the actual
problem was with byte file names that cannot be decoded into Unicode
using the system encoding (typically UTF-8 on Linux systems). But I've
had difficulty demonstrating that it actually is a problem. I
started with a byte sequence which is invalid UTF-8, namely:

b'ZZ\xdb\xdf\xfa\xff'

created a file with that name, and then tried listing it with 
os.listdir. Even in Python 3.1 it worked fine. I was able to list the 
directory and open the file, so I'm not entirely sure where the problem 
lies exactly. Can somebody demonstrate the failure mode?
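
One possible explanation, offered as a sketch rather than a confirmed
diagnosis of the test above: since Python 3.1 (PEP 383), file names that
cannot be decoded are decoded with the "surrogateescape" error handler,
so the undecodable bytes survive a round trip through str. Assuming a
UTF-8 file system encoding, a failure would only show up when the
resulting string is encoded strictly, for example when printing it or
writing it to a UTF-8 text file:

```python
raw = b'ZZ\xdb\xdf\xfa\xff'

# Strict decoding rejects the byte sequence: it is not valid UTF-8.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# (U+DC80..U+DCFF), and maps it back to the same byte on encoding.
name = raw.decode('utf-8', 'surrogateescape')
restored = name.encode('utf-8', 'surrogateescape')

# A strict encode of the same string fails, because lone surrogates
# are not valid in UTF-8.
try:
    name.encode('utf-8')
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

Here restored equals raw, which is consistent with being able to list
and open the file, while the strict encode is where an error can appear.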


-- 
Steven

