[Python-Dev] When should pathlib stop being provisional?

Stephen J. Turnbull stephen at xemacs.org
Wed Apr 6 01:37:27 EDT 2016


Chris Angelico writes:

 > Outside of deliberate tests, we don't create files on our disks
 > whose names are strings of random bytes;

Wishful thinking.  First, names made of control characters have often
been deliberately used by miscreants to conceal their warez.  Second,
in some systems it's all too easy to create paths with components in
different locales (the place I've seen it most frequently is in NFS
mounts).  I think that's much less true today, but perhaps that's only
because my employer figured out that it was much less pain if system
paths were pure ASCII so that it mostly didn't matter what encoding
users chose for their subtrees.

It remains important to be able to handle nearly arbitrary bytestrings
in file names as far as I can see.  Please note that 100 million
Japanese and 1 billion Chinese by and large still prefer their
homegrown encodings (plural!!) to Unicode, while many systems are now
defaulting filenames to UTF-8.  There's plenty of room remaining for
copying bytestrings to arguments of open and friends.



More information about the Python-Dev mailing list