Question about working with html entities in python 2 to use them as filenames

Steve D'Aprano steve+python at pearwood.info
Tue Nov 22 20:32:52 EST 2016


On Wed, 23 Nov 2016 09:00 am, Lew Pitcher wrote:

> 2) Apparently os.mkdir() (at least) defaults to requiring an ASCII
> pathname. 

No, you have misinterpreted what you have seen.

Even in Python 2, os.mkdir will accept a Unicode argument. You just have to
make sure it is given as unicode:

os.mkdir(u'/tmp/für')

Notice the u' delimiter instead of the ordinary ' delimiter? The u prefix
tells Python to use a unicode (text) string instead of a byte-string.
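
Here is a minimal sketch of the same call that you can run without
touching your real /tmp. The scratch directory and the variable names are
my own invention for illustration, and I am assuming your filesystem
accepts non-ASCII names. It should run unchanged on Python 2.7 and on
Python 3.3 or later, where the u prefix is accepted but redundant.

```python
# -*- coding: utf-8 -*-
import os
import shutil
import sys
import tempfile

# Scratch directory so the example cleans up after itself.
base = tempfile.mkdtemp()

# Python 2 returns a byte-string here; decode it so every path is text.
# (Assumes sys.getfilesystemencoding() reports a usable encoding.)
if isinstance(base, bytes):
    base = base.decode(sys.getfilesystemencoding())

# A text (unicode) path, note the u prefix on the literal.
target = os.path.join(base, u'für')
os.mkdir(target)
created = os.path.isdir(target)

# Given a text path, os.listdir returns text names as well.
names = os.listdir(base)

shutil.rmtree(base)
```

The coding line at the top matters in Python 2: without it, a non-ASCII
character in the source file is a syntax error.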

If you forget the u prefix and write an ordinary byte-string literal, then
the result you get will depend on some combination of your operating
system, the source code encoding, and Python's best guess of what you
mean.

os.mkdir('/tmp/für')  # don't do this!

*might* work, if all the factors align correctly, but often won't. And when
it doesn't, the failure can be extremely mysterious, usually involving a
spurious 

UnicodeDecodeError: 'ascii' codec

error.
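
To see where that error comes from: whenever Python 2 has to combine a
byte-string with a unicode string, it silently decodes the bytes using the
ascii codec. The sketch below performs the same decode explicitly, so it
behaves the same on Python 2 and Python 3. The names raw, message and text
are only for illustration.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes of u'/tmp/für', written with escapes so that the
# source code encoding plays no part in the result.
raw = b'/tmp/f\xc3\xbcr'

# Python 2 does the equivalent of this decode implicitly when a
# byte-string meets a unicode string; here it is spelled out.
try:
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as err:
    message = str(err)

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode('utf-8')
```

The cure is the same in both versions: decode bytes to text once, at the
boundary, using the encoding you know they are in, rather than letting
Python guess.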

Dealing with Unicode text is much simpler in Python 3. Dealing with
*unknown* encodings is never easy, but so long as you can stick with
Unicode and UTF-8, Python 3 makes it easy. 




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



