Non-unicode file names

MRAB python at mrabarnett.plus.com
Wed Aug 8 22:14:03 EDT 2018


On 2018-08-09 01:14, Thomas Jollans wrote:
> On 09/08/18 01:48, MRAB wrote:
>> On 2018-08-08 23:16, Thomas Jollans wrote:
>>> On *nix, file names are bytes. In real life, we prefer to think of file
>>> names as strings. How non-ASCII file names are created is determined by
>>> the locale, and on most systems these days, every locale uses UTF-8 and
>>> everybody's happy. Of course this doesn't mean you'll never run into an
>>> old directory tree from the pre-UTF8 age using some other encoding, and
>>> it doesn't prevent people from doing silly things in file names.
>>>
>>> Python deals with this tolerably well: by convention, file names are
>>> strings, but you can use bytes for file names if you wish. The docs [1]
>>> warn you about the situation.
>>>
>>> [1] https://docs.python.org/3/library/os.path.html
>>>
>>> If Python runs into a non-UTF8 (better: non-decodable) file name and has
>>> to return a str, it uses surrogate escape codes. So far so good. Right?
>>>
>>> This leads to the unfortunate situation that you can't always print()
>>> file names, as print() is strict and refuses to toy with surrogates.
>>>
>>> To be more explicit, the script
>>>
>>>      print(__file__)
>>>
>>> will fail depending on the file name. This feels wrong... (though every
>>> bit of behaviour is correct)
>>>
>>> (The situation can't arise on Windows, and Python 2 will pretend nothing
>>> happened in true UNIX style)
>>>
>>> Demo script to try at home below.
>>>
>> [snip]
>> 
>> Is it true that Unix filenames can contain control characters, e.g. \x07?
>> 
>> What happens when you print them out?
>> 
>> I think it's not just a problem with surrogate escapes.
> 
> Not a problem (or: not an exception), as those are ASCII and thus UTF-8.
> 
> Python 3.6.5 (default, Apr  1 2018, 05:46:30)
> [GCC 7.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> with open('\x07.py', 'w') as fp:
> ...     fp.write('print(__file__)\n')
> ...
> 16
> >>> import sys; import subprocess
> >>> subprocess.call([sys.executable, '\x07.py'])
> .py
> 0
> >>>
> 
> As you might expect, it beeped when printing '\x07.py' (and showed .py)
> 
And that's OK, is it? :-)
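
For anyone wanting to try both cases at once, here is a minimal sketch (not something posted in the thread) of one way to make a file name safe to display. It escapes every character that str.isprintable() rejects, which covers surrogate escapes as well as control characters such as \x07. The helper name printable_name and the sample name caf\xe9.py are made up for illustration, and the explicit UTF-8 decode is only an assumption standing in for what os.fsdecode() does under a UTF-8 locale.

```python
def printable_name(name):
    """Return a copy of a str file name that is safe to print.

    Hypothetical helper: non-printable characters (control characters,
    lone surrogates from surrogateescape) become backslash escapes.
    Literal backslashes are left alone, so the result is for display
    only and cannot be turned back into the original name.
    """
    return "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in name
    )


# A Latin-1 encoded name, as it might sit on disk in a pre-UTF-8 tree.
raw = b"caf\xe9.py"

# Decode the way Python does for file names under a UTF-8 locale:
# the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Strict encoding, which print() relies on, rejects the surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The original bytes are still recoverable from the str.
roundtrip = name.encode("utf-8", "surrogateescape")

print(printable_name(name))        # surrogate shown as an escape
print(printable_name("\x07.py"))   # no beep, the BEL is shown as an escape
```

The escaped form is ambiguous if the name already contains a backslash, so it suits logs and error messages rather than anything that must be parsed back.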


