Why exception from os.path.exists()?

Chris Angelico rosuav at gmail.com
Thu Jun 7 13:52:39 EDT 2018


On Fri, Jun 8, 2018 at 3:10 AM, MRAB <python at mrabarnett.plus.com> wrote:
> On 2018-06-07 08:45, Chris Angelico wrote:
>> Under Linux, a file name contains bytes, most commonly representing
>> UTF-8 sequences. So... an ASCIIZ string *can* contain that character,
>> or at least a representation of it. Yet it cannot contain "\0".
>>
> I've seen a variation of UTF-8 that encodes U+0000 as 2 bytes so that a zero
> byte can be used as a terminator.
>
> It's therefore not impossible to have a version of Linux that allowed a
> (Unicode) "\0" in a filename.

Considering that Linux treats filenames as raw bytes, that's not
surprising. The mangled encoding you refer to (often called "Modified
UTF-8") is a horrendous cheat, though: it relies on an overlong
encoding, which violates several of the design principles of UTF-8, so
I do not recommend it EVER. The correct way for Python to handle and
represent such a file name would be to use the U+DCxx range to carry
the bytes through unchanged, not to use "\0".
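A small sketch of the points above. It assumes CPython 3.8 or later, and the file names in it are made up for illustration. First, strict UTF-8 rejects the two-byte C0 80 form of U+0000. Second, the "surrogateescape" error handler (the U+DCxx mechanism from PEP 383, which os.fsdecode() and os.fsencode() use on POSIX) maps an undecodable byte such as 0xE9 to U+DCE9 and back unchanged. Third, a "\0" in a path is refused before it reaches the OS. Since Python 3.8, os.path.exists() returns False for such a path. Earlier versions let the ValueError from os.stat() escape.

```python
import os

# Standard UTF-8 writes U+0000 as a single zero byte, which an
# ASCIIZ (NUL-terminated) file name cannot hold.
assert "\0".encode("utf-8") == b"\x00"

# The "modified" variant writes U+0000 as the overlong pair C0 80.
# A conforming UTF-8 decoder must refuse it.
try:
    b"\xc0\x80".decode("utf-8")
    overlong_accepted = True
except UnicodeDecodeError:
    overlong_accepted = False
print("overlong C0 80 accepted by strict UTF-8?", overlong_accepted)

# surrogateescape: each byte that is not valid UTF-8 (0x80-0xFF) becomes
# a lone surrogate U+DC80-U+DCFF; encoding restores the original byte.
raw = b"caf\xe9.txt"  # hypothetical name: Latin-1 0xE9, not valid UTF-8
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'
assert name.encode("utf-8", "surrogateescape") == raw

# A NUL in a path is rejected during argument conversion,
# before any system call is made.
try:
    os.stat("foo\0bar")  # hypothetical path
except ValueError as exc:
    print("os.stat:", exc)

# Python 3.8+ catches that ValueError here and returns False.
print("exists:", os.path.exists("foo\0bar"))
```

ascii() is used for printing because writing a lone surrogate to a UTF-8 stdout would itself raise UnicodeEncodeError.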

ChrisA



More information about the Python-list mailing list