Why exception from os.path.exists()?

Fri Jun 1 19:45:57 EDT 2018

On Fri, Jun 1, 2018 at 3:58 PM, Richard Damon <Richard at damon-family.org> wrote:
>
> The fundamental question is about case 2. Should os.path.exists, having
> been given a value of the right 'Python Type' but not matching the type
> of the operating system parameter, identify this as an error (as it
> currently does), or should it be changed to decide that if it could
> somehow get that parameter to the os, then it would say that the file
> doesn't exist, and so return False?

AFAIK, this behavior hasn't been documented. So either it gets
documented, which means NUL is never allowed in paths, or else every
call that currently raises ValueError for this case should raise a
pretend FileNotFoundError instead. With the latter approach, no change
to exists(), isdir(), and isfile() would be required, since they
already suppress OSError.
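
To make the second option concrete, here is a minimal sketch. The
names stat_pretend_not_found() and exists() are hypothetical helpers
written for this post, not anything in the standard library, and the
sketch assumes that every ValueError from os.stat() should be treated
as "not found", which is broader than just the NUL case.

```python
import errno
import os


def stat_pretend_not_found(path):
    """Hypothetical wrapper: report an unrepresentable path as a missing file."""
    try:
        return os.stat(path)
    except ValueError as exc:
        # os.stat() rejects a path with an embedded NUL before the OS sees it.
        # Assumption: map that to the error the OS would give for a missing file.
        raise FileNotFoundError(
            errno.ENOENT, os.strerror(errno.ENOENT), path
        ) from exc


def exists(path):
    """Same shape as an exists() that only suppresses OSError."""
    try:
        stat_pretend_not_found(path)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print(exists("spam\0eggs"))  # False, instead of raising ValueError
```

Since FileNotFoundError is a subclass of OSError, a caller that
already handles OSError needs no extra except clause.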

For Windows, there's another case that's in more of a grey area.
Python 3.6+ uses UTF-8 as the file-system encoding in Windows.
Internally it transcodes between UTF-8 and the native UTF-16 encoding.
The "surrogatepass" error handler is used in order to faithfully
handle invalid surrogates, which the system allows. This leaves no
simple way to smuggle invalid UTF-8 sequences into the filename and
round-trip back to bytes, so UnicodeDecodeError (a subclass of
ValueError) is raised. The same invalid UTF-8 would pass silently in
POSIX, which uses bytes paths and the "surrogateescape" handler.
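
The difference between the two error handlers can be seen without
touching the file system at all. The following is only a sketch that
calls the codecs directly with hard-coded "utf-8"; the function names
are made up for illustration, and it doesn't go through os.fsdecode()
or any real path API.

```python
INVALID = b"\xff"          # a byte that is never valid in UTF-8
LONE_SURROGATE = "\ud800"  # not valid in strict UTF-8 either


def posix_roundtrip(raw):
    """Mimic POSIX: surrogateescape smuggles bad bytes through str and back."""
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


def windows_decode(raw):
    """Mimic Windows in 3.6+: surrogatepass only rescues encoded surrogates."""
    return raw.decode("utf-8", "surrogatepass")


if __name__ == "__main__":
    # The invalid byte survives the POSIX-style round trip unchanged.
    print(posix_roundtrip(INVALID) == INVALID)

    # A lone surrogate round-trips with surrogatepass ...
    encoded = LONE_SURROGATE.encode("utf-8", "surrogatepass")
    print(windows_decode(encoded) == LONE_SURROGATE)

    # ... but arbitrary invalid UTF-8 does not.
    try:
        windows_decode(INVALID)
    except UnicodeDecodeError as exc:
        print("UnicodeDecodeError:", exc.reason)
```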

Trivia:
The native NT API of Windows can use device names that contain NUL
characters because it uses counted strings in the OBJECT_ATTRIBUTES
record that's used to access named objects (e.g. Device, Section, Job,
Event, Semaphore, etc). I've tested that this works. A file system
could also allow NUL in names, but Microsoft's drivers reserve NUL as
an invalid character, as would any driver that uses the file-system
runtime library. That said, native NT applications have a limited
scope, so it's almost pointless to speculate.