Why exception from os.path.exists()?

Peter J. Holzer hjp-python at hjp.at
Wed Jun 13 16:37:25 EDT 2018


On 2018-06-13 10:10:03 +0300, Marko Rauhamaa wrote:
> "Peter J. Holzer" <hjp-python at hjp.at>:
> > On 2018-06-11 12:24:54 +0000, Steven D'Aprano wrote:
> >> It also clearly states:
> >> 
> >>     All functions in this module raise OSError in the case of
> >>     invalid or inaccessible file names and paths, or other
> >>     arguments that have the correct type, but are not accepted
> >>     by the operating system. 
> >> 
> >> You know... like strings with NUL in them.
> 
> Nice catch!
> 
> > Ok. I missed that. So either the documentation or the implementation
> > should be fixed. 
> >
> > In any case, if the implementation is changed, I still think that
> > OSError(ENOENT) is wrong. It would have to be OSError(None, "embedded
> > null byte"), or, if that is not possible (I haven't checked)
> > OSError(EINVAL, "embedded null byte"), although that is slightly
> > misleading (it implies that the OS returned EINVAL, which it didn't).
> 
> You say "misleading", I say "abstracting".

If I get an error message which leads me on a wild goose chase, I call
that misleading when I'm in a good mood. If I'm feeling cranky, I call
it "lying".


> > The same check for NUL is also in other functions (e.g. open()), so
> > those would have to be changed as well.
> 
> Maybe.

Consistency is a virtue.


> > I wasn't entirely clear here. What I meant is that POSIX systems, as a
> > group, provide no such way.
> 
> I still don't see how POSIX is directly relevant here.

POSIX systems (or more specifically, systems where the Python
implementation uses a POSIX-conforming API to access the file system)
are relevant here because on such systems the Python implementation
needs to treat filenames with an embedded NUL specially.

The reasons have been mentioned several times in this thread, but to
recap:

1) The API uses NUL-terminated byte strings for file names. 
2) Python may also use byte strings for file names, but they are not
   NUL-terminated (they may contain NULs).
3) Simply passing a pointer to the start of a Python byte string to the
   OS seems to work, and is therefore tempting.
4) But this would mean the OS gets a different file name than the
   application passed to it if the name contains NUL, which can lead to
   security holes (this isn't theoretical, it has happened).
5) Therefore an implementation must not succumb to the temptation in point 
   3 and must explicitly check for NULs.
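Points 1 to 5 can be modelled in a few lines of Python. Both helpers are hypothetical illustrations, not CPython code: c_visible_name() mimics what a C function reading a NUL-terminated string would see if handed the start of a Python byte string, and checked_name() performs the explicit check from point 5 (raising ValueError, as CPython does).

```python
import os

def c_visible_name(name: bytes) -> bytes:
    """Hypothetical model of point 3: C code reading a NUL-terminated
    string stops at the first NUL, so everything after it is lost."""
    return name.split(b"\0", 1)[0]

def checked_name(name) -> bytes:
    """Hypothetical model of point 5: encode the name and refuse it
    outright if it contains a NUL, instead of letting the OS see a
    silently truncated name."""
    raw = os.fsencode(name)
    if b"\0" in raw:
        raise ValueError("embedded null byte")
    return raw

# The application asks for one name, but the OS would act on another.
requested = b"report.txt\0.jpg"
print(c_visible_name(requested))  # b'report.txt'
```

An application that only accepts names ending in ".jpg" would let b"report.txt\0.jpg" through, yet without the check in point 5 the OS would operate on report.txt.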

A theoretical Python implementation on MacOS using the Carbon API
wouldn't have to do this (and in fact it shouldn't). This is
system-dependent code ensuring that the OS API is called correctly.

For os.stat() POSIX is further relevant because stat() is a POSIX
function. On POSIX systems, os.stat() is just a very thin wrapper around
the syscall. On other systems, POSIX stat is basically emulated by
invoking other system calls.

A user on a POSIX system should therefore expect the result of os.stat()
to be the same as that of the stat() system call (i.e. if successful the
fields should have the same values and if not, the exception should
reflect the errno returned by the OS). On other systems a user can only
expect a rough correspondence between what the actual system call
returned and what os.stat() returns, because there may not be a simple
1:1 mapping.
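A small sketch of that correspondence: when stat() fails, the errno the OS returned is what ends up in the errno attribute of the OSError. The helper name stat_errno() is hypothetical, and the probe assumes that a name inside a freshly created, empty temporary directory does not exist.

```python
import errno
import os
import tempfile

def stat_errno(path):
    """Return None if os.stat() succeeds, otherwise the errno carried
    by the OSError it raised."""
    try:
        os.stat(path)
    except OSError as e:
        return e.errno
    return None

# A name inside a fresh, empty temporary directory cannot exist yet.
with tempfile.TemporaryDirectory() as d:
    code = stat_errno(os.path.join(d, "no-such-file"))
    print(code == errno.ENOENT, errno.errorcode[code])  # True ENOENT
```

Since Python 3.3 this particular error is raised as FileNotFoundError, but that is a subclass of OSError and still carries errno ENOENT.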

POSIX specifies a number of error codes which can be returned by stat():

[EACCES]
    Search permission is denied for a component of the path prefix.
[EIO]
    An error occurred while reading from the file system.
[ELOOP]
    A loop exists in symbolic links encountered during resolution of the
    path argument.
[ENAMETOOLONG]
    The length of a component of a pathname is longer than {NAME_MAX}.
[ENOENT]
    A component of path does not name an existing file or path is an
    empty string.
[ENOTDIR]
    A component of the path prefix names an existing file that is
    neither a directory nor a symbolic link to a directory, or the path
    argument contains at least one non-<slash> character and ends with
    one or more trailing <slash> characters and the last pathname
    component names an existing file that is neither a directory nor a
    symbolic link to a directory.
[EOVERFLOW]
    The file size in bytes or the number of blocks allocated to the file
    or the file serial number cannot be represented correctly in the
    structure pointed to by buf. 

A Python application may want to treat these errors differently. Even if
the application doesn't, the user reading the stack trace will want to
see the correct errno and not some generic "something went wrong"
message.
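As an illustration of treating these errors differently, here is a sketch that maps several of the errno values listed above to distinct outcomes. The function name describe_stat_failure() and the suggested reactions are hypothetical; the point is only that such a dispatch works if errno faithfully reflects what stat() returned.

```python
import errno
import os
import tempfile

def describe_stat_failure(path):
    """Classify why os.stat() failed, based on the errno it reported.
    Returns None if the call succeeded."""
    try:
        os.stat(path)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return "missing"            # nothing there: maybe create it
        if e.errno == errno.EACCES:
            return "permission denied"  # ask for other credentials
        if e.errno in (errno.ELOOP, errno.ENOTDIR):
            return "bad path"           # the path itself is malformed
        if e.errno == errno.EIO:
            return "I/O error"          # hardware or file system trouble
        # Anything else (e.g. ENAMETOOLONG, EOVERFLOW): keep the errno visible.
        return "other: " + errno.errorcode.get(e.errno, str(e.errno))
    return None

with tempfile.TemporaryDirectory() as d:
    print(describe_stat_failure(os.path.join(d, "absent")))  # missing
```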

Note that none of these covers "file name contains an illegal character"
for the simple reason that on POSIX systems there are no illegal
characters. 
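To make this concrete: apart from NUL and "/" (which separates components rather than appearing in them), a typical POSIX file system such as Linux ext4 or tmpfs accepts any byte in a file name. The sketch below assumes such a system; roundtrip_odd_name() is a hypothetical helper, and the name is chosen only to show what is permitted, not as a recommended practice.

```python
import os
import tempfile

# Newline, tab, '*', '?', ':' and the control character 0x01.
# Windows rejects every one of these in a file name; POSIX does not.
odd_name = "line1\nline2\t*?:\x01"

def roundtrip_odd_name(name):
    """Create a file called *name* in a scratch directory and return
    the directory listing the OS reports back."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, name), "w") as f:
            f.write("hello")
        return os.listdir(d)

print(roundtrip_odd_name(odd_name) == [odd_name])  # True on e.g. Linux/ext4
```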

So none of these is a good choice for the errno parameter of an OSError
to be thrown. One might try to find out what Linux returns on a
filesystem which doesn't allow some characters, but that would be Linux
specific and probably even file system specific, so not a good choice
for a situation which can occur on many systems. I think that the best
way to do it would be to set errno to None, because it reflects rather
clearly that the OS didn't return an error (because it wasn't even
called). As a user and programmer I find it very important to get
precise error messages. I really, really hate it when the computer lies
to me. However, errno is normally an int, and None isn't. So I'm not
sure how much code an OSError with errno=None would break. Maybe the
reason why os.stat raises a ValueError instead of an OSError is that
whoever wrote that code thought an OSError with an unexpected errno
would break too much existing code. I don't know. Raising an OSError
with errno=EINVAL is too close to lying for my taste, but I could live
with that.
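The options discussed here can be compared directly. CPython's os.stat() raises ValueError for an embedded NUL; the two OSError variants below are constructed by hand to represent the alternatives from this thread, and strict_errno_name() is a hypothetical stand-in for handler code that assumes errno is always an int. OSError itself accepts None as its errno without complaint; the risk lies in what downstream code does with it.

```python
import errno
import os

def cpython_behaviour(path):
    """Return the exception os.stat() raises for *path*, or None."""
    try:
        os.stat(path)
    except (ValueError, OSError) as e:
        return e
    return None

# Option 1: errno=None, i.e. "the OS was never asked".
honest = OSError(None, "embedded null byte", "foo\0bar")
# Option 2: errno=EINVAL, which reads as if the OS had returned EINVAL.
plausible = OSError(errno.EINVAL, "embedded null byte", "foo\0bar")

def strict_errno_name(err):
    """errno.errorcode is keyed by int, so a None errno raises KeyError."""
    return errno.errorcode[err.errno]

print(type(cpython_behaviour("foo\0bar")).__name__)  # ValueError
print(strict_errno_name(plausible))                   # EINVAL
print(honest.errno == errno.ENOENT)                   # False
try:
    strict_errno_name(honest)
except KeyError:
    print("KeyError: code like this would break")
```

Comparisons such as err.errno == errno.ENOENT keep working with errno=None; only code that uses errno as a dictionary key, in arithmetic, or with "%d" formatting would break.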

For non-POSIX systems, most of the above doesn't apply.

Which brings me to the third reason why I talk about POSIX: Because I
know it. I don't know much about the Windows API and I know almost
nothing about Carbon, so I cannot offer much of an opinion on how Python
should be implemented on them. But I do know a number of POSIX
systems[1] and I know how I would like Python to behave on them and how
this behaviour can be implemented.

        hp

[1] I probably should use the past tense here. 20 years ago I regularly
    used several different Unixes. Now it's only Linux. And I don't see
    that changing anytime soon.

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>