[Python-ideas] Three ways of paths canonization

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Wed Sep 7 10:47:36 EDT 2016


Serhiy Storchaka writes:

 > The readlink utility from GNU coreutils has three modes for resolving 
 > a file path:
 > 
 >         -f, --canonicalize
 >                canonicalize by following every symlink in every 
 > component of the given name recursively; all but the last component must 
 > exist
 > 
 >         -e, --canonicalize-existing
 >                canonicalize by following every symlink in every 
 > component of the given name recursively, all components must exist
 >
 >         -m, --canonicalize-missing
 >                canonicalize by following every symlink in every 
 > component of the given name recursively, without requirements on 
 > components existence
 
On Mac OS X (and I suppose other BSDs), realpath(3) implements -e.
glibc does none of these; instead, its manpage documents:

   GNU extensions
       If the call fails with either EACCES or ENOENT and
       resolved_path is not NULL, then the prefix of path that is not
       readable or does not exist is returned in resolved_path.

I suppose this nonstandard behavior is controlled by a #define, but
the Linux manpage doesn't specify it.

 > The current behavior of posixpath.realpath() matches (besides one 
 > minor detail) `readlink -m`. The behavior of Path.resolve() matches 
 > `readlink -e`.

This looks like a bug in posixpath, while Path.resolve follows POSIX.
http://pubs.opengroup.org/onlinepubs/009695399/functions/realpath.html
sez:

    RETURN VALUE

    Upon successful completion, realpath() shall return a pointer to
    the resolved name. Otherwise, realpath() shall return a null
    pointer and set errno to indicate the error, and the contents of
    the buffer pointed to by resolved_name are undefined.

    ERRORS

    The realpath() function shall fail if:

[...]
    [ENOENT] A component of file_name does not name an existing file or
        file_name points to an empty string.
    [ENOTDIR] A component of the path prefix is not a directory.

which corresponds to -e.

 > I have proposed a patch that adds three-state optional parameter to 
 > posixpath.realpath() and I'm going to provide similar patch for 
 > Path.resolve(). But I'm not sure this is good API. Are there better 
 > variants?

Said parameter will almost always be a constant.  Usually in such
cases Python prefers separate functions.  E.g.,

    posixpath.realpath                    -e
    posixpath.realpath_require_prefix     -f
    posixpath.realpath_allow_missing      -m
    posixpath.realpath_gnuext             GNU extension
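
A rough sketch of how the first three variants might look as separate
functions.  It assumes a Python where Path.resolve() accepts a `strict`
argument (3.6 or later).  The function names are the hypothetical ones
from the list above, written as plain functions rather than actual
posixpath members, and the GNU-extension variant is left out.  This
only approximates the readlink semantics; it is not the proposed patch.

```python
from pathlib import Path


def realpath_existing(path):
    """Roughly `readlink -e`: every component must exist."""
    # strict=True raises FileNotFoundError if a component is missing.
    return str(Path(path).resolve(strict=True))


def realpath_require_prefix(path):
    """Roughly `readlink -f`: all but the last component must exist."""
    p = Path(path)
    # The prefix as written must exist...
    p.parent.resolve(strict=True)
    resolved = p.resolve(strict=False)
    # ...and so must the prefix reached after following a final symlink.
    if not resolved.parent.exists():
        raise FileNotFoundError(str(resolved.parent))
    return str(resolved)


def realpath_allow_missing(path):
    """Roughly `readlink -m`: no component is required to exist."""
    return str(Path(path).resolve(strict=False))
```

With separate functions the call site says which behavior is wanted,
e.g. realpath_allow_missing("no/such/dir") returns an absolute path,
while realpath_existing("no/such/dir") raises FileNotFoundError
(assuming nothing named "no" exists in the current directory).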


