Behaviour of os.path.join

Eryk Sun eryksun at gmail.com
Thu May 28 20:24:50 EDT 2020


On 5/28/20, Roel Schroeven <roel at roelschroeven.net> wrote:
> Eryk Sun schreef op 28/05/2020 om 15:51:
>> On 5/27/20, Chris Angelico <rosuav at gmail.com> wrote:
>>> On Thu, May 28, 2020 at 7:07 AM BlindAnagram <blindanagram at nowhere.com>
>>> wrote:
>>>> You can define a path however you want but it won't change the fact
>>>> that on Windows a path that ends in '\\' is inherently a path to a
>>>> directory.

[snip]

> i.e. again, trailing backslashes are not allowed for non-directory files.
>
> There is no limitation listed that says something like "The operation
> must be failed if pathname doesn't contain a trailing backslash and
> CreateOptions.FILE_NON_DIRECTORY_FILE is FALSE". Path names referring to
> directories are allowed to have trailing backslashes, but no requirement
> to do so. Directory PathNames with and without trailing backslashes are
> handled exactly the same.

The statement I was documenting (quoted above) is that a path that
ends in a slash has to be a directory, not that a directory cannot be
accessed without a trailing slash.

The create option for NtCreateFile can be FILE_NON_DIRECTORY_FILE,
FILE_DIRECTORY_FILE, or left undefined. If it's not defined, and the
opened path has a trailing slash, then for an open-existing
disposition, the existing stream must be a directory.  It's an
invalid-name error (STATUS_OBJECT_NAME_INVALID) if the existing stream
is a regular file. If FILE_NON_DIRECTORY_FILE is specified (the
default for WinAPI CreateFileW), the path cannot have a trailing slash
because that designates a directory, which is inconsistent with the
create option. Again, this is an invalid-name error, which is distinct
from errors where the create option disagrees with the stream type
(i.e. STATUS_FILE_IS_A_DIRECTORY and STATUS_NOT_A_DIRECTORY).
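As a rough illustration, the open-existing cases above can be written as a small decision function. This is a simplified model for reading purposes, not NTFS or I/O manager code. The names CreateOption and open_existing_status are hypothetical, and the order in which the checks run is an assumption. Only the outcomes themselves come from the description above.

```python
from enum import Enum


class CreateOption(Enum):
    """Hypothetical stand-in for the NtCreateFile directory/non-directory options."""
    UNSPECIFIED = "neither option"
    DIRECTORY_FILE = "FILE_DIRECTORY_FILE"
    NON_DIRECTORY_FILE = "FILE_NON_DIRECTORY_FILE"


def open_existing_status(option: CreateOption, path: str, is_directory: bool) -> str:
    """Return the status this simplified model predicts when opening an
    existing regular file (is_directory=False) or directory (True)."""
    # Treat a trailing backslash or forward slash alike; for ordinary paths
    # the Win32 layer converts forward slashes before NtCreateFile runs.
    trailing_slash = path.endswith(("\\", "/"))

    # A trailing slash designates a directory, contradicting FILE_NON_DIRECTORY_FILE.
    if trailing_slash and option is CreateOption.NON_DIRECTORY_FILE:
        return "STATUS_OBJECT_NAME_INVALID"
    # The create option itself disagrees with the existing stream type.
    if option is CreateOption.NON_DIRECTORY_FILE and is_directory:
        return "STATUS_FILE_IS_A_DIRECTORY"
    if option is CreateOption.DIRECTORY_FILE and not is_directory:
        return "STATUS_NOT_A_DIRECTORY"
    # Only the name (the trailing slash) is inconsistent with a regular file.
    if trailing_slash and not is_directory:
        return "STATUS_OBJECT_NAME_INVALID"
    return "STATUS_SUCCESS"
```

For example, open_existing_status(CreateOption.UNSPECIFIED, "spam\\", False) gives STATUS_OBJECT_NAME_INVALID, because only the name disagrees with the existing regular file.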

>> Internally, WinAPI CreateFileW calls NTAPI NtCreateFile with the
>> create option FILE_NON_DIRECTORY_FILE (i.e. only open or create a data
>> file stream), unless backup semantics are requested in order to be
>> able to open a directory (i.e. an index stream), in which case the
>> call uses neither FILE_NON_DIRECTORY_FILE nor FILE_DIRECTORY_FILE and
>> leaves it up to the path name.
>
> No, that is not my understanding. It is up to the actual type of file
> specified by the path. CreateFileW using FILE_FLAG_BACKUP_SEMANTICS  can
> only open existing directories, so there is no need to look at the last
> character of the path.

It is true that just including a trailing slash does not set the
StreamTypeToOpen to a directory stream. That's why a mismatch leads to
an invalid-name error instead of a directory error (i.e.
STATUS_NOT_A_DIRECTORY), because it's only the name in the opened path
that's inconsistent with the existing stream type.

In order to create a directory, the StreamTypeToOpen must be a
directory. One way to set that is to explicitly use the
FILE_DIRECTORY_FILE create option. This is possible by using a
CREATE_NEW disposition with the flags and attributes
FILE_ATTRIBUTE_DIRECTORY | FILE_FLAG_BACKUP_SEMANTICS |
FILE_FLAG_POSIX_SEMANTICS. For example:

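    >>> # Assumes pywin32 (win32con/win32file); the module isn't named here.
    >>> import os
    >>> from win32con import (CREATE_NEW, FILE_ATTRIBUTE_DIRECTORY,
    ...     FILE_FLAG_BACKUP_SEMANTICS, FILE_FLAG_POSIX_SEMANTICS)
    >>> from win32file import CreateFile
    >>> flags = FILE_ATTRIBUTE_DIRECTORY
    >>> flags |= FILE_FLAG_BACKUP_SEMANTICS
    >>> flags |= FILE_FLAG_POSIX_SEMANTICS
    >>> disposition = CREATE_NEW
    >>> h = CreateFile('spam', 0, 0, None, disposition, flags, None)
    >>> os.path.isdir('spam')
    True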
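The CreateFile call in that session has the same argument order as pywin32's win32file.CreateFile. That is an assumption, since the module isn't named. A minimal sketch of the same call through ctypes, which needs no third-party package, might look like the following. The function name create_directory_via_createfile is hypothetical. Because Microsoft hasn't documented this behaviour, treat it as an experiment rather than a supported way to create directories.

```python
import ctypes
import sys

# Win32 constants (values from the Windows SDK headers).
CREATE_NEW = 1
FILE_ATTRIBUTE_DIRECTORY = 0x00000010
FILE_FLAG_POSIX_SEMANTICS = 0x01000000
FILE_FLAG_BACKUP_SEMANTICS = 0x02000000
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value

# The undocumented flag combination described above.
MKDIR_FLAGS = (FILE_ATTRIBUTE_DIRECTORY
               | FILE_FLAG_BACKUP_SEMANTICS
               | FILE_FLAG_POSIX_SEMANTICS)


def create_directory_via_createfile(path: str) -> None:
    """Hypothetical helper: create a directory with CreateFileW (Windows only)."""
    if sys.platform != "win32":
        raise OSError("CreateFileW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    create_file = kernel32.CreateFileW
    create_file.restype = ctypes.c_void_p  # HANDLE
    create_file.argtypes = (
        ctypes.c_wchar_p,  # lpFileName
        ctypes.c_uint32,   # dwDesiredAccess
        ctypes.c_uint32,   # dwShareMode
        ctypes.c_void_p,   # lpSecurityAttributes
        ctypes.c_uint32,   # dwCreationDisposition
        ctypes.c_uint32,   # dwFlagsAndAttributes
        ctypes.c_void_p,   # hTemplateFile
    )
    kernel32.CloseHandle.argtypes = (ctypes.c_void_p,)
    # Same arguments as the session above: no access, no sharing, CREATE_NEW.
    handle = create_file(path, 0, 0, None, CREATE_NEW, MKDIR_FLAGS, None)
    if handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    kernel32.CloseHandle(handle)
```

Unlike the interactive session, the sketch closes the handle once the directory exists.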

AFAIK, Microsoft never documented the above capability, so, even
though NT's implementation of the Windows API has included it for
almost 30 years, it's still reasonable to ignore it.
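The quoted description of CreateFileW combined with the undocumented case above gives roughly the following mapping from CreateFileW's disposition and flags to NtCreateFile's create option. This models the behaviour described in this thread, not KernelBase's actual code. The name create_option_for is hypothetical, and the exact condition for the FILE_DIRECTORY_FILE case is assumed from the example.

```python
# Win32 constants (values from the Windows SDK headers).
CREATE_NEW = 1
OPEN_EXISTING = 3
FILE_ATTRIBUTE_DIRECTORY = 0x00000010
FILE_FLAG_POSIX_SEMANTICS = 0x01000000
FILE_FLAG_BACKUP_SEMANTICS = 0x02000000


def create_option_for(disposition: int, flags_and_attributes: int) -> str:
    """Model which directory-related create option CreateFileW passes on."""
    if not (flags_and_attributes & FILE_FLAG_BACKUP_SEMANTICS):
        # Default: only open or create a data file stream.
        return "FILE_NON_DIRECTORY_FILE"
    directory_bits = FILE_ATTRIBUTE_DIRECTORY | FILE_FLAG_POSIX_SEMANTICS
    if disposition == CREATE_NEW and (flags_and_attributes & directory_bits) == directory_bits:
        # Undocumented case: explicitly ask for a directory.
        return "FILE_DIRECTORY_FILE"
    # Backup semantics alone: neither option, so the name and stream decide.
    return "neither"
```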

As documented in [MS-FSA], the other way to set StreamTypeToOpen to a
directory is to explicitly open an $INDEX_ALLOCATION stream (named
"$I30", but the name can be omitted). This requires the filesystem to
support file streams (e.g. NTFS, ReFS).  For example:

    >>> flags = 0
    >>> disposition = CREATE_NEW
    >>> h = CreateFile('eggs::$INDEX_ALLOCATION', 0, 0, None,
    ...                disposition, flags, None)
    >>> os.path.isdir('eggs')
    True
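To see how a name like 'eggs::$INDEX_ALLOCATION' breaks down, here is a small parser for the NTFS "name:stream:$TYPE" syntax. split_stream_name and StreamName are hypothetical helpers for illustration only. They don't validate names the way the filesystem does, and they only look at the final path component.

```python
from typing import NamedTuple


class StreamName(NamedTuple):
    file_name: str
    stream_name: str
    stream_type: str


def split_stream_name(path: str) -> StreamName:
    """Split the final path component into file name, stream name, and
    stream type, following the "name:stream:$TYPE" pattern."""
    # Only the last component can carry a stream suffix, so a drive such as
    # "C:\\" is skipped. Drive-relative names like "C:eggs" aren't handled.
    component = path.replace("/", "\\").rsplit("\\", 1)[-1]
    parts = component.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' separators in {component!r}")
    # Missing trailing parts (stream name, stream type) are empty.
    parts += [""] * (3 - len(parts))
    return StreamName(*parts)
```

For 'eggs::$INDEX_ALLOCATION' the stream name is empty and the stream type is $INDEX_ALLOCATION. Spelling out the default directory stream name gives 'eggs:$I30:$INDEX_ALLOCATION'.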

Notice that I didn't have to request backup semantics. Thus it created
a directory even though the default FILE_NON_DIRECTORY_FILE create
option was used. That's peculiar, but it's actually documented by
[MS-FSA]:

    * If CreateOptions.FILE_DIRECTORY_FILE is TRUE then StreamTypeToOpen =
      DirectoryStream.

    * Else if StreamTypeNameToOpen is "$INDEX_ALLOCATION" then
      StreamTypeToOpen = DirectoryStream.

    * Else if CreateOptions.FILE_NON_DIRECTORY_FILE is FALSE,
      StreamNameToOpen is empty, StreamTypeNameToOpen is empty, Open.File
      is not NULL, and Open.File.FileType is DirectoryFile then
      StreamTypeToOpen = DirectoryStream.

    * Else StreamTypeToOpen = DataStream.
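The four [MS-FSA] rules quoted above translate almost line for line into a function. The parameters are hypothetical flattenings of the spec's CreateOptions, StreamNameToOpen, StreamTypeNameToOpen, and Open.File fields. The sketch only decides the stream type and ignores everything else the spec checks.

```python
from typing import Optional


def stream_type_to_open(
    file_directory_file: bool,
    file_non_directory_file: bool,
    stream_name: str,
    stream_type_name: str,
    existing_file_type: Optional[str],
) -> str:
    """Return "DirectoryStream" or "DataStream" following the rules above.

    existing_file_type is None when Open.File is NULL (nothing exists yet),
    otherwise the existing file's type, e.g. "DirectoryFile".
    """
    if file_directory_file:
        return "DirectoryStream"
    if stream_type_name == "$INDEX_ALLOCATION":
        return "DirectoryStream"
    if (
        not file_non_directory_file
        and not stream_name
        and not stream_type_name
        and existing_file_type == "DirectoryFile"
    ):
        return "DirectoryStream"
    return "DataStream"
```

With CreateFileW's default (FILE_NON_DIRECTORY_FILE set) and the name 'eggs::$INDEX_ALLOCATION', the second rule matches before the FILE_NON_DIRECTORY_FILE test in the third rule is ever reached, which is why the earlier example created a directory.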

> But if you never add a backslash in the end, everything will work just
> fine for both files and directories.

If you expect to open a directory, then appending a trailing slash and
relying on the invalid-name error is one way to ensure that, though
the error isn't really specific enough for my liking. To be more
explicit, if programming at a lower level in C/C++, call
GetFileInformationByHandleEx with the FileBasicInfo class, and check
FileAttributes for FILE_ATTRIBUTE_DIRECTORY.
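From Python, the nearest high-level equivalent of that check is to test st_file_attributes from os.stat, which is only populated on Windows. This is a sketch, and is_directory is a hypothetical name. Unlike querying an already-open handle, it looks the path up again, so it can't rule out the path changing between the open and the check.

```python
import os
import stat


def is_directory(path: str) -> bool:
    """Report whether path names a directory, preferring the Windows attribute bit."""
    st = os.stat(path)
    # On Windows, os.stat fills in st_file_attributes with the same
    # FILE_ATTRIBUTE_* bits that FileBasicInfo reports.
    attributes = getattr(st, "st_file_attributes", None)
    if attributes is not None:
        return bool(attributes & stat.FILE_ATTRIBUTE_DIRECTORY)
    # Elsewhere, fall back to the POSIX file type in st_mode.
    return stat.S_ISDIR(st.st_mode)
```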
