os.path.isfile

eryk sun eryksun at gmail.com
Sun Feb 12 23:48:23 EST 2017


On Sun, Feb 12, 2017 at 4:29 AM, Chris Angelico <rosuav at gmail.com> wrote:
> Registry subkeys aren't paths, and the other two cases are extremely
> narrow. Convert slashes to backslashes ONLY in the cases where you
> actually need to.

\\?\ paths are required to exceed MAX_PATH (a paltry 260 characters)
or to avoid quirks of DOS paths (e.g. DOS device names, stripping
trailing dots or spaces). Some programs require backslashes in paths
passed on the command line -- e.g. more.com, but it works if the path
is quoted; obviously reg.exe for registry paths (they are paths);
findstr.exe (a grep-like program); mountvol.exe needs a \\?\ Volume
GUID path; and running "C:/Windows/system32/cmd.exe" parses "/cmd.exe"
in its own name as "/c md.exe", which is more fun if you have an
"md.exe" in PATH.
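
As a rough illustration of the \\?\ form mentioned above, here's a
minimal sketch. The helper name to_extended_path is hypothetical (it's
not a standard library function), and it assumes the input is already
an absolute drive-letter or UNC path. It uses ntpath so the string
handling can be tried on any platform.

```python
import ntpath

def to_extended_path(path):
    """Hypothetical helper: return the \\\\?\\ form of an absolute path.

    Assumes `path` is an absolute drive-letter path or a UNC path.
    """
    # Device paths are left alone; they are already in a prefixed form.
    if path.startswith(('\\\\?\\', '\\\\.\\')):
        return path
    # \\?\ paths skip DOS-style normalization, so convert slashes and
    # collapse "." and ".." components before adding the prefix.
    path = ntpath.normpath(path)
    if path.startswith('\\\\'):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return '\\\\?\\UNC\\' + path[2:]
    return '\\\\?\\' + path

print(to_extended_path('C:/Program Files/Python36'))
print(to_extended_path('//server/share/dir'))
```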

In cases where you can use slash in the filesystem API, Windows is
doing the conversion for you by rewriting a copy of the path with
slash replaced by backslash -- among other normalizations. I'm not
saying that relying on the Windows base API to do this work for you is
bad, just that as a rule it's simpler to call normpath on path literals
because you don't have to worry about edge cases like remembering to
normalize the path before passing it as a command-line argument. If
you're using pathlib, it already does this for you automatically:

    >>> p = pathlib.Path('spam/eggs')
    >>> os.fspath(p)
    'spam\\eggs'

There's been a significant effort to make pathlib interoperate with
the rest of the standard library in 3.6.
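
To make the normpath suggestion concrete, here's a small sketch. It
uses ntpath and pathlib.PureWindowsPath so that it behaves the same on
any platform; on Windows itself, os.path.normpath and pathlib.Path do
the same thing. The findstr.exe argument list is only an illustration
and isn't executed.

```python
import ntpath
import pathlib

literal = 'spam/eggs'

# Normalize once, up front, so every later use sees backslashes.
normalized = ntpath.normpath(literal)
print(normalized)

# A pure Windows path does the same conversion when converted to str.
pure = str(pathlib.PureWindowsPath(literal))
print(pure)

# Illustration only: a command line for a program that wants
# backslashes. The search pattern here is a made-up placeholder.
args = ['findstr.exe', 'pattern', normalized]
print(args)
```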

The point above about registry subkeys inspires me to stray into
Windows internals stuff, so everyone can stop reading at this point...

Of course subkeys are relative paths. They're just not file-system
paths. Paths of named object types (e.g. Device, File, Key, Section,
Event, WindowStation, etc) are rooted in a single object namespace.
The only path separator in this namespace is backslash. Forward slash
is handled as a regular name character, except file systems in
particular reserve it for the sake of POSIX and DOS compatibility.

Here's a broad overview of what the object manager's
ObOpenObjectByName function does (the "Ob" prefix is for the object
manager), which gets called by system services such as NtOpenFile and
NtOpenKeyEx to open a named object.

The object manager implements Directory and SymbolicLink objects, so
it's the first system to parse a path, starting with the root
directory. It continues parsing path elements until it reaches an
object type that's managed by another system. Then it passes control
to that object's ParseProcedure. For a Key this is CmpParseKey (the
"Cm" prefix is for the configuration manager). For a Device such as a
disk volume, it's IopParseDevice (the "Io" prefix is for the I/O
manager). Assuming there isn't an object-type mismatch (e.g. calling
NtOpenFile on a registry key) and the object is successfully created
or referenced, then a handle created in the calling process handle
table is returned to the system service, which returns it to the
user-mode caller.

For example, "C:/Program Files/Python36" first gets rewritten by the
runtime library as "\??\C:\Program Files\Python36" (consider this a
raw string, please). The object manager first parses "\??\C:" by
looking for it as "\Sessions\0\DosDevices\[LogonSessionId]\C:". That's
unlikely for the C: drive (though possible). Next it checks for
"\Global??\C:". That should be a symbolic link to something like
"\Device\HarddiskVolume2".
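
The lookup order described above can be modeled with a toy sketch.
This is only a simulation with plain dicts standing in for the
logon-session and global DOS device directories; the function name and
the table contents are made up for illustration, and unlike the real
object manager it compares names case-sensitively.

```python
def resolve_dos_device(nt_path, local_devices, global_devices):
    """Toy model: resolve the device name in a \\??\\ path.

    Returns (link_target, remaining_path). The two dict arguments are
    stand-ins for the per-logon-session directory and "\\Global??".
    """
    prefix = '\\??\\'
    if not nt_path.startswith(prefix):
        raise ValueError('expected a path that starts with \\??\\')
    # Split off the first component after the prefix, e.g. "C:".
    name, sep, rest = nt_path[len(prefix):].partition('\\')
    # The logon-session directory is checked first, then the global one.
    for directory in (local_devices, global_devices):
        if name in directory:
            return directory[name], sep + rest
    raise FileNotFoundError(name)

# Hypothetical table contents, for illustration only.
global_devices = {'C:': r'\Device\HarddiskVolume2'}
print(resolve_dos_device(r'\??\C:\Program Files\Python36',
                         {}, global_devices))
```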

Next it calls the parse procedure for this device object,
IopParseDevice, which sees that this is a volume device that's handled
by a file-system driver. Say the context of this parse is in the
middle of an NtOpenFile call. In this case, the I/O manager creates a
File object (which references the remaining path "\Program
Files\Python36") and an I/O request packet (IRP) to be serviced by the
file-system device stack. If the file system supports reparse points,
such as NTFS junctions and symbolic links, the IRP might be completed
with a STATUS_REPARSE code and a new path to parse. Finally, if the
open succeeds, the object manager creates a handle for the File object
in the process handle table, and the handle value is returned to the
caller.

Now consider calling NtOpenKeyEx to open a registry key. The master
registry hive has a root key named "\Registry", and two commonly used
subkeys "\Registry\Machine" and "\Registry\Users". We typically
reference the latter two keys via the pseudo-handles
HKEY_LOCAL_MACHINE (HKLM) and HKEY_USERS (HKU) -- because these also
work when accessing a remote registry over RPC.

Say we're trying to open "HKLM\Software\Python\PythonCore". The real
local path is "\Registry\Machine\Software\Python\PythonCore". The
first thing to do is open and cache the real handle for the HKLM
pseudo-handle, by opening "\Registry\Machine". NtOpenKeyEx calls
ObOpenObjectByName, and the object manager begins parsing the path. It
hands off parsing to the ParseProcedure of the "\Registry" object,
CmpParseKey, which returns a pointer reference to the
"\Registry\Machine" key object. The object manager creates a handle
for the object in the process handle table, and NtOpenKeyEx returns
this handle to the caller. It's little known, but the registry also
supports symbolic links, so CmpParseKey may return STATUS_REPARSE with
a new path for the object manager to parse.

Next it does a relative open on the path "Software\Python\PythonCore"
using the "\Registry\Machine" handle as the RootDirectory for the
ObjectAttributes of the open. A relative open for a disk volume works
the same way (e.g. opening a file relative to a handle for the working
directory). The interesting thing about the documented registry API is
that it exposes this native ability to open relative to a handle (like
Unix *at system calls). Similar functionality could be supported in
CreateFile by extending the sized SECURITY_ATTRIBUTES structure to add
a RootDirectory field. As is, you have to call NtCreateFile or
NtOpenFile directly to get this functionality, which isn't an
officially supported approach.
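
From Python, the relative open is visible through winreg.OpenKey,
which accepts either a predefined HKEY_* constant or an already open
key as its first argument. Here's a minimal sketch. The helper names
and the ROOTS table are hypothetical, winreg is imported lazily because
it only exists on Windows, and the open will fail if the key isn't
present on the machine.

```python
# Hypothetical abbreviation table for the two roots discussed above.
ROOTS = {'HKLM': 'HKEY_LOCAL_MACHINE', 'HKU': 'HKEY_USERS'}

def split_key_path(path):
    """Split 'HKLM\\Sub\\Key' into (root constant name, relative subkey)."""
    root, _, subkey = path.partition('\\')
    return ROOTS.get(root, root), subkey

def open_relative(path):
    """Open the first subkey, then open the rest relative to that handle."""
    import winreg  # Windows-only module, so import it lazily
    root_name, subkey = split_key_path(path)
    first, _, rest = subkey.partition('\\')
    with winreg.OpenKey(getattr(winreg, root_name), first) as parent:
        # `parent` is a real key handle; `rest` is resolved relative to it.
        return winreg.OpenKey(parent, rest)

print(split_key_path(r'HKLM\Software\Python\PythonCore'))
```

On Windows, open_relative(r'HKLM\Software\Python\PythonCore') would
open "Software" first and then "Python\PythonCore" relative to it.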

Let's check this out in the debugger. First Windows opens
"\Registry\Machine" to cache the real handle for its HKLM
pseudo-handle.

    Breakpoint 0 hit
    ntdll!NtOpenKeyEx:
    00007ffd`b29b82d0 4c8bd1          mov     r10,rcx
    0:000> !obja @r8
    Obja +000000913bfef878 at 000000913bfef878:
            Name is \REGISTRY\MACHINE
            OBJ_CASE_INSENSITIVE

    0:000> r rcx
    rcx=000000913bfef838
    0:000> pt
    ntdll!NtOpenKeyEx+0x14:
    00007ffd`b29b82e4 c3              ret

The handle returned is 0x70:

    0:000> dq 913bfef838 l1
    00000091`3bfef838  00000000`00000070
    0:000> g

Next it opens the relative path "Software\Python\PythonCore".

    Breakpoint 0 hit
    ntdll!NtOpenKeyEx:
    00007ffd`b29b82d0 4c8bd1          mov     r10,rcx
    0:000> !obja @r8
    Obja +000000913bfef6d0 at 000000913bfef6d0:
            Name is Software\Python\PythonCore
            OBJ_CASE_INSENSITIVE

The RootDirectory field is 0x70, the handle for "\Registry\Machine",
as we can easily see in the kernel debugger when looking at the above
address (0x913bfef6d0):

    lkd> ?? (nt!_OBJECT_ATTRIBUTES *)0x913bfef6d0
    struct _OBJECT_ATTRIBUTES * 0x00000091`3bfef6d0
       +0x000 Length           : 0x30
       +0x008 RootDirectory    : 0x00000000`00000070 Void
       +0x010 ObjectName       : 0x00000091`3bfef978 _UNICODE_STRING
                                 "Software\Python\PythonCore"
       +0x018 Attributes       : 0x40
       +0x020 SecurityDescriptor : (null)
       +0x028 SecurityQualityOfService : (null)

    lkd> !handle 0x70 3
    0070: Object: ffffe00a8ddfbf70
          GrantedAccess: 000f003f (Audit)
          ...
          Name: \REGISTRY\MACHINE

To close this discussion out, here's another problem involving slash
in named objects. The Windows base API creates many per-session
objects in a "BaseNamedObjects" directory located at
"\Sessions\[SessionId]\BaseNamedObjects" or globally in
"\BaseNamedObjects". It's a dumping ground for objects that don't have
a better place to call home. Within the session directory there's a
"Global" symbolic link to the system-wide "\BaseNamedObjects"
directory. When creating a named object, you can use that link to name
the object globally for all sessions. For example, creating a
shared-memory section named "Global\MySharedMemory" actually creates
"\BaseNamedObjects\MySharedMemory". But if you accidentally write
"Global/MySharedMemory", you'll instead create an object named with a
literal slash in the local session's BaseNamedObjects. I've seen this
problem before in a Stack Overflow question. People get lulled into a
false belief that the Windows API will handle forward slashes as path
separators in anything that's pathlike (and indeed is actually
implemented as a relative path under the hood), but that's only the
file-system API.
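
A small sketch of why the slash version goes wrong: only backslash
splits the name, so "Global/MySharedMemory" has no namespace prefix at
all. The function below is a hypothetical illustration in plain string
handling, not how the base API is implemented, and it only knows about
the "Global" and "Local" prefixes, compared exactly as written.

```python
def split_object_namespace(name):
    """Toy model: split a kernel object name into (namespace, base name).

    Only backslash counts as a separator. A forward slash is treated
    as an ordinary character in the name.
    """
    prefix, sep, rest = name.partition('\\')
    if sep and prefix in ('Global', 'Local'):
        return prefix, rest
    return None, name

print(split_object_namespace('Global\\MySharedMemory'))
print(split_object_namespace('Global/MySharedMemory'))
```

A check like this could be run on object names before creating them,
to catch the accidental slash early.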


