[Python-ideas] solution to cross-platform path handling problems

anatoly techtonik techtonik at gmail.com
Sat Nov 23 17:14:42 CET 2013


I want to talk about separating the "mount point" and "path" concepts in path handling.

A great talk from 2010 about writing cross-platform applications makes a
good point about Python's cross-platform abstraction for path issues:
http://clanmills.com/files/dist/doc/cross_platform.html#python-batteries-included

The recent noise around the new pathlib, and my own experience with os.path,
made me reconsider whether Python has a convenient library for cross-platform
path handling. It is much better than dealing with slash-separated strs (true),
but there are still hidden issues (which I cannot even summarize, because I
don't know which tracker query would find them).

While criticizing "pathlib" to see what I dislike about it, I realized
that there is a lot of ambiguity in the world of filesystem/resource
paths. Every platform-specific path library fails because, on one hand,
people don't know the differences between all operating systems,
probably because they don't want to, don't have the time, or lack the
information. On the other hand, people need to write cross-platform
apps. "pathlib" does a good job of documenting these issues in its PEP,
but I think that architecturally it doesn't solve the problem of path
handling complexity. Syntax sugar - yes; explicit approach - yes; time
savings - no; more readable code - more "no" than "yes"; code that frees
you from thinking about how "these three lines" will work on
MacOS/Unix/Windows - no.

The root of the problem is the traditional "relative" vs. "absolute"
approach to paths.

Take the "Definitions" section from PEP 428:
"""
1. All paths can have a drive and a root. For POSIX paths, the drive
is always empty.
2. A relative path has neither drive nor root.
3. A POSIX path is absolute if it has a root. A Windows path is
absolute if it has both a drive and a root. A Windows UNC path
(e.g. \\host\share\myfile.txt) always has a drive and a root (here,
\\host\share and \, respectively).
4. A path which has either a drive or a root is said to be anchored.
Its anchor is the concatenation of the drive and root. Under POSIX,
"anchored" is the same as "absolute".
"""

This is a good decomposition and overview of the problem, but as I see
it, it is hardly a solution or a "correct" representation.
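For reference, the PEP 428 definitions quoted above can be observed directly in pathlib as it later shipped in the standard library (Python 3.4 and later). This is an illustrative snippet only: the example paths are made up, and it uses the pure path classes, so it runs the same on any OS without touching the filesystem.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Example paths covering each case in the PEP 428 definitions.
examples = [
    PurePosixPath("/etc/passwd"),                # POSIX: root only, drive always empty
    PurePosixPath("docs/readme.txt"),            # POSIX relative: no drive, no root
    PureWindowsPath("c:/Windows"),               # drive + root: absolute
    PureWindowsPath("c:Windows"),                # drive only: anchored, not absolute
    PureWindowsPath("/Windows"),                 # root only: anchored, not absolute
    PureWindowsPath("//host/share/myfile.txt"),  # UNC: drive is \\host\share
]

for p in examples:
    # anchor is the concatenation of drive and root
    print(f"{str(p):28} drive={p.drive!r:16} root={p.root!r:6} "
          f"anchor={p.anchor!r:18} absolute={p.is_absolute()}")
```

Note how a Windows path needs four distinct states (neither, drive only, root only, both) to be described, which is the complexity the two-term model below tries to collapse.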

All the terminology above can be reduced to just two cross-platform terms:
"mount point" and "path". A "path" is always relative to a "mount point".
Either can be missing. The "mount point" is system-dependent.

1. All paths may have a mount point.
2. All paths without a mount point are relative.
3. The default mount point that makes a path absolute is '/' on POSIX.
   On Windows, the default mount point is the current drive (e.g. 'c:/')
   or a UNC server address (e.g. \\host\).
4. Any (absolute?) path may itself be a mount point.
5. A path without a mount point is called "relative".
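The five rules above can be sketched as a small Python type. This is a minimal sketch under stated assumptions, not a proposed API: `MountedPath` and its methods (`parse`, `is_relative`, `with_default_mount`, `under`, `render`) are hypothetical names, and the sketch assumes that pathlib's "anchor" (drive plus root) can stand in for the mount point, using the pure path classes only for parsing and rendering.

```python
from dataclasses import dataclass
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import Optional, Tuple, Type


@dataclass(frozen=True)
class MountedPath:
    """Hypothetical type: an optional mount point plus a relative path."""

    mount: Optional[str]    # system-dependent mount point, or None (rule 1)
    parts: Tuple[str, ...]  # components, always relative to the mount point

    @classmethod
    def parse(cls, text: str,
              flavour: Type[PurePath] = PurePosixPath) -> "MountedPath":
        p = flavour(text)
        # Assumption: pathlib's anchor (drive + root) plays the mount point.
        # When a path is anchored, p.parts[0] is exactly that anchor.
        if p.anchor:
            return cls(p.anchor, p.parts[1:])
        return cls(None, p.parts)

    def is_relative(self) -> bool:
        # Rules 2 and 5: a path without a mount point is relative.
        return self.mount is None

    def with_default_mount(self, default: str) -> "MountedPath":
        # Rule 3: the caller supplies the platform default, e.g. '/' on
        # POSIX, or the current drive / a UNC prefix on Windows.
        if self.mount is not None:
            return self
        return MountedPath(default, self.parts)

    def under(self, base: "MountedPath") -> "MountedPath":
        # Rule 4: any path can act as the mount point for a relative path.
        if not self.is_relative():
            raise ValueError("only relative paths can be placed under another path")
        return MountedPath(base.mount, base.parts + self.parts)

    def render(self, flavour: Type[PurePath] = PurePosixPath) -> str:
        # OS-specific separators and drive syntax appear only here.
        if self.mount is None:
            return str(flavour(*self.parts))
        return str(flavour(self.mount, *self.parts))


rel = MountedPath.parse("static/app.css")
site = MountedPath.parse("/endpoint")
home = MountedPath.parse("c:/Users/me", PureWindowsPath)
print(rel.under(site).render())                 # POSIX-style result
print(rel.under(home).render(PureWindowsPath))  # Windows-style result
```

The design choice being illustrated is that the relative part never changes: the same `rel` value is placed under a POSIX endpoint or a Windows home directory, and only the mount point and the final rendering step know about the operating system.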


I don't know what the API for that should be, but I'd be interested in trying it.

One of the reasons I want to introduce this terminology is that I work
more with URL paths than with file system paths, and semantically I
don't see a difference between them. When I move an application from
the www.com site root to some www.com/endpoint, my app doesn't stop
working, because it is written to work with any www.com/endpoint, not
just with absolute paths that point to the site root.
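The URL case can be made concrete with the standard library's `urllib.parse.urljoin`. A minimal sketch: `asset_url` is a hypothetical helper, and `www.com` and `static/app.css` are placeholders. The base URL plays the role of the mount point, and the application code only ever handles the relative path.

```python
from urllib.parse import urljoin


def asset_url(base: str, rel: str) -> str:
    """Hypothetical helper: resolve a relative path against a base URL."""
    # The base acts as a mount point only if it ends with "/":
    # urljoin() drops the last segment of a base without a trailing slash.
    if not base.endswith("/"):
        base += "/"
    # rel must stay relative; a leading "/" would bypass the mount point
    # and resolve against the site root instead.
    return urljoin(base, rel)


# Same application code, two different mount points.
print(asset_url("http://www.com/", "static/app.css"))
print(asset_url("http://www.com/endpoint", "static/app.css"))
```

The trailing-slash fix-up matters: without it, `urljoin("http://www.com/endpoint", "static/app.css")` replaces `endpoint` and yields `http://www.com/static/app.css`, which is exactly the site-root assumption that relocating the app is meant to avoid.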

I think that is the value that Python can provide to help build apps that are
architecturally more "correct" and system-independent.
--
anatoly t.

