[Python-ideas] PEP 428 - object-oriented filesystem paths

Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Sat Oct 6 10:39:13 CEST 2012


Antoine Pitrou writes:

 > On Fri, 5 Oct 2012 20:19:12 +0100
 > Paul Moore <p.f.moore at gmail.com> wrote:
 > > On 5 October 2012 19:25, Antoine Pitrou <solipsis at pitrou.net> wrote:
 > > > A path can be joined with another using the ``__getitem__`` operator::
 > > >
 > > >     >>> p = PurePosixPath('foo')
 > > >     >>> p['bar']
 > > >     PurePosixPath('foo/bar')
 > > >     >>> p[PurePosixPath('bar')]
 > > >     PurePosixPath('foo/bar')
 > >
 > > There is a risk that this is too "cute". However, it's probably better
 > > than overloading the '/' operator, and you do need something
 > > short.

I didn't like this much at first.  However, if you think of a path as
a "collection" (cf. WebDAV), then the bracket notation is the obvious
way to do it in Python (for values of "it" equal to "accessing a
member of a collection by name").
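
To make the "collection" reading concrete, here is a minimal sketch.
The class name MiniPath is hypothetical and is not the PEP's
implementation; it only shows a member being looked up by name
through __getitem__, with POSIX-style joining assumed.

```python
import posixpath


class MiniPath:
    """Hypothetical stand-in for a pure POSIX path (not the PEP's class)."""

    def __init__(self, path):
        self._path = str(path)

    def __str__(self):
        return self._path

    def __repr__(self):
        return "MiniPath(%r)" % self._path

    def __getitem__(self, name):
        # Treat the path as a collection and return the member called `name`.
        return MiniPath(posixpath.join(self._path, str(name)))


p = MiniPath('foo')
child = p['bar']                  # member lookup by string name
same_child = p[MiniPath('bar')]   # member lookup by another path object
```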

I wonder if there is a need to distinguish between a path naming a
directory as a collection, and as a file itself?  Or can/should this
be implicit (wash my mouth out with soap!) in the operation using the
Path?

 > Someone else proposed overloading '+', which would be confusing
 > since we need to be able to combine paths and regular strings, for
 > ease of use.

Is it really that obnoxious to write "p + Path('bar')" (where p is a
Path)?

What about the case "'bar' + p"?  Since Python isn't C, you can't
express that as "'bar'[p]"!
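
A rough sketch of the asymmetry, assuming a hypothetical AddPath class
that overloads '+' (this is not what the PEP proposes): the reflected
method __radd__ lets a plain string appear on the left, whereas
indexing a plain string with a path object simply fails.

```python
import posixpath


class AddPath:
    """Hypothetical path class that joins with '+' (illustration only)."""

    def __init__(self, path):
        self._path = str(path)

    def __str__(self):
        return self._path

    def __add__(self, other):
        # AddPath + str  or  AddPath + AddPath
        return AddPath(posixpath.join(self._path, str(other)))

    def __radd__(self, other):
        # str + AddPath: str cannot handle the operand, so Python
        # falls back to this reflected method.
        return AddPath(posixpath.join(str(other), self._path))


p = AddPath('foo')
right = p + 'bar'
left = 'bar' + p

try:
    'bar'[p]              # str.__getitem__ only accepts integers and slices
    bracket_failed = False
except TypeError:
    bracket_failed = True
```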


 > The point of using __getitem__ is that you get an error if you replace
 > the Path object with a regular string by mistake:
 > 
 > > > As with constructing, multiple path components can be specified at once::
 > > >
 > > >     >>> p['bar/xyzzy']
 > > >     PurePosixPath('foo/bar/xyzzy')
 > > 
 > > That's risky. Are you proposing always using '/' regardless of OS? I'd
 > > have expected os.sep (so \ on Windows).
 > 
 > Both '/' and '\\' are accepted as path separators under Windows. Under
 > Unix, '\\' is a regular character:

That's outright ugly, especially from the "collections" point of view
(foo/bar/xyzzy is not a member of foo).  If you want something that
doesn't suffer from the bogosities of os.path, this kind of platform-
dependence should be avoided, I think.
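
The platform dependence can be seen with the existing ntpath and
posixpath modules, which can both be imported on any platform.  This
is only an illustration of the separator rules quoted above, not of
the proposed Path API.

```python
import ntpath
import posixpath

# Windows rules: both separators split the path into components.
win_forward = ntpath.split('foo/bar')
win_backward = ntpath.split('foo\\bar')

# POSIX rules: the backslash is an ordinary character in a file name,
# so the whole string is a single component.
posix_backward = posixpath.split('foo\\bar')
```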

 > > Also, there is no good terminology in current use here. The only
 > > concrete thing I can suggest is that "root" would be better used as
 > > the term for what you're calling "anchor" as Windows users would
 > > expect the root of "C:\foo\bar\baz" to be "C:\".
 > 
 > But then the root of "C:foo" would be "C:", which sounds wrong:
 > "C:" isn't a root at all.

Why not interpret the root of "C:foo" to be None?  The Windows user
can still get "C:" as the drive, and I don't think that will be
surprising to them.
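
A minimal sketch of that interpretation, assuming a hypothetical
helper drive_and_root() built on the existing ntpath.splitdrive().
The helper name and the choice of None for "no root" are assumptions
for illustration, not part of the PEP.

```python
import ntpath


def drive_and_root(path):
    """Return (drive, root) under Windows rules; root is None if relative."""
    drive, rest = ntpath.splitdrive(path)
    # A root exists only when the part after the drive starts with a separator.
    root = rest[0] if rest[:1] in ('\\', '/') else None
    return drive, root


rooted = drive_and_root('C:\\foo\\bar\\baz')
drive_relative = drive_and_root('C:foo')
```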

 > > But there's no really simple answer - Windows and Unix are just
 > > different here.
 > 
 > Yes, and Unix users are expecting something simpler than what's going on
 > under Windows ;)

Well, Unix users can do things more uniformly.  But there's also a lot
of complexity going on under the hood.  Every mounted file system has
its own root, and only one of those roots is named "/".  I don't know
if Python programs ever
need that information (I never have :-), but it would be nice to leave
room for extension.  Similarly, many "file systems" are actually just
hierarchically organized database access methods with no physical
existence on hardware.

I wonder if "mount_point" is sufficiently general to include the roots
of real local file systems, remote file systems, Windows drives, and
pseudo file systems?  An obvious problem is that Windows users would
not find that terminology natural.
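
For what it's worth, the information is already reachable today
through os.path.ismount().  A sketch, assuming a hypothetical helper
find_mount_point() that walks up the directory tree until it reaches
the root of the containing file system:

```python
import os


def find_mount_point(path):
    """Return the nearest ancestor of `path` that is a mount point."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:
            # Reached the top without finding a mount point; give up here.
            break
        path = parent
    return path


mount = find_mount_point(os.getcwd())
```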



