[Python-3000] Mini Path object

Mike Orr sluggoster at gmail.com
Mon Nov 6 23:37:01 CET 2006


My latest idea is something like this:

#### BEGIN
import os

class Path(unicode):
    """Pathname-manipulation methods."""
    pathlib = os.path              # Subclass can specify (posix|nt|mac)path.
    safe_args_only = False    # Glyph can set this to True in a subclass.

class FSPath(object):
    """Filesystem-access methods.

         The constructor takes a path arg and sets self.path to it.  It also
         accepts any other combination of positional args, which are passed
         to the path constructor to create the path.
    """
    path_class = Path    # Subclass can specify an alternate path class.

    def __init__(self, *args):
        if len(args) == 1 and isinstance(args[0], self.path_class):
            self.path = args[0]
        else:
            self.path = self.path_class(*args)

    @classmethod
    def cwd(klass):
        """Convenience method for this common operation."""
        path = klass.path_class.cwd()    # Assumes the path class provides cwd().
        return klass(path)
#### END

This should be versatile enough to handle several of the alternatives
that have been proposed.  Those who want Path and FSPath separate have
it.  Those who want them together can use FSPath and FSPath.path.
FSPath can itself be broken into two levels: a "medium" level that
minimally expresses the os.*, os.path.*, and shutil.*
filesystem-access functions, and an "enhanced" level that has all my
favorite methods. Those who think one or another level is evil can use
just the lower levels.

This would satisfy those (like me) who want something we can use now
in our Python 2.4/2.5 programs, and it would give us working code to
point to when lobbying for one or more levels to be adopted into the
stdlib.  Path would have the best chance if it contains only minimal
enhancements.

Subclassing unicode would be the simplest implementation.  The PEP 355
implementation does a unicode-or-str dance in case
os.path.supports_unicode_filenames is false.  Is it really necessary
to support this nowadays?  I'd like to promote all str's to unicode in
the constructor so that any possible UnicodeDecodeErrors are localized
in one place.
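
A rough sketch of that single-point promotion, written for Python 3
where str plays the role of unicode.  The encoding parameter and its
default are my assumptions for illustration, not part of the proposal:

```python
import sys


class Path(str):
    """Text path; byte strings are decoded once, in the constructor."""

    def __new__(cls, value="", encoding=None):
        if isinstance(value, bytes):
            # The only place a UnicodeDecodeError can be raised.
            value = value.decode(encoding or sys.getfilesystemencoding())
        return super().__new__(cls, value)


p = Path(b"tmp/report.txt", encoding="ascii")
```

Every later method can then assume it is working with text.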


On 11/5/06, Talin <talin at acm.org> wrote:
> Mike Orr wrote:
> >     Path(  Path("directory"),   "subdirectory", "file")    # Replaces
> > .joinpath().
>
> For the constructor, I would write it as:
>
>    Path( *components )

Yes.  I was just trying to show what the argument objects were.

> Strings can also be wrapped with an object that indicates that the Path
> is in a platform- or application-specific format:
>
>     # Explicitly indicate that the path string is in Windows NTFS format.
>     Path( Path.format.NTFS( "C:\\Program Files" ) )

This (and all your other .format examples) sounds a bit complicated.
The further we stray from the syntax/semantics of the existing stdlib
modules, the harder it will be to achieve consensus, and thus the less
chance we'll get anything into the stdlib.  So I'm for a scalable
solution, where the lower levels are less controversial and well
tested, and we can experiment in the higher levels.  Can you make your
.format proposal into an optional high-level component and we'll see
how well it works in practice?

> One question to be asked is whether the path should be simplified or
> not. There are cases where you *don't* want the path to be simplified,
> and other cases where you do. Perhaps a keyword argument?
>
>     Path( "C:\\Program Files", "../../Gimp", normalize = True )

Maybe.  I'm inclined to let the .safe_args_only attribute or a SafePath
subclass enforce normalizing, and let the main class do whatever
os.path.join() does.
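
Something like the following, perhaps.  This is only a sketch in
Python 3 syntax: the pathmod attribute name and the _clean hook are
hypothetical, and posixpath is hard-wired so the result is the same on
every platform:

```python
import posixpath


class Path(str):
    """Joins its arguments the way os.path.join() would; no normalizing."""
    pathmod = posixpath  # hypothetical attribute name

    def __new__(cls, *args):
        joined = cls.pathmod.join(*args) if args else ""
        return super().__new__(cls, cls._clean(joined))

    @classmethod
    def _clean(cls, text):
        # The base class leaves the joined text alone.
        return text


class SafePath(Path):
    """Hypothetical subclass that always normalizes."""

    @classmethod
    def _clean(cls, text):
        return cls.pathmod.normpath(text)


plain = Path("a/b", "../c")      # keeps the ".." component
safe = SafePath("a/b", "../c")   # collapses it
```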

> >     Path("ab") + "c"  => Path("abc")
>
> Wouldn't that be:
>
>     Path( "ab" ) + "c" => Path( "ab", "c" )

If we want string compatibility we can't redefine the '+' operator.
If we ditch string compatibility we can't pass Paths to functions
expecting a string.  We can't have it both ways.  This also applies to
character slicing vs component slicing.
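
To make the trade-off concrete, here is a small sketch (Python 3, with
str standing in for unicode).  The joinpath method is only an
illustration of where component joining has to live once '+' is left
alone:

```python
import posixpath


class Path(str):
    """str subclass: '+' and slicing keep their string meaning."""

    def joinpath(self, *parts):
        return type(self)(posixpath.join(self, *parts))


p = Path("ab")
concatenated = p + "c"                       # character concatenation
joined = p.joinpath("c")                     # component join
first_char = Path("usr/lib")[0]              # character indexing
base = posixpath.basename(Path("usr/lib"))   # accepted wherever a string is
```

Note that an unmodified '+' hands back a plain string, so getting a
Path out of it would still mean overriding __add__ to re-wrap the
result.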

> >     .abspath()
>
> I've always thought this was a strange function. To be honest, I'd
> rather explicitly pass in the cwd().

I use it; it's convenient.  The method name could be improved.

> >     .normcase()
> >     .normpath()

... and other methods.  Your proposals are all in the context of "how
much do we want to improve os.path's syntax/semantics vs keeping them
as-is?"  I would certainly like improvement, but improvement works
against consensus. Some people don't want anything to change beyond a
minimal class wrapper.  Others want improvements but have differing
views about what the best "improvements" are.  So how far do you want
to go, and how does this impact your original question, "Is consensus
possible?"

Then, if consensus is not possible, what do we do?  Do we each go into
our corners and make our favorite incompatible module?  Or can we come
up with a master plan containing alternatives, to bring at least some
unity to the differing modules without cramping anybody's style?

In this vein, a common utility module with back-end functions would be
good.  Then we can solve the difficult problems *once* and have a test
suite that proves it, and people would have confidence using any OO
classes that are built over them.  We can start by gathering the
existing os.*, os.path.*, and shutil.* functions, and then add
whatever other functions our various OO classes might need.

However, due to the problem of supporting (posix|nt|mac)path, we may
need to express this as a class of classmethods rather than a set of
functions, so they can be defined relative to a platform library.
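
A minimal sketch of what I mean, assuming a hypothetical PathTools
class with a pathmod attribute (both names are made up here); a
subclass switches platforms by overriding one attribute:

```python
import ntpath
import posixpath


class PathTools(object):
    """Back-end helpers defined relative to a platform library."""
    pathmod = posixpath  # hypothetical attribute name

    @classmethod
    def join(cls, *parts):
        return cls.pathmod.join(*parts)

    @classmethod
    def basename(cls, path):
        return cls.pathmod.basename(path)


class NTPathTools(PathTools):
    pathmod = ntpath


posix_joined = PathTools.join("a", "b")
nt_joined = NTPathTools.join("a", "b")
nt_base = NTPathTools.basename("C:\\Program Files\\Gimp")
```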

> >     .realpath()
>
> Rename to resolve() or resolvelinks().

Good idea.

> >     .expanduser()
> >     .expandvars()
> >     .expand()
>
> Replace with expand( user=True, vars=True )

Perhaps.  There was one guy in the discussion about Noam's path module
who didn't like .expand() at all; he thought it did too many things
implicitly and was thus too magical.

> >     .parent
>
> If parent was a function, you could pass in the number of levels to go
> up, i.e. parent( 2 ) to get the grandparent.

I'd like .ancestor(N) for that.  Parent as a property is nice when
it's only one or two levels.
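
Roughly like this (a sketch only; POSIX paths and Python 3 str are
assumed so that it runs the same everywhere):

```python
import posixpath


class Path(str):
    @property
    def parent(self):
        return type(self)(posixpath.dirname(self))

    def ancestor(self, n):
        """Go up n levels; ancestor(1) is the same as .parent."""
        result = self
        for _ in range(n):
            result = result.parent
        return result


p = Path("/usr/lib/python2.5/os.py")
grandparent = p.ancestor(2)
```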

>
> >     .name                 # Full filename without path
> >     .namebase        # Filename without extension
>
> I find the term 'name' ambiguous. How about:
>
>      .filepart
>      .basepart
>
> or:
>
>      .filename
>      .basename

.name/.namebase isn't great, but nothing else that's been proposed is better.

> >     .drive
>
> Do we need to somehow unify the concept of 'drive' and 'unc' part? Maybe
> '.device' could return the part before the first directory name.

This gets into the "root object" in Noam's proposal.  I'd say just
read that and the discussion, and see if it approaches what you want.
I find this another complicated and potential bog-down point, like
.format.

http://wiki.python.org/moin/AlternativePathClass
http://wiki.python.org/moin/AlternativePathDiscussion

> >     .splitpath()
>
> I'd like to replace this with:
>
>     .component( slice_object )
>
> where the semantics of 'component' are identical to __getitem__ on an
> array or tuple. So for example:
>
>     Path( "a", "b" ).component( 0 ) => "a"
>     Path( "a", "b" ).component( 1 ) => "b"
>     Path( "a", "b" ).component( -1 ) => "b"
>     Path( "a", "b" ).component( 0:1 ) => Path( "a", "b" )
>     Path( "a", "b" ).component( 1: ) => Path( "b" )
>
> This is essentially the same as the "slice notation" proposal given
> earlier, except that it explicitly tells the user that we are dealing
> with path components, not characters.

    Path("a/b").components[0:1] => Path("a/b")

Is there a problem with .component returning a Path instead of a list
of components?

In some ways I still like Noam's Path-as-components idea.  It
eliminates all slicing methods, and '+' does join.  The problem is
you'd have to explicitly unicode() it when passing it to functions
that expect a string. I guess the advantages of Path being unicode
still outweigh the disadvantages.

Here's one possibility for splitting the absolute/relative part of a path:

    Path("/a/b").absolute_prefix => "/"
    relative_start_point = len(Path("/a/b").absolute_prefix)

It would have to be defined for all platforms.  Or we can have a
splitroot method:

    Path("/a/b").splitroot()  =>  [Path("/"), Path("a/b")]

Not sure which way would be most useful overall.
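
For comparison, a POSIX-only sketch of both spellings (Python 3 str
assumed).  It treats a single leading "/" as the whole prefix; drives,
UNC names and other platforms are not handled:

```python
import posixpath


class Path(str):
    @property
    def absolute_prefix(self):
        """Leading '/' for absolute POSIX paths, else ''."""
        return "/" if posixpath.isabs(self) else ""

    def splitroot(self):
        prefix = self.absolute_prefix
        return [type(self)(prefix), type(self)(self[len(prefix):])]


root, rest = Path("/a/b").splitroot()
relative_start_point = len(Path("/a/b").absolute_prefix)
```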


> >     .stripext()
>
> How about:
>
>      path.ext = ''

The discussion in Noam's proposal has .add_exts(".tar", ".gz") and
.del_exts(N).  Remember that any component can have extension(s), not
just the last.  Also, it's up to the user which apparent extensions
should be considered extensions.  How many extensions does
"python-2.4.5-i386.2006-12-12.orig.tar.gz" have?

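A sketch of the two operations as plain functions rather than methods
(my simplification).  It leans on os.path-style splitext, so it only
looks at the last component, and the caller decides how many
extensions to count:

```python
import posixpath


def add_exts(path, *exts):
    """Append each extension, e.g. add_exts("src", ".tar", ".gz")."""
    return path + "".join(exts)


def del_exts(path, n):
    """Strip up to n trailing extensions from the last component."""
    for _ in range(n):
        root, ext = posixpath.splitext(path)
        if not ext:
            break
        path = root
    return path


name = "python-2.4.5-i386.2006-12-12.orig.tar.gz"
two_off = del_exts(name, 2)
three_off = del_exts(name, 3)
```

Whether ".orig" counts as an extension is exactly the kind of
decision that has to stay with the user.
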
> >     .splitall()
>
> Something sadly lacking in os.path.

I thought this was what .splitpath() would do.

> >     .relpathto()
>
> Not sure what this does, since there's no argument defined.

From Orendorff's commentary:
"The method p1.relpathto(p2) returns a relative path to p2, starting from p1."
http://www.jorendorff.com/articles/python/path/

I've always found it confusing to remember which one is 'from' and
which one is 'to'.
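
One way to take the guesswork out is to name the arguments.  A sketch
on top of posixpath.relpath, with origin and dest as made-up parameter
names:

```python
import posixpath


def relpathto(origin, dest):
    """Relative path that leads from origin to dest."""
    return posixpath.relpath(dest, start=origin)


result = relpathto("/a/b", "/a/c/d")
```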

-- 
Mike Orr <sluggoster at gmail.com>

