[Python-3000] Path Reform: Get the ball rolling

Mike Orr sluggoster at gmail.com
Wed Nov 1 21:47:12 CET 2006


The thread on python-dev has been revived, so those interested in this
subject will want to look in both places.

On 11/1/06, Talin <talin at acm.org> wrote:
> Actually I generally use:
>
>        p = os.path.normpath( os.path.join( __file__, "../..", "lib" ) )
>
> or even:
>
>        p = os.path.normpath( os.path.join( __file__, "../../lib" ) )
>
> ...which isn't quite as concise as what you wrote, but is better than
> the first example. (The reason this works is because 'normpath' doesn't
> know whether the last component is a file or a directory -- it simply
> interprets the ".." as an instruction to strip off the last component.)

This illustrates two problems with os.path.  The reason I use all
these nested functions instead of a simple normpath/join is that one
is told "it's bad to use platform-specific separators".  Perhaps this
advice should be dropped, especially since both Mac and Windows now
do the right thing with "/", "..", and ".".
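To illustrate the point about "/": the standard library's `ntpath` and `posixpath` modules (the Windows and POSIX implementations behind os.path, both importable on any platform) accept forward slashes in `normpath` and collapse ".." and "." the same way.  This is a small check of string handling only, not of what any particular file system accepts.

```python
import ntpath
import posixpath

# ntpath treats "/" as an alternate separator and rewrites it to "\".
win = ntpath.normpath("C:/a/b/../c")
# posixpath resolves "." and ".." lexically, using "/".
posix = posixpath.normpath("/a/b/./../c")

print(win)    # C:\a\c
print(posix)  # /a/c
```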

The other thing is, ".." and "." seem to be smarter than
os.path.dirname.  I can't quite articulate the rules, but '.' off a
file chops the filename component, while '.' off a directory does
nothing.  '..' off a file goes to the file's directory, while '..' off
a directory goes to the directory's parent.  dirname() just chops the
final component without asking what it is, while '..' and '.' do
different things depending on whether the final component is a
directory.  I think a method like .ancestor(N) would be useful,
meaning "do '..' N times."

/a/b/../c    # Previous component is always a directory, so eliminate.
/a/b/./c     # Swallow the '.'.
/a/directory/..
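A minimal sketch of the .ancestor(N) idea as a plain function, assuming a purely lexical implementation on top of posixpath.normpath.  The name `ancestor` is hypothetical, not an existing os.path function.  Like normpath, it never touches the file system, so it treats the last component the same whether it is a file or a directory.  The three examples above are normalized at the end for comparison.

```python
import posixpath


def ancestor(path, n=1):
    """Hypothetical helper: append '..' n times, then normalize lexically.

    No file-system access is done, so the final component is handled the
    same way whether it names a file or a directory.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return posixpath.normpath(posixpath.join(path, *([".."] * n)))


# The examples above, normalized:
print(posixpath.normpath("/a/b/../c"))        # /a/c
print(posixpath.normpath("/a/b/./c"))         # /a/b/c
print(posixpath.normpath("/a/directory/.."))  # /a

print(ancestor("/usr/lib/python/os.py", 2))   # /usr/lib
```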

> What I'd like to see is a version of "join" that automatically
> simplifies as it goes. Let's call it "combine":
>
>        p = os.path.combine( __file__, "../..", "lib" )
>
> or:
>
>        p = os.path.combine( __file__, "../../lib" )
>
> That's even easier to read than any of the above versions IMHO.
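A sketch of the proposed combine(), assuming it simply normalizes after each join step so that ".." and "." are resolved as it goes.  `combine` is hypothetical (os.path has no such function), and a fixed string stands in for `__file__`.  For these inputs it gives the same result as normpath(join(...)).

```python
import posixpath
from functools import reduce


def combine(*parts):
    """Hypothetical os.path.combine: join components, simplifying as it goes.

    Each step joins the next component onto the running result and
    normalizes it immediately.
    """
    if not parts:
        raise TypeError("combine() needs at least one path component")
    return reduce(
        lambda acc, part: posixpath.normpath(posixpath.join(acc, part)),
        parts[1:],
        posixpath.normpath(parts[0]),
    )


module = "/project/pkg/sub/mod.py"  # stand-in for __file__
print(combine(module, "../..", "lib"))  # /project/pkg/lib
print(combine(module, "../../lib"))     # /project/pkg/lib
```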

I wouldn't mind that actually.  But the feedback I've gotten is the
fewer variations from os.path functions, the better. I disagree with
that though.


[In Noam Raphael's proposal:]
> -- Is path[ 0 ] a string or a path? What if I really do want to get the
> first two *characters* of the path, and not the first two components? Do
> I have to say something like:
>
>     str( path )[ :2 ]

That's something we've gone back and forth on: how to add characters
to a path, how to add an extension, etc.  We decided component-slicing
was too important to give up.  str(p) or unicode(p) of any Path will
give the string representation.  Converting to a string and back may
not be the most elegant thing in the world but it's more
straightforward than having special methods for character slicing.  I
have also proposed the following:

p[0] is a special "root object", which may be '', '/', 'c:\', 'c:', or
'\\abc' depending on the platform.  The root already includes whatever
separator it needs, so when joining, no separator is inserted before
the next component.

Each element of p[1:] is an instance of a unicode subclass, with extra
methods to extract the basename and extension, delete N extensions
from the end, add N extensions, etc.

Any slice of a path object is a new path object.  If the root is
chopped off it becomes a relative path.
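A minimal sketch of the component-slicing design described above, under several assumptions: `Path`, `Component`, `parse`, and the method names `ext`, `strip_ext`, and `add_ext` are all hypothetical; only POSIX paths are parsed, so the root is '' or '/'; and str stands in for unicode, as in Python 3.

```python
import posixpath


class Component(str):
    """Hypothetical path component: a str subclass with extension helpers."""

    @property
    def ext(self):
        # Last extension only, e.g. ".gz" for "a.tar.gz".
        return posixpath.splitext(self)[1]

    def strip_ext(self, n=1):
        # Delete up to n extensions from the end.
        name = str(self)
        for _ in range(n):
            base, ext = posixpath.splitext(name)
            if not ext:
                break
            name = base
        return Component(name)

    def add_ext(self, *exts):
        # Append one or more extensions, e.g. add_ext(".tar", ".gz").
        return Component(str(self) + "".join(exts))


class Path:
    """Hypothetical path object: p[0] is the root, p[1:] are components."""

    def __init__(self, root, parts):
        self.root = root  # '' (relative) or '/' (absolute) in this sketch
        self.parts = tuple(Component(p) for p in parts)

    @classmethod
    def parse(cls, text):
        root = "/" if text.startswith("/") else ""
        return cls(root, [p for p in text.split("/") if p])

    def _items(self):
        return (self.root,) + self.parts

    def __len__(self):
        return 1 + len(self.parts)

    def __getitem__(self, index):
        items = self._items()
        if not isinstance(index, slice):
            return items[index]  # the root string or a Component
        picked = range(len(items))[index]
        # The root survives only if the slice starts at position 0;
        # otherwise the result is a relative path.
        root = items[0] if len(picked) > 0 and picked[0] == 0 else ""
        return Path(root, [items[i] for i in picked if i != 0])

    def __str__(self):
        # The root carries its own separator, so none is inserted after it.
        return self.root + "/".join(self.parts)

    def __repr__(self):
        return "Path(%r)" % str(self)


p = Path.parse("/usr/lib/python/os.py")
print(repr(p[0]))         # '/'
print(str(p[1:3]))        # usr/lib
print(str(p[:-1]))        # /usr/lib/python
print(str(p)[:2])         # /u
print(p[-1].strip_ext())  # os
```

Here character slicing goes through str(p), while slicing the object itself works on components.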

> I would argue that both paths and query strings are passive, whereas
> tables and file systems are, if not exactly lively, at least more
> 'actor-like' than paths or queries.

I can see your point.  The only reason I went with a "monolithic" OO
class is that all the proposals for the past three years, until last
month, took that form, and I didn't think another way was possible or
desirable.

-- 
Mike Orr <sluggoster at gmail.com>


More information about the Python-3000 mailing list