[Python-3000] Mini Path object
Talin
talin at acm.org
Mon Nov 6 08:39:50 CET 2006
Mike Orr wrote:
> Posted to python-dev and python-3000. Follow-ups to python-dev only please.
>
> So, let's say we strip this Path class to:
I'm finally taking the time to sit down and go over this in detail. Here
are some suggestions.
> class Path(unicode):
> Path("foo")
> Path( Path("directory"), "subdirectory", "file") # Replaces
> .joinpath().
> Path()
For the constructor, I would write it as:
Path( *components )
'components' is an arbitrary number of path components, either strings
or path objects. The components are joined together into a single path.
Strings can also be wrapped with an object that indicates that the Path
is in a platform- or application-specific format:
# Explicitly indicate that the path string is in Windows NTFS format.
Path( Path.format.NTFS( "C:\\Program Files" ) )
Note that it's OK to include path separators in the string that's passed
to the format wrapper - these will get converted.
Not including a format wrapper is equivalent to using the "local" wrapper:
Path( Path.format.local( "C:\\Program Files" ) )
Where 'local' is an alias to the native path format for the host's
default filesystem.
The wrapper objects are themselves classes, and need not be in the
"Path" namespace. For example:
import p4
Path( p4.path.format( "//depot/files/..." ) )
This makes the set of specific path formats open-ended and extensible.
Path format wrappers need not be built into the "Path" module. Each
format wrapper will have a "to_path" method that converts the specific
path encoding into the universal path representation.
Note that if there are multiple components, they don't have to be
wrapped the same way:
Path( Path.format.NTFS( "C:\\Program Files" ),
Path.format.local( "Gimp" ) )
...because the conversion to universal representation is done before the
components are combined.
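A minimal sketch of how the constructor and the wrappers might fit together. Everything here is an assumption made for illustration: the class names NTFSFormat and PosixFormat, the "parts" attribute, and the choice of a plain tuple of component strings as the universal representation. Bare strings are treated as POSIX-style here only to keep the example deterministic; the proposal above would use the host's local format instead. Roots and leading separators are ignored.

```python
import ntpath


class PosixFormat:
    """Hypothetical wrapper: the wrapped string uses '/' separators."""

    def __init__(self, text):
        self.text = text

    def to_path(self):
        # Universal representation (assumed): a tuple of component strings.
        return tuple(part for part in self.text.split("/") if part)


class NTFSFormat:
    """Hypothetical wrapper: the wrapped string uses Windows conventions."""

    def __init__(self, text):
        self.text = text

    def to_path(self):
        # ntpath works on any host, so this does not depend on the platform.
        drive, rest = ntpath.splitdrive(self.text)
        parts = [p for p in rest.replace("\\", "/").split("/") if p]
        return tuple(([drive] if drive else []) + parts)


class Path:
    def __init__(self, *components):
        parts = []
        for component in components:
            if isinstance(component, Path):
                parts.extend(component.parts)
            elif hasattr(component, "to_path"):
                # Each wrapper converts itself before anything is joined.
                parts.extend(component.to_path())
            else:
                # Assumption: an unwrapped string is POSIX-style in this sketch.
                parts.extend(PosixFormat(component).to_path())
        self.parts = tuple(parts)


mixed = Path(NTFSFormat("C:\\Program Files"), PosixFormat("Gimp"))
```

Because each wrapper is converted on its own, the Path constructor never needs to know which formats exist.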
One question to be asked is whether the path should be simplified or
not. There are cases where you *don't* want the path to be simplified,
and other cases where you do. Perhaps a keyword argument?
Path( "C:\\Program Files", "../../Gimp", normalize = True )
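One reason not to simplify by default is symbolic links: if "b" is a symlink, "a/b/.." need not refer to "a". A rough sketch of what the optional step could look like on a tuple of components follows. The function names simplify and join_components are hypothetical, and the tuple representation is an assumption.

```python
def simplify(parts):
    """Collapse '.' and 'name/..' pairs without touching the filesystem."""
    out = []
    for part in parts:
        if part == ".":
            continue
        if part == ".." and out and out[-1] != "..":
            # Cancel the previous real component.
            out.pop()
        else:
            # Keep real names, and any '..' that has nothing left to cancel.
            out.append(part)
    return tuple(out)


def join_components(*parts, normalize=False):
    """Join components, simplifying only when the caller asks for it."""
    return simplify(parts) if normalize else tuple(parts)
```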
> Path.cwd()
No objection here.
> Path("ab") + "c" => Path("abc")
Wouldn't that be:
Path( "ab" ) + "c" => Path( "ab", "c" )
?
It seems that the most common operation is concatenating components, not
characters, although both should be easy.
> .abspath()
I've always thought this was a strange function. To be honest, I'd
rather explicitly pass in the cwd().
> .normcase()
So the purpose of this function is to get around the fact that on some
platforms, comparisons between paths are case-sensitive, and on other
platforms not. However, the reason this function seems weird to me is
that most case-insensitive filesystems are case-preserving, which makes
me think that the real solution is to fix the comparison functions
rather than mangling the string. (Although there's a hitch - it's hard to
make a case-insensitive dictionary that doesn't require a downcased
copy of the key. Something I've long wanted is a userdict that allows
both the comparison and hash functions to be replaced, but that's a
different topic.)
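For illustration, here is a sketch of a mapping with a pluggable key function. The name KeyedDict is hypothetical. Note that it does not remove the hitch described above: it still stores a normalized copy of each key internally, it only hides that copy and preserves the original spelling for iteration.

```python
from collections.abc import MutableMapping


class KeyedDict(MutableMapping):
    """Mapping that compares and hashes keys through a caller-supplied function."""

    def __init__(self, keyfunc):
        self._keyfunc = keyfunc
        self._data = {}  # normalized key -> (original key, value)

    def __getitem__(self, key):
        return self._data[self._keyfunc(key)][1]

    def __setitem__(self, key, value):
        self._data[self._keyfunc(key)] = (key, value)

    def __delitem__(self, key):
        del self._data[self._keyfunc(key)]

    def __iter__(self):
        # Yield keys with the spelling they were stored under.
        return (original for original, _ in self._data.values())

    def __len__(self):
        return len(self._data)


files = KeyedDict(str.casefold)
files["README.txt"] = "docs"
```

With this, files["readme.TXT"] finds the same entry, while iterating over files still yields "README.txt".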
> .normpath()
I'd rename this to "simplify", since it no longer needs to normalize the
separator chars. (That's done by the wrappers.)
> .realpath()
Rename to resolve() or resolvelinks().
> .expanduser()
> .expandvars()
> .expand()
Replace with expand( user=True, vars=True )
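A possible shape for the merged method, written as a free function over plain strings. It relies only on os.path.expanduser and os.path.expandvars from the standard library; the order in which the two steps run is an assumption, and the keyword name "vars" is kept from the suggestion above even though it shadows a builtin.

```python
import os


def expand(path, user=True, vars=True):
    """Expand '~' and environment variables, each step optional."""
    if user:
        path = os.path.expanduser(path)
    if vars:
        path = os.path.expandvars(path)
    return path
```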
> .parent
If parent were a function, you could pass in the number of levels to go
up, e.g. parent( 2 ) to get the grandparent.
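As a sketch, assuming a path is held as a tuple of components (an assumption of this example, not something settled in the thread), a parent function with a level count could be as small as:

```python
def parent(parts, levels=1):
    """Return the components of the ancestor 'levels' steps up."""
    if levels < 0:
        raise ValueError("levels must be non-negative")
    # Going up past the first component yields an empty path in this sketch.
    return tuple(parts[:max(len(parts) - levels, 0)])
```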
> .name # Full filename without path
> .namebase # Filename without extension
I find the term 'name' ambiguous. How about:
.filepart
.basepart
or:
.filename
.basename
> .ext
No problem with this
> .drive
Do we need to somehow unify the concept of 'drive' and 'unc' part? Maybe
'.device' could return the part before the first directory name.
> .splitpath()
I'd like to replace this with:
.component( slice_object )
where the semantics of 'component' are identical to __getitem__ on an
array or tuple. So for example:
Path( "a", "b" ).component( 0 ) => "a"
Path( "a", "b" ).component( 1 ) => "b"
Path( "a", "b" ).component( -1 ) => "b"
Path( "a", "b" ).component( 0:1 ) => Path( "a" )
Path( "a", "b" ).component( 1: ) => Path( "b" )
This is essentially the same as the "slice notation" proposal given
earlier, except that it explicitly tells the user that we are dealing
with path components, not characters.
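Since a bare 0:1 is only legal inside square brackets, a real method would have to take a slice object. A minimal sketch under that assumption follows; the Path class here is a stand-in that stores a tuple of components, not the class under discussion.

```python
class Path:
    """Stand-in path type that stores its components in a tuple."""

    def __init__(self, *parts):
        self.parts = tuple(parts)

    def __eq__(self, other):
        return isinstance(other, Path) and self.parts == other.parts

    def __repr__(self):
        return "Path(%s)" % ", ".join(repr(p) for p in self.parts)

    def component(self, index):
        # An integer returns one component string; a slice returns a new Path,
        # following the same rules as indexing a tuple.
        if isinstance(index, slice):
            return Path(*self.parts[index])
        return self.parts[index]


p = Path("a", "b")
first = p.component(0)               # a plain string
head = p.component(slice(0, 1))      # a Path holding only the first component
tail = p.component(slice(1, None))   # a Path holding everything after the first
```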
> .stripext()
How about:
path.ext = ''
> .splitunc()
> .uncshare
See above - UNC shouldn't be a special case.
> .splitall()
Something sadly lacking in os.path.
> .relpath()
Again, I'd rather that they pass in the cwd() explicitly. But I would
like to see something like:
.relativeto( path )
...which computes the minimal relative path that goes from 'self' to 'path'.
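The standard library's relpath function can illustrate the intended result. This sketch uses posixpath.relpath on plain strings so that it behaves the same on every host; relativeto as a free function taking two absolute paths is an assumption of the example.

```python
import posixpath


def relativeto(self_path, target):
    """Relative path leading from 'self_path' to 'target' (both absolute)."""
    # relpath(path, start) answers "how do I reach path from start?".
    return posixpath.relpath(target, start=self_path)
```

For example, going from "/a/b/c" to "/a/d" means climbing two levels and then descending into "d".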
> .relpathto()
Not sure what this does, since there's no argument defined.
Additional methods:
.format( wrapper_class )
...converts the path into a filesystem-specific format. You can also get
the same effect by "wrapping" the path object and calling str():
str( Path.format.NTFS( Path( "a", "b", "c" ) ) )
Although it's a bit cumbersome.
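A sketch of the outbound direction, assuming the universal representation is a tuple of components and that each wrapper renders itself through __str__. The names FormatWrapper, NTFSFormat, PosixFormat and format_path are hypothetical.

```python
class FormatWrapper:
    """Hypothetical base class: renders components with a fixed separator."""

    separator = "/"

    def __init__(self, parts):
        self.parts = tuple(parts)

    def __str__(self):
        return self.separator.join(self.parts)


class PosixFormat(FormatWrapper):
    separator = "/"


class NTFSFormat(FormatWrapper):
    separator = "\\"


def format_path(parts, wrapper_class):
    """Shortcut for wrapping the components and calling str() on the wrapper."""
    return str(wrapper_class(parts))
```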
-- Talin