PEP on path module for standard library
Reinhold Birkenfeld
reinhold-birkenfeld-nospam at wolke7.net
Fri Jul 22 15:38:09 EDT 2005
Andrew Dalke wrote:
> Duncan Booth wrote:
>> Personally I think the concept of a specific path type is a good one, but
>> subclassing string just cries out to me as the wrong thing to do.
>
> I disagree. I've tried using a class which wasn't derived from
> a basestring and kept running into places where it didn't work well.
> For example, "open" and "mkdir" take strings as input. There is no
> automatic coercion.
Well, as a Path object provides both open() and mkdir() functions, these use
cases are covered. And that's the point of the Path class: Every common use
you may have for a path is implemented as a method.
So, it's maybe a good thing that for uncommon uses you have to explicitly
"cast" the path to a string.
>>>> class Spam:
> ... def __getattr__(self, name):
> ... print "Want", repr(name)
> ... raise AttributeError, name
> ...
>>>> open(Spam())
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: coercing to Unicode: need string or buffer, instance found
>>>> import os
>>>> os.mkdir(Spam())
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: coercing to Unicode: need string or buffer, instance found
>>>>
>
> The solutions to this are:
> 1) make the path object be derived from str or unicode. Doing
> this does not conflict with any OO design practice (eg, Liskov
> substitution).
>
> 2) develop a new "I represent a filename" protocol, probably done
> via adapt().
>
> I've considered the second of these but I think it's a more
> complicated solution and it won't fit well with existing APIs
> which do things like
>
>
> if isinstance(input, basestring):
> input = open(input, "rU")
> for line in input:
> print line
>
> I showed several places in the stdlib and in 3rd party packages
> where this is used.
That's a valid point. However, if Path is not introduced as a string,
most new users will not try to use a Path instance where a string is
needed, just as you wouldn't try to pass a list where a string is wanted.
>> You should need an explicit call to convert a path to a string and that
>> forces you when passing the path to something that requires a string to
>> think whether you wanted the string relative, absolute, UNC, uri etc.
>
> You are broadening the definition of a file path to include URIs?
> That's making life more complicated. Eg, the rules for joining
> file paths may be different than the rules for joining URIs.
> Consider if I have a file named "mail:dalke at example.com" and I
> join that with "file://home/dalke/badfiles/".
>
> Additionally, the actions done on URIs are different than on file
> paths. What should os.listdir("http://www.python.org/") do?
I agree. Path is only for local filesystem paths (well, in UNIX they could
as well be remote, but that's thanks to the abstraction the filesystem
provides, not Python).
> As I mentioned, I tried some classes which emulated file
> paths. One was something like
>
> class TempDir:
> """removes the directory when the refcount goes to 0"""
> def __init__(self):
> self.filename = ... use a function from the tempfile module
> def __del__(self):
> if os.path.exists(self.filename):
> shutil.rmtree(self.filename)
> def __str__(self):
> return self.filename
>
> I could do
>
> dirname = TempDir()
>
> but then instead of
>
> os.mkdir(dirname)
> tmpfile = os.path.join(dirname, "blah.txt")
>
> I needed to write it as
>
> os.mkdir(str(dirname))
> tmpfile = os.path.join(str(dirname), "blah.txt"))
>
> or have two variables, one which could delete the
> directory and the other for the name. I didn't think
> that was good design.
I can't follow. That's clearly not a Path but a custom object of yours.
However, I would have done it differently: provide a "name" property
for the object, and don't call the variable "dirname", which is confusing.
> If I had derived from str/unicode then things would
> have been cleaner.
>
> Please note, btw, that some filesystems are unicode
> based and others are not. As I recall, one nice thing
> about the path module is that it chooses the appropriate
> base class at import time. My "str()" example above
> does not and would fail on a Unicode filesystem aware
> Python build.
There's no difference. The only points where the type of a Path
object' underlying string is decided are Path.cwd() and the
Path constructor.
Reinhold
More information about the Python-list
mailing list