PEP on path module for standard library

Andrew Dalke dalke at dalkescientific.com
Fri Jul 22 13:08:07 EDT 2005


Duncan Booth wrote:
> Personally I think the concept of a specific path type is a good one, but 
> subclassing string just cries out to me as the wrong thing to do.

I disagree.  I've tried using a class which wasn't derived from
basestring and kept running into places where it didn't work well.
For example, "open" and "mkdir" take strings as input.  There is no
automatic coercion.

>>> class Spam:
...   def __getattr__(self, name):
...     print "Want", repr(name)
...     raise AttributeError, name
... 
>>> open(Spam())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
>>> import os
>>> os.mkdir(Spam())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
>>> 

The solutions to this are:
  1) make the path object be derived from str or unicode.  Doing
this does not conflict with any OO design practice (e.g., Liskov
substitution).

  2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like


  if isinstance(input, basestring):
    input = open(input, "rU")
  for line in input:
    print line

I showed several places in the stdlib and in 3rd party packages
where this is used.
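
As a rough sketch of that idiom (not code from this thread), here is a
filename-or-file function handed a str-derived path.  It is written for
Python 3, where str plays the role basestring has above, and the names
PathStr and read_lines are made up for illustration.

```python
import os
import tempfile


class PathStr(str):
    """Hypothetical path type derived from str (not the real path module)."""

    def basename(self):
        return os.path.basename(self)


def read_lines(source):
    """Accept either a filename or an already-open file-like object."""
    if isinstance(source, str):
        # A str subclass passes this check, so it is opened like a plain name.
        with open(source) as handle:
            return [line.rstrip("\n") for line in handle]
    return [line.rstrip("\n") for line in source]


# Write a small file, then read it back through the str-derived path.
fd, name = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("spam\neggs\n")

lines = read_lines(PathStr(name))
os.remove(name)
```

A path object that is not a string would fall through the isinstance
check and be treated as an open file, which is the failure mode
described above.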


> In other words, to me a path represents something in a filesystem,

Being picky - or something that could be in a filesystem.

> the fact that it 
> has one, or indeed several string representations does not mean that the 
> path itself is simply a more specific type of string.

I didn't follow this.

> You should need an explicit call to convert a path to a string and that 
> forces you when passing the path to something that requires a string to 
> think whether you wanted the string relative, absolute, UNC, uri etc.

You are broadening the definition of a file path to include URIs?
That's making life more complicated.  E.g., the rules for joining
file paths may be different than the rules for joining URIs.
Consider if I have a file named "mail:dalke@example.com" and I
join that with "file://home/dalke/badfiles/".

Additionally, the actions done on URIs are different than on file
paths.  What should os.listdir("http://www.python.org/") do?
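
To illustrate the joining point above, here is a small sketch comparing
the two sets of rules in Python 3's standard library.  It uses posixpath
so the result does not depend on the platform, and it assumes the file
name is written with a literal "@".

```python
import posixpath
from urllib.parse import urljoin

name = "mail:dalke@example.com"

# File-path rules: a relative component is simply appended to the directory.
file_joined = posixpath.join("/home/dalke/badfiles/", name)

# URI rules: a reference that starts with "scheme:" is taken as a complete
# URI of its own, so the base is not used at all.
uri_joined = urljoin("file://home/dalke/badfiles/", name)

print(file_joined)
print(uri_joined)
```

The same two inputs give a file inside the directory under one set of
rules and an unrelated URI under the other.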

As I mentioned, I tried some classes which emulated file
paths.  One was something like

class TempDir:
  """removes the directory when the refcount goes to 0"""
  def __init__(self):
    self.filename = ... use a function from the tempfile module
  def __del__(self):
    if os.path.exists(self.filename):
      shutil.rmtree(self.filename)
  def __str__(self):
    return self.filename

I could do

  dirname = TempDir()

but then instead of

  os.mkdir(dirname)
  tmpfile = os.path.join(dirname, "blah.txt")

I needed to write it as

  os.mkdir(str(dirname))
  tmpfile = os.path.join(str(dirname), "blah.txt")

or have two variables, one which could delete the
directory and the other for the name.  I didn't think
that was good design.


If I had derived from str/unicode then things would
have been cleaner.
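
A minimal sketch of what the str-derived version might look like, under
some assumptions: it is written for Python 3, it uses tempfile.mkdtemp()
(which creates the directory itself, so no os.mkdir call is needed), and
it swaps __del__ for an explicit, hypothetical remove() method so the
cleanup happens at a predictable time.

```python
import os
import shutil
import tempfile


class TempDir(str):
    """Hypothetical str-derived variant: the object *is* the directory name."""

    def __new__(cls):
        # mkdtemp() creates the directory and returns its name.
        return super().__new__(cls, tempfile.mkdtemp())

    def remove(self):
        # Explicit cleanup instead of relying on __del__.
        if os.path.exists(self):
            shutil.rmtree(self)


dirname = TempDir()
tmpfile = os.path.join(dirname, "blah.txt")  # no str() call needed
with open(tmpfile, "w") as handle:
    handle.write("hello\n")
listing = os.listdir(dirname)
dirname.remove()
```

One variable now serves both as the name passed to os.path functions and
as the object that knows how to delete the directory.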

Please note, btw, that some filesystems are Unicode-based
and others are not.  As I recall, one nice thing
about the path module is that it chooses the appropriate
base class at import time.  My "str()" example above
does not and would fail on a Python build that is aware
of Unicode filesystems.

> It may even be that we need a hierarchy of path
> classes: URLs need similar but not identical manipulations
> to file paths, so if we want to address the failings
> of os.path perhaps we should also look at the failings 
> of urlparse at the same time.

I've found that hierarchies are rarely useful compared
to the number of times they are proposed and used.  One
of the joys to me of Python is its de-emphasis of class
hierarchies.

I think the same is true here.  File paths and URIs are
sufficiently different that there are only a few bits
of commonality between them.  Consider 'split', which
for files creates (dirname, filename) while for URLs
it creates (scheme, netloc, path, query, fragment).
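
For a concrete comparison, here is a short sketch using Python 3's
standard library (posixpath.split and urllib.parse.urlsplit); the
example path and URL are made up.

```python
import posixpath
from urllib.parse import urlsplit

# Splitting a file path gives two pieces.
dirname, filename = posixpath.split("/home/dalke/notes.txt")

# Splitting a URL gives five, with no piece matching "dirname" directly.
parts = urlsplit("http://www.python.org/doc/index.html?lang=en#top")

print((dirname, filename))
print(tuple(parts))
```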

				Andrew
				dalke at dalkescientific.com



