PEP on path module for standard library
Michael Hoffman
cam.ac.uk at mh391.invalid
Thu Jul 21 18:16:06 EDT 2005
Reinhold Birkenfeld wrote:
> Michael Hoffman wrote:
>
>>Having path descend from str/unicode is extremely useful since I can
>>then pass a path object to any function someone else wrote without
>>having to worry about whether they were checking for basestring. I think
>>there is a widely used pattern of accepting either a basestring[1] or a
>>file-like object as a function argument, and using isinstance() to
>>figure out which it is.
>
> Where do you see that pattern? IIRC it's not in the stdlib.
I do not think it is a *good* pattern, but it is used in Biopython. Of
course, there they ARE using things like type(""), so on a Unicode
filesystem it would already break. I seem to recall seeing it elsewhere,
but I can't remember where.
If you remove the basestring superclass, then you remove the ability to
use path objects as a drop-in replacement for any path string right now.
You will either have to use str(pathobj) or carefully check that the
function/framework you are passing the path to does not use isinstance()
or any of the string methods that are now gone.
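Here is a minimal sketch of the "string or file-like object" pattern, written for Python 3, where `str` takes the place that `basestring` held. `Path`, `read_text`, and `is_exact_str` are hypothetical names for illustration only. They are not the API of the proposed module or of Biopython. The sketch shows why subclassing the string type matters. An `isinstance()` check accepts a subclass, but an exact type comparison such as `type(x) == type("")` rejects it.

```python
import io


class Path(str):
    """Hypothetical path type; subclassing str makes it a drop-in string."""


def read_text(source):
    """Hypothetical function accepting a filename or a file-like object."""
    if isinstance(source, str):
        # Any str, including a str subclass such as Path, is taken as a filename.
        with open(source, encoding="utf-8") as f:
            return f.read()
    # Anything else is assumed to be an open file-like object.
    return source.read()


def is_exact_str(value):
    # An exact type comparison rejects subclasses, so a Path fails this check.
    return type(value) == type("")


if __name__ == "__main__":
    print(read_text(io.StringIO("in-memory text")))  # file-like branch
    print(isinstance(Path("notes.txt"), str))        # True
    print(is_exact_str(Path("notes.txt")))           # False
```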
>>What do you gain from removing these methods? A smaller dir()?
>
> It made sense to me at the time I changed this, although at the moment
> I can't exactly recall the reasons.
>
> Probably as Terry said: a path is both a list and a string.
I can see the case for thinking of it in both of those ways. In the end
a path is a sequence object. But a sequence of what?
I have a path that looks like this:
r"c:\windows\system32:altstream\test.dir\myfile.txt.zip:altstream"
One way to divide this is solely based on path separators:
['c:', 'windows', 'system32:altstream', 'test.dir',
'myfile.txt.zip:altstream']
But then some of the elements of this sequence have more meaning than
just being strings. "c:" is certainly something different from
"windows." The file name and alternate data stream name of each element
could be represented as a tuple.
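One way to make both levels explicit is sketched below in Python 3. It assumes that only the backslash separates elements and that everything after an element's first colon names its alternate data stream. `split_components` is a hypothetical helper and is not part of any proposed path module. It uses `ntpath.splitdrive` to remove the drive first, so "c:" is no longer handled like an ordinary element. Each remaining element then becomes a (name, stream) tuple.

```python
import ntpath


def split_components(path):
    """Hypothetical helper: split a Windows path into its drive and a list of
    (name, stream) pairs, one pair per separator-delimited element.

    ntpath is used so the Windows rules apply on any host OS.
    """
    drive, rest = ntpath.splitdrive(path)
    pairs = []
    # Skip empty strings produced by the leading or doubled separators.
    for element in rest.split("\\"):
        if not element:
            continue
        # Everything after the first colon is treated as the stream name;
        # an element without a colon gets an empty stream name.
        name, _, stream = element.partition(":")
        pairs.append((name, stream))
    return drive, pairs


if __name__ == "__main__":
    p = r"c:\windows\system32:altstream\test.dir\myfile.txt.zip:altstream"
    print(p.split("\\"))  # plain separator split: 'c:' is just another element
    print(split_components(p))
```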
The extensions can also be dealt with as a sequence. I have dealt with
things like filename = "filename.x.y.z" before and wanted to get
"filename.x". The current stdlib solution,
os.path.splitext(os.path.splitext(filename)[0])[0], is extremely clunky,
and I have long desired something better. (OK, using
filename.split(os.extsep) works a little better, but you get the idea.)
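Extensions can also be treated as a sequence using only `os.path.splitext`, as in the sketch below. `strip_extensions` and `extensions` are hypothetical helpers for illustration. The first generalizes the nested `splitext` call to any number of extensions, and the second lists every extension in order.

```python
import os


def strip_extensions(filename, count=1):
    """Hypothetical helper: remove up to `count` trailing extensions."""
    for _ in range(count):
        root, ext = os.path.splitext(filename)
        if not ext:  # nothing left to strip
            break
        filename = root
    return filename


def extensions(filename):
    """Hypothetical helper: list every extension, outermost last."""
    exts = []
    while True:
        filename, ext = os.path.splitext(filename)
        if not ext:
            break
        exts.append(ext)
    return exts[::-1]


if __name__ == "__main__":
    print(strip_extensions("filename.x.y.z", 2))  # filename.x
    print(extensions("filename.x.y.z"))           # ['.x', '.y', '.z']
```

This version loops over `os.path.splitext` instead of splitting on `os.extsep`, so a leading dot stays part of the name. `strip_extensions(".bashrc")` returns ".bashrc" unchanged, whereas `".bashrc".split(os.extsep)` yields `['', 'bashrc']`.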
So if you start breaking the path into a sequence of items bigger than
single characters, where does it stop? What is a good design for this?
--
Michael Hoffman