PEP on path module for standard library

Thu Jul 21 18:16:06 EDT 2005

Reinhold Birkenfeld wrote:
> Michael Hoffman wrote:
> 
>>Having path descend from str/unicode is extremely useful since I can 
>>then pass a path object to any function someone else wrote without 
>>having to worry about whether they were checking for basestring. I think 
>>there is a widely used pattern of accepting either a basestring[1] or a 
>>file-like object as a function argument, and using isinstance() to 
>>figure out which it is.
> 
> Where do you see that pattern? IIRC it's not in the stdlib.

I do not think it is a *good* pattern, but it is used in Biopython. Of 
course, the code there checks against things like type(""), so it would 
already break when handed unicode paths from a unicode filesystem. I seem 
to recall seeing the pattern elsewhere, but I can't remember where.

If you remove the basestring superclass, then you remove the ability to 
use path objects as a drop-in replacement for any path string right now. 
You will either have to use str(pathobj) or carefully check that the 
function/framework you are passing the path to does not use isinstance() 
or any of the string methods that are now gone.
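
A minimal sketch of the difference, for illustration only. The class and 
function names (path, bare_path, describe) are hypothetical and not taken 
from any existing module, and str stands in for basestring so the example 
also runs on Python versions where basestring no longer exists:

```python
import io


class path(str):
    """Hypothetical path class that subclasses str."""


class bare_path(object):
    """Hypothetical path class that only wraps a string."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def describe(source):
    """Stand-in for third-party code that accepts a filename or a file-like object."""
    # str here plays the role of basestring in the pattern discussed above.
    if isinstance(source, str):
        return "filename"
    return "file-like object"


print(describe(path("data.txt")))            # str subclass passes the check
print(describe(bare_path("data.txt")))       # wrapper is mistaken for a file
print(describe(str(bare_path("data.txt"))))  # explicit conversion is needed
print(describe(io.StringIO("")))             # a real file-like object
```

With the str subclass the caller does not have to know how describe() 
tells its arguments apart; with the wrapper the caller has to remember 
the str() call every time.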

>>What do you gain from removing these methods? A smaller dir()?
> 
> It made sense to me at the time I changed this, although at the moment
> I can't exactly recall the reasons.
> 
> Probably as Terry said: a path is both a list and a string.

I can see the case for thinking of it in both of those ways. In the end 
a path is a sequence object. But a sequence of what?

I have a path that looks like this:

r"c:\windows\system32:altstream\test.dir\myfile.txt.zip:altstream"

One way to divide this is solely based on path separators:

['c:', 'windows', 'system32:altstream', 'test.dir', 
'myfile.txt.zip:altstream']

But then some of the elements of this sequence have more meaning than 
just being strings. "c:" is certainly something different from 
"windows". The file name and alternate data stream name of each element 
could be represented as a tuple.
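
One possible shape for that, as a rough sketch rather than a proposed 
design. The function name split_with_streams is hypothetical, and it 
assumes that the drive is handled separately by ntpath.splitdrive() and 
that the first ":" inside a component always starts the stream name:

```python
import ntpath


def split_with_streams(p):
    """Split a Windows path into a drive and (name, stream) tuples.

    The stream is an empty string for components without one.
    """
    drive, rest = ntpath.splitdrive(p)
    pairs = []
    for part in rest.split("\\"):
        if not part:
            # Skip the empty string produced by the leading separator.
            continue
        name, _, stream = part.partition(":")
        pairs.append((name, stream))
    return drive, pairs


drive, pairs = split_with_streams(
    r"c:\windows\system32:altstream\test.dir\myfile.txt.zip:altstream"
)
print(drive)
for pair in pairs:
    print(pair)
```

This keeps the drive out of the component list and makes each stream 
name addressable, but it leaves open the question of how the extensions 
in each name should be represented.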

The extensions can also be dealt with as a sequence. I have had file 
names like filename = "filename.x.y.z" before and wanted to get 
"filename.x" out of them. The current stdlib solution, 
os.path.splitext(os.path.splitext(filename)[0])[0] is extremely clunky, 
and I have long desired something better. (OK, using 
filename.split(os.extsep) works a little better, although you still 
have to join the pieces back together, but you get the idea.)
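
For comparison, the two approaches side by side. This is only a sketch of 
the existing stdlib calls mentioned above; the variable names are 
illustrative:

```python
import os

filename = "filename.x.y.z"

# Nested splitext: each call strips exactly one trailing extension.
nested = os.path.splitext(os.path.splitext(filename)[0])[0]

# Treating the name as a sequence of extension-separated pieces.
parts = filename.split(os.extsep)
joined = os.extsep.join(parts[:2])

print(nested)
print(parts)
print(joined)
```

The split version makes "keep the first two pieces" a slice, whereas the 
nested version needs one more splitext() call for every extension 
removed.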

So if you start breaking the path into a sequence of items bigger than 
single characters, where does it stop? What is a good design for this?
-- 
Michael Hoffman