[path-PEP] Path inherits from basestring again

Reinhold Birkenfeld reinhold-birkenfeld-nospam at wolke7.net
Sat Jul 23 17:05:34 EDT 2005


Peter Hansen wrote:
> Reinhold Birkenfeld wrote:
>> Peter Hansen wrote (on Paths not allowing comparison with strings):
>>>Could you please expand on what this means?  Are you referring to doing 
>>>< and >= type operations on Paths and strings, or == and != or all those 
>>>or something else entirely?
>> 
>> All of these. Do you need them?
> 
> I believe so.  If they are going to be basestring subclasses, why should 
> they be restricted in any particular way?  I suppose that if you wanted 
> to compare a Path to a string, you could just wrap the string in a Path 
> first, but if the Path is already a basestring subclass, why make 
> someone jump through that particular hoop?

Do you have a use case for the comparison? Paths should be compared only
with other paths.
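
A minimal sketch of what "compare only with other paths" could look like, written for a current Python 3 interpreter (so `str` stands in for `basestring`). The `Path` class and the `_paths_only` helper here are hypothetical illustrations, not the code from path.py, and raising `TypeError` is only one possible choice:

```python
def _paths_only(name):
    """Build a comparison method that rejects non-Path operands."""
    str_op = getattr(str, name)

    def op(self, other):
        if not isinstance(other, Path):
            raise TypeError("cannot compare Path with %s" % type(other).__name__)
        return str_op(self, other)

    op.__name__ = name
    return op


class Path(str):
    """Hypothetical str-based path that only compares with other Paths."""
    __eq__ = _paths_only("__eq__")
    __ne__ = _paths_only("__ne__")
    __lt__ = _paths_only("__lt__")
    __le__ = _paths_only("__le__")
    __gt__ = _paths_only("__gt__")
    __ge__ = _paths_only("__ge__")
    # Defining __eq__ would otherwise disable hashing, so keep str's hash.
    __hash__ = str.__hash__


a, b = Path("a.txt"), Path("b.txt")
same = (a == Path("a.txt"))   # Path against Path: allowed
ordered = (a < b)             # ordering between Paths: allowed
try:
    a == "a.txt"              # Path against plain string: rejected
    rejected = False
except TypeError:
    rejected = True
```

Note that a restriction like this reaches anything that relies on `==`, for example looking a Path up in a dict keyed by plain strings, which is part of why the question of a use case matters.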

>>>>Other minor differences, as requested on python-dev, are:
>>>>
>>>>* size property -> getsize() method.
>>>>* atime/mtime/ctime properties -> atime()/mtime()/ctime() methods
>>>
>>>What does this mean?  The .size property and a getsize() method both 
>>>already exist (in my copy of path.py anyway) and do the same thing. 
>>>Same with the other ones mentioned above.  Is someone working from an 
>>>out-of-date copy of path.py?
>> 
>> No. But the size of a file is somewhat volatile, and does not feel like
>> a "property" of the path to it. Remember: the path is not the file. The
>> same goes for the xtime() methods.
> 
> Oh, so your original text was meant to imply that those properties *were 
> being removed*.  That wasn't at all clear to me.
> 
> I understand the reasoning, but I'm unsure I agree with it.  I fully 
> accept that the path is not the file, and yet I have a feeling this is a 
> pedanticism: most of the time when one is dealing with the _file_ one is 
> concerned with the content, and not much else.  When one is dealing with 
> the _path_ one often wants to check the size, the modification time, and 
> so forth.  For example, once one has the file open, one very rarely is 
> interested in when it was last modified.

My line of thought is that a path may, but does not need to, refer to an
existing, metadata-readable file. For that reason, I think a property is
not appropriate.
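
To illustrate the point, here is a small sketch (Python 3, with a hypothetical `Path` class that is not the path.py implementation) in which `getsize()` and `mtime()` are methods. The call syntax makes it visible that the filesystem is consulted each time and that the call can fail when the path does not refer to an existing file:

```python
import os
import tempfile


class Path(str):
    """Hypothetical sketch: file metadata exposed as methods, not properties."""

    def getsize(self):
        # Asks the filesystem on every call; raises OSError if the file is missing.
        return os.path.getsize(self)

    def mtime(self):
        return os.path.getmtime(self)


tmpdir = tempfile.mkdtemp()
existing = Path(os.path.join(tmpdir, "data.bin"))
with open(existing, "wb") as f:
    f.write(b"hello")

# A perfectly valid path that simply does not point at any file.
missing = Path(os.path.join(tmpdir, "no-such-file"))

size = existing.getsize()
try:
    missing.getsize()
    missing_failed = False
except OSError:
    missing_failed = True
```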

> In other words, I feel once again that Jason's original intuition here 
> was excellent, and that he chose practicality over purity in appropriate 
> ways, in a very Pythonic fashion.  I confess to feeling that the 
> suggested changes are being proposed by those who have never actually 
> tried to put path.py to use in practical code, though I'm sure that's 
> not the case for everyone making those suggestions.
> 
> Still, once again this doesn't seem a critical issue to me and I'm happy 
> with either approach, if it means Path gets accepted in the stdlib.
> 
>> At the moment, I am thinking about overriding certain string methods that
>> make absolutely no sense on a path and raising an exception from them.
> 
> That would seem reasonable.  It seems best to be very tolerant about 
> what "makes no sense", though istitle() would surely be one of those to 
> go first.  Also capitalize() (in spite of what Windows Explorer seems to 
> do sometimes), center(), expandtabs(), ljust(), rjust(), splitlines(), 
> title(), and zfill().  Hmm... maybe not zfill() actually.  I could 
> imagine an actual (if rare) use for that.

I'll look into it. What about iteration and indexing? Should it support
"for element in path" or "for char in path" or nothing?

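A rough sketch of the "override and raise" idea, using the method names Peter lists and leaving zfill() alone. This is Python 3 with a hypothetical `Path` class and `_disabled` helper; the choice of `TypeError` is an assumption, since no exception type has been decided:

```python
class Path(str):
    """Hypothetical sketch, not the real path.py class."""


def _disabled(name):
    """Build a replacement method that refuses to run."""
    def method(self, *args, **kwargs):
        raise TypeError("%s() makes no sense on a Path" % name)
    method.__name__ = name
    return method


# String methods proposed for removal in this thread; zfill() is kept.
for _name in ("istitle", "capitalize", "center", "expandtabs",
              "ljust", "rjust", "splitlines", "title"):
    setattr(Path, _name, _disabled(_name))

p = Path("readme.txt")
padded = p.zfill(12)      # still inherited from str
try:
    p.title()             # disabled above
    title_blocked = False
except TypeError:
    title_blocked = True
```

Iteration and indexing are untouched in this sketch, so "for char in path" still yields characters.
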
>>>.bytes() and friends have felt quite 
>>>friendly in actual use, and I suspect .read_file_bytes() will feel quite 
>>>unwieldy.  Not a show-stopper however.
>> 
>> It has even been suggested to throw them out, as they don't have much to
>> do with a path per se. If the interface is overloaded, we'll have less
>> chance of being accepted. Renaming these makes clear that they are not
>> operations on the path, but on a file the path points to.
> 
> Here again I would claim the "practicality over purity" argument.  When 
> one has a Path, it is very frequently because one intends to open a file 
> object using it and do reads and writes (obviously).  Also very often, 
> the type of reading and writing one wants to do is an "all at once" type 
> of thing, as those methods support.  They're merely a convenience, to 
> save one doing the Path(xxx).open('rb').read thing when one can merely 
> do Path(xxx).bytes(), in much the same way that the whole justification 
> for Path() is that it bundles useful and commonly used operations 
> together into one place.
> 
>> Phillip J. Eby suggested naming these set_file_xxx and get_file_xxx to
>> demonstrate that they do not read or write a stream; how about that?
> 
> If they are there, they do exactly what they do, don't they?  And they 
> do file.read() and file.write() operations, with slight nuances in the 
> mode passed to open() or the way the data is manipulated.  Why would one 
> want to hide that, making it even harder to tie these operations 
> together with what is really going on under the covers?  I think the 
> existing names, or at least ones with _read_ and _write_ in them 
> somewhere are better than set/get alternatives.  It's just rare in 
> Python to encounter names quite as cumbersome as _write_file_bytes().

I think it is not exactly bad that these names stand out somewhat,
as that signals that something complex and special happens.
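
For reference, the convenience under discussion amounts to something like the following sketch (Python 3, hypothetical `Path` class). The name `read_file_bytes` is taken from this thread, `write_file_bytes` is an assumed counterpart, and `bytes` is kept as an alias only to show that both spellings do the same "all at once" read:

```python
import os
import tempfile


class Path(str):
    """Hypothetical sketch of the whole-file convenience methods."""

    def read_file_bytes(self):
        # Equivalent to open(path, 'rb').read(), with the file closed afterwards.
        with open(self, "rb") as f:
            return f.read()

    def write_file_bytes(self, data):
        with open(self, "wb") as f:
            f.write(data)

    # The shorter spelling from path.py, shown here as a plain alias.
    bytes = read_file_bytes


target = Path(os.path.join(tempfile.mkdtemp(), "blob.bin"))
target.write_file_bytes(b"\x00\x01\x02")
content = target.read_file_bytes()
```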

> It might be good for those involved to discuss and agree on the 
> philosophy/principles behind using Path in the first place.  If it's one 
> of pragmatism, then the arguments in favour of strictly differentiating 
> between path- and file- related operations should probably not be given 
> as much weight as those in favour of simple and convenient access to 
> commonly needed functionality.  If, on the other hand, Path is seen as 
> some kind of a Java-esque universal path object which is cleanly and 
> tightly decoupled from everything else, then it would probably be best 
> to eliminate things like .getsize() and .read_file_bytes()/.bytes() 
> entirely and leave those in the hands of the cleanly defined and tightly 
> decoupled File object (currently spelled "file"?), again in a Java-esque 
> fashion.  IMHO. :-)

Hm. No, that's not my intention either. I think that path as it is is already
very good. The PEP must follow, and stress this point.

> (I'd like to say for the record that I feel that just about *any* form 
> of Path with even just the basics, basestring-based or not, would be a 
> huge improvement over the status quo, and I'm not trying to make a big 
> war out of this.  Just offering my own view as a recent (a month or two 
> ago) but very enthusiastic convert to path.py.)

That's a basis we can build on. ;)

Reinhold


