[Python-ideas] tweaking the file system path protocol

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Tue May 23 12:53:44 EDT 2017


On 05/23/2017 06:41 PM, Wolfgang Maier wrote:
> On 05/23/2017 06:17 PM, Koos Zevenhoven wrote:
>> On Tue, May 23, 2017 at 1:12 PM, Wolfgang Maier
>> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>>> What do you think of this idea for a slight modification to os.fspath:
>>> the current version checks whether its arg is an instance of str, bytes or
>>> any subclass and, if so, returns the arg unchanged. In all other cases it
>>> tries to call the type's __fspath__ method to see if it can get str, bytes,
>>> or a subclass thereof this way.
>>>
>>> My proposal is to change this to:
>>> 1) check whether the type of the argument is str or bytes *exactly*; if so,
>>> return the argument unchanged
>>> 2) check whether __fspath__ can be called on the type and returns an instance
>>> of str, bytes, or any subclass (just like in the current version)
>>> 3) check whether the type is a subclass of str or bytes and, if so, return
>>> the argument unchanged
>>
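The three steps quoted above can be sketched in pure Python roughly as follows. This is only an illustration of the proposed lookup order, not the actual os.fspath implementation: the names proposed_fspath, PlainName and RoutedName are hypothetical, and raising TypeError when __fspath__ returns something other than str or bytes is an assumption carried over from the current behaviour.

```python
import pathlib


def proposed_fspath(path):
    """Hypothetical sketch of the proposed lookup order (not the real os.fspath)."""
    path_type = type(path)

    # 1) Exact str or bytes: return immediately, before any other check.
    if path_type is str or path_type is bytes:
        return path

    # 2) Look up __fspath__ on the type, as the current version does.
    try:
        fspath_method = path_type.__fspath__
    except AttributeError:
        # 3) No __fspath__: str/bytes subclasses are still returned unchanged.
        if isinstance(path, (str, bytes)):
            return path
        raise TypeError(
            "expected str, bytes or os.PathLike object, not "
            + path_type.__name__
        ) from None

    result = fspath_method(path)
    # Assumption: keep the guarantee that the result is str or bytes.
    if isinstance(result, (str, bytes)):
        return result
    raise TypeError(
        "expected {}.__fspath__() to return str or bytes, not {}".format(
            path_type.__name__, type(result).__name__
        )
    )


class PlainName(str):
    """Hypothetical str subclass without __fspath__ (handled by step 3)."""


class RoutedName(str):
    """Hypothetical str subclass that defines __fspath__ (handled by step 2)."""

    def __fspath__(self):
        return "custom/" + self


exact = proposed_fspath("x")                                 # step 1
from_path = proposed_fspath(pathlib.PurePosixPath("a/b"))    # step 2
routed = proposed_fspath(RoutedName("x"))                    # step 2
plain = proposed_fspath(PlainName("x"))                      # step 3
```

The only observable difference from the current version is the RoutedName case: today such an object is returned unchanged, whereas with this ordering its __fspath__ result is used.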
> 
> Hi Koos and thanks for your detailed response,
> 
>> The reason why this was not done was that a str or bytes subclass that
>> implements __fspath__(self) would work in both pre-3.6 and 3.6+ but
>> behave differently. This would also be incompatible with existing
>> code using str(path) for compatibility with the stdlib (the old way,
>> which people still use for pre-3.6 compatibility even in new code).
>>
> 
> I'm not sure that sounds very convincing because that exact problem 
> exists, was discussed and accepted in your PEP 519 for all other 
> classes. I do not really see why subclasses of str and bytes should 
> require special backwards compatibility here. Is there a reason why you 
> are thinking they should be treated specially?
> 

Ah, sorry, I misunderstood what you were trying to say, but now I'm 
getting it! Subclasses of str and bytes were of course usable as path 
arguments before, simply because they are str or bytes instances. Now they 
would be picked up based on their __fspath__ method, but old versions of 
Python executing code using them would still use them directly. Have to 
think about this one a bit, but thanks for pointing it out.
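
To make the concern concrete, here is a small sketch of a str subclass whose __fspath__ result differs from its string value. The class TaggedPath and the "/srv/data/" prefix are made up for illustration; the point is only that code consuming the object as a plain string (the pre-3.6 way, or via str(path)) and code using an __fspath__-first lookup would end up with different paths.

```python
import os


class TaggedPath(str):
    """Hypothetical str subclass whose __fspath__ differs from its str value."""

    def __fspath__(self):
        return "/srv/data/" + self


p = TaggedPath("report.txt")

# Pre-3.6 code, or new code using str(path) for compatibility,
# sees only the string value.
old_view = str(p)

# os.fspath in 3.6 returns str subclasses unchanged; __fspath__ is not called.
current_view = os.fspath(p)

# An __fspath__-first lookup, as in step 2 of the proposal, would use this.
proposed_view = type(p).__fspath__(p)

print(old_view)       # report.txt
print(current_view)   # report.txt
print(proposed_view)  # /srv/data/report.txt
```

So the same object would name two different files depending on the Python version and on whether the calling code goes through str(path) or os.fspath(path).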

>>> This would have the following implications:
>>> a) it would speed up the very common case when the arg is either a str or a
>>> bytes instance exactly
>>
>> To get the same performance benefit for str and bytes, but without
>> changing functionality, there could first be the exact type check and
>> then the isinstance check. This would add some performance penalty for
>> PathLike objects. Removing the isinstance check on the __fspath__()
>> return value, which I find less useful, would compensate for that. (3)
>> would not be necessary in this version.
>>
> 
> Right, that was one thing I forgot to mention in my list. My proposal 
> would also speed up processing of pathlike objects because it moves the 
> __fspath__ call up in front of the isinstance check. Your alternative 
> would speed up only str and bytes, but would slow down Path-like classes.
> In addition, I'm not sure that removing the isinstance check on the 
> return value of __fspath__() is a good idea because that would mean 
> giving up the guarantee that os.fspath returns an instance of str or 
> bytes and would effectively force library code to do the isinstance 
> check anyway even if the function may have performed it already, which 
> would worsen performance further.
> 
>> Are you asking for other reasons, or because you actually have a use
>> case where this matters? If this performance really matters somewhere,
>> the version I describe above could be considered. It would have 100%
>> backwards compatibility, or a little less (99% ?) if the isinstance
>> check of the __fspath__() return value is removed for performance
>> compensation.
>>
> 
> That use case question is somewhat difficult to answer. I had this idea 
> when working on two bug tracker issues (one concerning fnmatch and a 
> follow-up one on os.path.normcase, which is called by fnmatch.filter 
> and, in turn, calls os.fspath). fnmatch.filter is a case where performance 
> matters and the decision when and where to call the rather expensive 
> os.path.normcase->os.fspath there is not entirely straightforward. So, 
> yes, I was basically looking at this because of a potential use case, 
> but I say potential because I'm far from sure that any speed gain in 
> os.fspath will be big enough to be useful for fnmatch.filter in the end.
> 
> 
>>> b) user-defined classes that inherit from str or bytes could control their
>>> path representation just like any other class
>>
>> Again, this would cause differences in behavior between different
>> Python versions, and based on whether str(path) is used or not.
>>
>> —Koos
>>
>>

