[Python-ideas] tweaking the file system path protocol

Koos Zevenhoven k7hoven at gmail.com
Fri May 26 08:58:23 EDT 2017


On Wed, May 24, 2017 at 5:52 PM, Wolfgang Maier
<wolfgang.maier at biologie.uni-freiburg.de> wrote:
> On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
>>
>>
>> It would be annoying and inconsistent if int(x) avoided calling __int__
>> on int subclasses. But that's exactly what happens with fspath and str.
>> I see that as a bug, not a feature: I find it hard to believe that we
>> would design an interface for string-like objects (paths) and then
>> intentionally prohibit it from applying to strings.
>>
>> And if we did, surely it's a misfeature. Why *shouldn't* subclasses of
>> str get the same opportunity to customize the result of __fspath__ as
>> they get to customize their __repr__ and __str__?
>>
>> py> class MyStr(str):
>> ...     def __repr__(self):
>> ...             return 'repr'
>> ...     def __str__(self):
>> ...             return 'str'
>> ...
>> py> s = MyStr('abcdef')
>> py> repr(s)
>> 'repr'
>> py> str(s)
>> 'str'
>>
>
> This is almost exactly what I have been thinking (just that I couldn't have
> presented it so clearly)!

Unfortunately, this thinking is also very shallow compared to what
went into PEP 519.
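For reference, here is the current (Python 3.6+) behaviour under discussion: os.fspath() returns str and bytes instances, subclasses included, unchanged and never consults their __fspath__. TrickyStr is a made-up name used only for illustration.

```python
import os


class TrickyStr(str):
    """Hypothetical str subclass that tries to redirect itself elsewhere."""

    def __fspath__(self):
        return '/somewhere/else'


s = TrickyStr('abcdef')

# os.fspath() short-circuits on str/bytes instances (subclasses included),
# so the custom __fspath__ above is never called.
result = os.fspath(s)
print(result == 'abcdef')  # True
print(result is s)         # True: the object itself is handed back
```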

>
> Let's look at a potential use case for this. Assume that in a package you want
> to handle several paths to different files and directories that are all
> located in a common package-specific parent directory. Then, using the path
> protocol, you could write this:
>
> import os
>
> class PackageBase(object):
>     basepath = '/home/.package'
>
> class PackagePath(str, PackageBase):
>     def __fspath__(self):
>         # resolve the name relative to the shared package directory
>         return os.path.join(self.basepath, str(self))
>
> config_file = PackagePath('.config')
> log_file = PackagePath('events.log')
> data_dir = PackagePath('data')
>
> with open(log_file, 'w') as log:
>     log.write('package paths initialized.\n')
>

This is exactly the kind of code that causes the problems. It will do
the wrong thing when code like open(str(log_file), 'w') is used for
compatibility.
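A runnable reduction of the quoted example (merged into one class, with the missing self parameter added) shows the split concretely on Python 3.6+: three reasonable-looking ways of asking for "the path" give two different answers.

```python
import os


class PackagePath(str):
    """The quoted PackagePath idea, reduced to one class for illustration."""

    basepath = '/home/.package'

    def __fspath__(self):
        return os.path.join(self.basepath, str(self))


log_file = PackagePath('events.log')

print(log_file.__fspath__())  # the intended location under basepath
print(os.fspath(log_file))    # events.log: __fspath__ ignored on str subclasses
print(str(log_file))          # events.log: what open(str(log_file), 'w') opens
```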

> Just that this wouldn't currently work because PackagePath inherits from
> str. Of course, there are other ways to achieve the above, but when you
> think about designing a Path-like object class, str is just a pretty
> attractive base class to start from.

Isn't it great that it doesn't work, so it's not attractive anymore?
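One of those "other ways", sketched under the assumption of Python 3.6+: a class that does not inherit from str, so its __fspath__ is honoured everywhere, and str() no longer pretends to be the path. The class below is hypothetical; the pathlib lines use only the standard library.

```python
import os
import pathlib


class PackagePath:
    """Hypothetical path-like class that does NOT inherit from str."""

    basepath = '/home/.package'

    def __init__(self, name):
        self.name = name

    def __fspath__(self):
        # Not a str subclass, so os.fspath(), open(), os.stat() etc. call this.
        return os.path.join(self.basepath, self.name)


config_file = PackagePath('.config')
print(os.fspath(config_file))

# pathlib gives much the same result with no custom class at all.
base = pathlib.PurePosixPath('/home/.package')
print(os.fspath(base / '.config'))  # /home/.package/.config
```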

> Now let's look at the compatibility of a class like PackagePath under this
> proposal:
>
> - if client code uses e.g. str(config_file) and proceeds to treat the
> resulting object as a path, unexpected things will happen and, yes, that's
> bad. However, this is no different from any other Path-like object for which
> __str__ and __fspath__ don't define the same return value.
>

Yes, this is another way of shooting yourself in the foot. Luckily,
this one is probably less attractive.

> - if client code uses the PEP-recommended backwards-compatible way of
> dealing with paths,
>
> path.__fspath__() if hasattr(path, "__fspath__") else path
>
> things will just work. Interestingly, this would *currently* produce an
> unexpected result, namely that it would execute the __fspath__ method of the
> str subclass.
>

So people not testing with 3.6+ might think their code works while it
doesn't. Luckily, people not testing with 3.6+ are perhaps unlikely to
try funny tricks with __fspath__.
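The divergence can be checked directly on Python 3.6+. The quoted idiom is wrapped in a function here only for comparison; pep519_compat_fspath and RedirectingStr are hypothetical names.

```python
import os


def pep519_compat_fspath(path):
    # The backwards-compatible idiom quoted above, as a function.
    return path.__fspath__() if hasattr(path, "__fspath__") else path


class RedirectingStr(str):
    """Hypothetical str subclass that defines __fspath__."""

    def __fspath__(self):
        return '/elsewhere/' + str(self)


p = RedirectingStr('data')
print(pep519_compat_fspath(p))  # /elsewhere/data: the idiom calls __fspath__
print(os.fspath(p))             # data: os.fspath() does not
```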

> - if client code uses instances of PackagePath as paths directly, then in
> Python 3.6 and below that would lead to an unintended outcome, while in
> Python 3.7 things would work. This is *really* bad.
>
> But what it means is that, under the proposal, using a str or bytes subclass
> with an __fspath__ method defined makes your code backwards-incompatible and
> the solution would be not to use such a class if you want to be
> backwards-compatible (and that should get documented somewhere). This
> restriction, of course, limits the usefulness of the proposal in the near
> future, but that disadvantage will vanish over time. In 5 years, no longer
> supporting Python 3.6 may well not be a big deal (for comparison,
> Python 3.2 was released 6 years ago, and since last year pip no longer
> supports it). As Steven pointed out, the proposal is *very*
> unlikely to break existing code.
>
> So to summarize, the proposal
>
> - avoids an up-front isinstance check in the protocol and thereby speeds up
> the processing of exact strings and bytes and of anything that follows the
> path protocol.*

Speedup for things with __fspath__ is the only virtue of this
proposal, and it has not been shown that that speedup matters
anywhere.

> - slows down the processing of instances of regular str and bytes
> subclasses*
>
> - makes the "path.__fspath__() if hasattr(path, "__fspath__") else path"
> idiom consistent for subclasses of str and bytes that define __fspath__
>

One can discuss whether this is the best idiom to use (I did not write
it, so maybe someone else has comments).

Anyway, some may want to use

path.__fspath__() if hasattr(path, "__fspath__") else str(path)

and some may want

path if isinstance(path, (str, bytes)) else path.__fspath__()

Others may not be after one-liners like this and may instead include the
full implementation of fspath in their code, or, even better, a
modified version of it.
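The three one-liners agree on ordinary path objects but behave quite differently on inputs that are not paths, or that are bytes. A small comparison, assuming Python 3.6+ (where pathlib paths have __fspath__); the function names are hypothetical wrappers around the idioms above.

```python
import pathlib


def passthrough(path):
    return path.__fspath__() if hasattr(path, "__fspath__") else path


def str_fallback(path):
    return path.__fspath__() if hasattr(path, "__fspath__") else str(path)


def isinstance_first(path):
    return path if isinstance(path, (str, bytes)) else path.__fspath__()


p = pathlib.PurePosixPath('/tmp/x')
print(passthrough(p) == str_fallback(p) == isinstance_first(p))  # True

print(repr(passthrough(42)))         # 42: silently passed through
print(repr(str_fallback(42)))        # '42': silently converted
print(repr(str_fallback(b'/tmp')))   # "b'/tmp'": str() of bytes is its repr
try:
    isinstance_first(42)
except AttributeError:               # int has no __fspath__
    print('AttributeError')
```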

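Really, the best thing to use in pre-3.6 might be more like:

import pathlib

def fspath(path):
    # str and bytes (including subclasses) are returned unchanged
    if isinstance(path, (str, bytes)):
        return path
    # objects implementing the path protocol
    if hasattr(path, '__fspath__'):
        return path.__fspath__()
    # os.scandir() entries before 3.6: str() gives the repr, so use .path
    if type(path).__name__ == 'DirEntry':
        return path.path
    # pathlib objects before 3.6 lack __fspath__ but convert via str()
    if isinstance(path, pathlib.PurePath):
        return str(path)
    raise TypeError("Argument cannot be interpreted as a file system "
                    "path: " + repr(path))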

Note that

> - opens up the opportunity to write str/bytes subclasses that represent a
> path other than just their self in the future**
>
> Still sounds like a net win to me, but let's see what I forgot ...
>
> * yes, speed is typically not your primary concern when it comes to IO;
> what's often neglected though is that not all path operations have to
> trigger actual IO (things in os.path for example don't typically perform IO)
>
> ** somebody on the list (I guess it was Koos?) mentioned that such classes
> would only make sense if Python ever disallowed the use of str/bytes as
> paths, but I don't think that is a prerequisite here.
>
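A small illustration of the footnote's point, assuming Python 3.6+, where the os.path functions accept path-like objects: each call below converts its argument with os.fspath() and then only manipulates strings, without touching the file system. posixpath is used explicitly so the output does not depend on the platform.

```python
import pathlib
import posixpath

p = pathlib.PurePosixPath('/home/.package')

# Each call runs os.fspath() on the path-like argument and then does pure
# string manipulation; the directory does not need to exist.
print(posixpath.join(p, 'events.log'))  # /home/.package/events.log
print(posixpath.basename(p))            # .package
print(posixpath.dirname(p))             # /home
```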

Yes, I wrote that, and I stick with it: str and bytes subclasses that
return something different from the str/bytes content should not be
written. If Python ever disallows str/bytes as paths, such a thing
becomes less harmful, and there is no need to have special treatment
for str and bytes. Until then, I'm very happy with the decision to
ignore __fspath__ on str and bytes.
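For comparison, a rough pure-Python sketch of the lookup order that the quoted summary seems to imply: exact str/bytes first, then __fspath__ (now also on str/bytes subclasses), and plain subclasses last. This is an interpretation, not how os.fspath() behaves; proposed_fspath and RedirectingStr are hypothetical names.

```python
import os


def proposed_fspath(path):
    """Hypothetical lookup order from the proposal (NOT what os.fspath does)."""
    # 1. Exact str/bytes: a cheap type check, no isinstance() needed.
    if type(path) in (str, bytes):
        return path
    # 2. Anything defining __fspath__, now including str/bytes subclasses.
    fspath_method = getattr(type(path), '__fspath__', None)
    if fspath_method is not None:
        result = fspath_method(path)
        if not isinstance(result, (str, bytes)):
            raise TypeError('__fspath__ must return str or bytes, not '
                            + type(result).__name__)
        return result
    # 3. Plain str/bytes subclasses only get here after the extra lookup,
    #    which is the slowdown for them mentioned in the summary.
    if isinstance(path, (str, bytes)):
        return path
    raise TypeError('expected str, bytes or os.PathLike object, not '
                    + type(path).__name__)


class RedirectingStr(str):
    def __fspath__(self):
        return '/elsewhere/' + str(self)


p = RedirectingStr('data')
print(proposed_fspath(p))  # /elsewhere/data
print(os.fspath(p))        # data
```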

—Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +

