[Python-ideas] Working with Path objects: p-strings?

Brett Cannon brett at python.org
Tue Mar 29 14:51:49 EDT 2016


On Tue, 29 Mar 2016 at 02:45 Michel Desmoulin <desmoulinmichel at gmail.com>
wrote:

>
>
> On 29/03/2016 09:50, Sven R. Kunze wrote:
> > On 25.03.2016 22:56, Michel Desmoulin wrote:
> >> Although path.py, which I have been using for years now (and still
> >> prefer to pathlib), subclasses str and has never caused any problem whatsoever.
> >> So really, we should pinpoint where it could be an issue and see if this
> >> is a frequent problem, because right now, it seems more a decision based
> >> on purity than practicality.
> >
> > I agree. The PEP says:
> >
> > """
> > No confusion with builtins
> > ----------------------------------
> > In this proposal, the path classes do not derive from a builtin type.
> > This contrasts with some other Path class proposals which were derived
> > from str . They also do not pretend to implement the sequence protocol:
> > if you want a path to act as a sequence, you have to lookup a dedicated
> > attribute (the parts attribute).
> >
> > Not behaving like one of the basic builtin types also minimizes the
> > potential for confusion if a path is combined by accident with genuine
> > builtin types.
> > """
> >
> > I have to admit I cannot follow these statements but they should have
> > appeared to be necessary back then. As experience shows the PyPI module
> > fared far better here.
> >
> >
> > I am a great fan of theory over practice as what has been proven in
> > theory cannot be proven wrong in practice. However, this only holds if
> > we talk about hard proof. For soft things like "human interaction",
> > "mutual understanding" or "potential for confusion", only the practice
> > of many many people can "prove" what's useful, what's "practical".
> >
> >
>
> +1. Can somebody defend, with practical examples, the imperative to
> refuse to inherit from str? And weigh it against the benefits of it?
>
> We have been doing it for years, and so far it's been really, really
> nice. I can't recall a problem I ever had because of it. I can recall
> numerous edits to my code because I forgot str() on pathlib.Path.
>

I may regret this, but I will give this a shot. :)

I wrote a blog post at
http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str because my
response was getting a bit long. I have pasted it below in CommonMark.

-----

Over on [python-ideas](https://mail.python.org/mailman/listinfo/python-ideas)
a discussion has broken out about somehow trying to make
`p'/some/path/to/a/file'` return an [instance of `pathlib.Path`](
https://docs.python.org/3/library/pathlib.html#pathlib.Path). This led to a
splinter discussion as to why `pathlib.Path` doesn't inherit from `str`. I
figured instead of burying my response to this question in the thread I'd
blog about it to try and explain one approach to API design.

I think the key question in all of this is whether paths are semantically
equivalent to a sequence of characters. Obviously the answer is "no" since
paths have structure, directly represent something like files and
directories, etc. Now paths do have a serialized representation as strings
-- at least most of the time, but I'm ignoring Linux and the crazy
situation of binary paths -- which is why C APIs take in `char *` as the
representation of paths (and because C tends to make one try to stick with
the types that are part of the C standard). So not all strings represent
paths, but all paths can be represented by strings. I think we can all
agree with that.

OK, so if all paths can be represented as strings, why don't we just make
`pathlib.Path` subclass `str`? Well, as I said earlier, not all strings are
paths. You can't concatenate a string representing a path with some other
random string and expect to get back a string that still represents a valid
path (and I'm not talking about "valid" as in "the file doesn't exist",
I'm talking about "syntactically not possible"). This is what [PEP 428](
https://www.python.org/dev/peps/pep-0428/) is talking about when it says:

> Not behaving like one of the basic builtin types [like str] also
minimizes the potential for confusion if a path is combined by accident
with genuine builtin types [like str].
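
To make that concrete, here is a minimal sketch. The `StrPath` class is hypothetical, just a stand-in for any path type that subclasses `str`, and the paths are made up. The inherited concatenation quietly succeeds and hands back a plain `str` that is, at minimum, not the path you meant, while `pathlib` refuses the same accident:

```python
import pathlib


class StrPath(str):
    """Hypothetical str-subclassing path type, for illustration only."""


config_dir = StrPath("/etc/myapp")

# Inherited str concatenation silently succeeds and returns a plain str
# that no longer points where you meant it to.
broken = config_dir + "settings.ini"
print(repr(broken))   # '/etc/myappsettings.ini'
print(type(broken))   # <class 'str'>

# pathlib does not define "+" with str, so the same accident fails loudly.
pure = pathlib.PurePosixPath("/etc/myapp")
try:
    pure + "settings.ini"
except TypeError as exc:
    print("TypeError:", exc)

# The explicit join operator is how path segments are meant to be combined.
joined = pure / "settings.ini"
print(joined)         # /etc/myapp/settings.ini
```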

Now at this point someone will start saying, "but Brett,
'[practicality beats purity](https://www.python.org/dev/peps/pep-0020/)'
and all of those pre-existing, old OS APIs want a string as an argument!"
And you're right, in terms of practicality it would be easier for
`pathlib.Path` to inherit from `str` ... for the short/medium term. One
thing people are forgetting is that "[explicit is better than implicit](
https://www.python.org/dev/peps/pep-0020/)" as well and as with most design
decisions there's not one clear answer. Implicitly treating a path as a
string currently works, but that's just because we have inherited a
suboptimal representation for paths from C. Now if you use an explicit
representation of paths like `pathlib.Path` then you gain semantic
separation and understanding of what you are working with. You also avoid
any issues that come with implicit `str` compatibility as I pointed out
earlier.

And before anyone says, "so what?", think about this: Python 2/3. I'm
sure some will say I'm overblowing the comparison, but Python 3 came about
because the implicit compatibility of binary and textual data in Python 2
caused major headaches to the point that we made a backwards-incompatible
change that has caused widespread ramifications (but which we seem to be
coming out on the other side of). By not inheriting from `str`,
`pathlib.Path` has avoided a similar potential issue from the get-go. Or
another way of looking at it is to ask why doesn't `dict` inherit from
`str` so that it's easier to stick in the body of an HTTP
response so it can be implicitly treated as JSON? If you're going, "eww, no
because they are different types" then you at least understand the argument
I'm trying to make (if you're going "I like that idea", then I think [Perl](
https://www.perl.org/) might be more to your liking than Python and that's
fine since the two languages just have different designs).
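
If the comparison feels abstract, this small sketch shows the same explicitness in Python 3 today. The values are invented for illustration; the point is only that mixing the types fails and the conversion has to be spelled out:

```python
import json

# Python 3 refuses to mix binary and textual data implicitly.
try:
    b"header: " + "value"
except TypeError as exc:
    mix_error = exc
    print("TypeError:", exc)

# The conversion has to be spelled out, encoding included.
combined = b"header: " + "value".encode("ascii")
print(combined)  # b'header: value'

# Likewise a dict is not a str; turning it into a JSON body is explicit.
payload = {"status": "ok"}
body = json.dumps(payload)
print(body)  # {"status": "ok"}
```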

Now back to that "practicality beats purity" side of this. Obviously there
are lots of APIs out there that take a string as an argument for a file
path. And I understand people don't like using `str(path)` or the
upcoming/new [`getattr(path, 'path', path)` idiom](
https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.path) to
work around the limitations forced upon them by other APIs that only take a
`str` because they occasionally forget it. For this point, I have two
answers. One is still "explicit is better than implicit" and you should
have tests to begin with to catch when you forget to convert your
`pathlib.Path` objects to `str` as needed. I'm just one of those
programmers who's willing to type a bit more for easy-to-read code that's
less error-prone within reason (and I obviously view this as reasonable).
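
As a rough sketch of what that looks like at an API boundary: `legacy_extension()` below is a hypothetical function standing in for any API that only works with `str`. Forgetting the conversion fails right away, which is exactly the kind of thing a test will catch:

```python
import pathlib


def legacy_extension(filename):
    """Hypothetical str-only API: relies on str methods such as rpartition()."""
    return filename.rpartition(".")[2]


path = pathlib.PurePosixPath("/tmp/report.txt")

# Forgetting the conversion fails loudly because a path object has no
# str methods; a test exercising this call would flag it immediately.
try:
    legacy_extension(path)
except AttributeError as exc:
    print("AttributeError:", exc)

# The explicit conversion at the API boundary.
print(legacy_extension(str(path)))  # txt
```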

But more importantly, why don't you work with the code and/or projects that
are forcing you to convert to `str` to start accepting `pathlib` objects as
well for those same APIs? If projects start to work on making `pathlib`
objects acceptable anywhere a path is accepted, then once Python 3.4 is the
oldest version of Python that the project supports, they can
start considering dropping support for `str` as paths in new releases
(obviously this also includes dropping support for Python 2 unless people
use [pathlib2](https://pypi.python.org/pypi/pathlib2/)). And I fully admit
the stdlib is not exempt from needing updates, so for
`importlib` I have [opened an issue](http://bugs.python.org/issue26667) to
update it to show this isn't just talk on my end (unfortunately there are
some unique bootstrapping problems to import where stuff like `sys.path`
probably can't be updated since that has to exist before you can import
`pathlib` and that sort of thing ripples out, but I'm hoping I can update
at least some things, and this is a rare case of potentially needing
to stick with `str`). Hopefully if people start asking for `pathlib`
support from projects they will add it, eventually leading to an
alleviation of the desire to have `pathlib.Path` inherit from `str`.
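
For a project that wants to get started, one possible approach (a sketch, not the only way to do it) is to normalize at the top of each function that takes a path. The `count_lines()` function and the file name are hypothetical; the only thing relied on is that `pathlib.Path()` accepts either a `str` or an existing path object:

```python
import pathlib
import tempfile


def count_lines(path):
    """Hypothetical project function that accepts either a str or a Path."""
    # pathlib.Path() accepts a str or an existing path object, so
    # normalizing once at the top keeps the rest of the function str-free.
    path = pathlib.Path(path)
    with path.open(encoding="utf-8") as file:
        return sum(1 for _ in file)


with tempfile.TemporaryDirectory() as tmp:
    example = pathlib.Path(tmp) / "example.txt"
    example.write_text("one\ntwo\n", encoding="utf-8")
    print(count_lines(example))       # 2
    print(count_lines(str(example)))  # 2
```

Callers who already have a `pathlib.Path` no longer need `str()`, and callers still passing strings keep working until the project decides to drop them.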