[Python-ideas] Working with Path objects: p-strings?

Sven R. Kunze srkunze at mail.de
Tue Mar 29 17:00:08 EDT 2016


On 29.03.2016 20:51, Brett Cannon wrote:
> I may regret this, but I will give this a shot. :)

Never, Brett. I think everybody here appreciate a decent discussion. Who 
knows maybe, somebody advocating for path->str misses an important piece 
which in turn misses the PEP to describe clearly.

>
> I wrote a blog post at 
> http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str because 
> my response was getting a bit long. I have pasted it below in CommonMark.

Thanks +1

So, let's see:

> Over on 
> [python-ideas](https://mail.python.org/mailman/listinfo/python-ideas) 
> a discussion has broken out about somehow trying to make 
> `p'/some/path/to/a/file` return an [instance of 
> `pathlib.Path`](https://docs.python.org/3/library/pathlib.html#pathlib.Path). 
> This led to a splinter discussion as to why `pathlib.Path` doesn't 
> inherit from `str`? I figured instead of burying my response to this 
> question in the thread I'd blog about it to try and explain one 
> approach to API design.
>
> I think the key question in all of this is whether paths are 
> semantically equivalent to a sequence of characters? Obviously the 
> answer is "no" since paths have structure, directly represent 
> something like files and directories, etc. Now paths do have a 
> serialized representation as strings -- at least most of the time, but 
> I'm ignoring Linux and the crazy situation of binary paths -- which is 
> why C APIs take in `char *` as the representation of paths (and 
> because C tends to make one try to stick with the types that are part 
> of the C standard). So not all strings represent paths, but all paths 
> can be represented by strings. I think we can all agree with that.
>
> OK, so if all paths can be represented as strings, why don't we just 
> make `pathlib.Path` subclass `str`? Well, as I said earlier, not all 
> strings are paths. You can't concatenate a string representing a path 
> with some other random string and expect to get back a string that 
> still represents a valid string (and I'm not talking about "valid" as 
> in "the file doesn't exist", I'm talking about "syntactically not 
> possible"). This is what [PEP 
> 428](https://www.python.org/dev/peps/pep-0428/) is talking about when 
> it says:
>
> > Not behaving like one of the basic builtin types [list str] also 
> minimizes the potential for confusion if a path is combined by 
> accident with genuine builtin types [like str].

Valid string? I assume you mean path, right?

> Now at this point someone someone will start saying, "but Brett, 
> '[practicality beats 
> purity](https://www.python.org/dev/peps/pep-0020/)' and all of those 
> pre-existing, old OS APIs want a string as an argument!" And you're 
> right, in terms of practicality it would be easier for `pathlib.Path` 
> to inherit from `str` ... for the short/medium term. One thing people 
> are forgetting is that "[explicit is better than 
> implicit](https://www.python.org/dev/peps/pep-0020/)" as well and as 
> with most design decisions there's not one clear answer. Implicitly 
> treating a path as a string currently works, but that's just because 
> we have inherited a suboptimal representation for paths from C. Now if 
> you use an explicit representation of paths like `pathlib.Path` then 
> you gain semantic separation and understanding of what you are working 
> with. You also avoid any issues that come with implicit `str` 
> compatibility as I pointed out earlier.

I agree with the concatenation issue when using plain strings for paths.

However, something that I cannot leave uncommented is "suboptimal 
representation for paths". What would your optimal representation for 
paths look like? I cannot believe that the current representation is so 
bad and has been for so long and nobody, really nobody, has anything 
done about it.

> And before anyone says, "so what?", think about this: Python 2/3. 
> While I'm sure some will say I'm overblowing the comparison, but 
> Python 3 came about because the implicit compatibility of binary and 
> textual data in Python 2 caused major headaches to the point that we 
> made a backwards-incompatible change that has caused widespread 
> ramifications (but which we seem to be coming out on the other side 
> of). By not inheriting from `str`, `pathlib.Path` has avoided a 
> similar potential issue from the get-go. Or another way of looking at 
> it is to ask why doesn't `dict` inherit from `str` so that it's easier 
> to make it easier to stick in the body of an HTTP response so it can 
> be implicitly treated as JSON? If you going, "eww, no because they are 
> different types" then you at least understand the argument I'm trying 
> to make (if you're going "I like that idea", then I think 
> [Perl](https://www.perl.org/) might be more to your liking than Python 
> and that's fine since the two languages just have different designs).

I think most "practicality beats purity" folks don't want that either. 
They are just bloody lazy. They actually want the benefits of both, the 
pure datastructure with its convenience methods and the dirty str-like 
thing with its convenience methods.

People don't like it when somebody takes one bag of convenience away 
from them if they easily can have both. ;-)

Better sell it differently. So, let's read on. ;)

> Now back to that "practicality beats purity" side of this. Obviously 
> there are lots of APIs out there that take a string as an argument for 
> a file path. And I understand people don't like using `str(path)` or 
> the upcoming/new [`getattr(path, 'path', path)` 
> idiom](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.path) 
> to work around the limitations forced upon them by other APIs that 
> only take a `str` because they occasionally forget it. For this point, 
> I have two answers.

Sorry to interrupt your here but you missed one important piece (didn't 
we agree on selling this better?): people do nasty little string 
manipulations/analysis/regex with their paths for whatever reasons. 
Right now, it's easier than ever because, well, because paths are 
strings. :)

Selling pathlib with the "getattr(path, 'path', path)" idiom is not 
going to help those fellows. I mean, are you serious? It looks 
ridiculous. :-/

Especially the point that a "path" has a "path" is not really getting 
into my head. Either a "path" is a "path" or it better be named 
"file/directory" which has a "path".

path.path vs file.path/dir.path

You need to work on your selling skills, Brett. ;-)

> One is still "explicit is better than implicit" and you should have 
> tests to begin with to catch when you forget to convert your 
> `pathlib.Path` objects to `str` as needed. I'm just one of those 
> programmers who's willing to type a bit more for easy-to-read code 
> that's less error-prone within reason (and I obviously view this as 
> reasonable).
>
> But more importantly, why don't you work with the code and/or projects 
> that are forcing you to convert to `str` to start accepting `pathlib` 
> objects as well for those same APIs? If projects start to work on 
> making `pathlib` objects acceptable anywhere a path is accepted then 
> once Python 3.4 is the oldest version of Python that is supported by 
> the project then they can start considering dropping support for `str` 
> as paths in new releases (obviously this also includes dropping 
> support for Python 2 unless people use 
> [pathlib2](https://pypi.python.org/pypi/pathlib2/) 
> <https://pypi.python.org/pypi/pathlib2/%29>). And I fully admit the 
> stdlib is not exempt from a place that needs updating, so for 
> `importlib` I have [opened an 
> issue](http://bugs.python.org/issue26667) to update it to show this 
> isn't just talk on my end (unfortunately there's some unique 
> bootstrapping problems to import where stuff like `sys.path` probably 
> can't be updated since that has to exist before you can import 
> `pathlib` and that sort of thing ripples out, but I'm hoping I can 
> update at least some things and this is a very unique case of 
> potentially needing to stick with `str`). Hopefully if people start 
> asking for `pathlib` support from projects they will add it, 
> eventually leading to an alleviation of the desire to have 
> `pathlib.Path` inherit from `str`.

I think updating the stdlib looks sufficiently simple to start hacking 
CPython. So, I would be willing to help here once the github repo is up 
and running. :)


Nice post, Brett. Seems you covered the most important point quite well 
and also gave an explanation of why people request path->str.


Best,
Sven


More information about the Python-ideas mailing list