[Python-Dev] PEP 355 status

Wed Oct 25 05:42:59 CEST 2006

BJörn Lindqvist wrote:
> On 10/1/06, Guido van Rossum <guido at python.org> wrote:
>> On 9/30/06, Giovanni Bajo <rasky at develer.com> wrote:
>>> It would be terrific if you gave us some clue about what is wrong in PEP355, so
>>> that the next guy does not waste his time. For instance, I find PEP355
>>> incredibly good for my own path manipulation (much cleaner and concise than the
>>> awful os.path+os+shutil+stat mix), and I have trouble understanding what is
>>> *so* wrong with it.
>>>
>>> You said "it's an amalgam of unrelated functionality", but you didn't say what
>>> exactly is "unrelated" for you.
>> Sorry, no time. But others in this thread clearly agreed with me, so
>> they can guide you.
> 
> I'd like to write a post mortem for PEP 355. But one important
> question that haven't been answered is if there is a possibility for a
> path-like PEP to succeed in the future? If so, does the path-object
> implementation have to prove itself in the wild before it can be
> included in Python? From earlier posts it seems like you don't like
> the concept of path objects, which others have found very interesting.
> If that is the case, then it would be nice to hear it explicitly. :)

Let me take a crack at it - I'm always good for spouting off an arrogant 
opinion :)

Part 1: "Amalgam of Unrelated Functionality"

To me, the Path module felt very much like the "swiss army knife" 
anti-pattern - a whole lot of functions that had little in common other 
than the fact that paths were involved.

More specifically, I think it's important to separate the notion of paths 
as abstract "reference" objects from filesystem manipulators. When I 
call a function that operates on a path, I want to clearly distinguish 
between a function that merely does a transformation on the path string, 
vs. one that actually hits the disk. This goes along with the "principle 
of least surprise" - it should never be the case that I cause an I/O 
operation to occur when I wasn't expecting it.

For example, a function that computes the parent directory of a path 
should not IMHO be a sibling of a function which tests for the existence 
or readability of a file.
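
To make the distinction concrete, here is a minimal sketch using the 
existing os.path functions. The example path is made up and assumed not 
to exist; the grouping into "pure" and "I/O" calls is an illustration, 
not something taken from PEP 355.

```python
import os.path

# A made-up path that is assumed not to exist on this machine.
p = "/no/such/dir/report.txt"

# Pure string transformations: these never touch the disk, so they
# work even though nothing exists at this path.
parent = os.path.dirname(p)
name = os.path.basename(p)
stem, ext = os.path.splitext(name)

# Filesystem query: this call has to ask the operating system,
# so it performs I/O.
present = os.path.exists(p)
```

Nothing in the names dirname and exists tells the caller which one 
reaches the disk, which is the kind of surprise described above.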

I tend to think of paths and filesystems as broken down into 3 distinct 
domains, which are locators, inodes, and files. I realize that not all 
file systems on all platforms use the term 'inode', and have somewhat 
different semantics, but they all have some object which fulfills that role.

   -- A locator is an abstract description of how to "get to" a 
resource. A file path is a "locator" in exactly the sense that a URL is. 
Locators need not refer to 'real' resources in order to be valid. A 
locator to a non-existent resource still maintains a consistent 
structure, and can be manipulated and transformed without ever actually 
dereferencing it. A locator does not, however, have any properties or 
attributes - you cannot tell, for example, the creation date of a file 
by looking at its locator.

   -- An inode is a descriptor that points to some actual content. It 
actually lives on the filesystem, and has attributes (such as creation 
date, last modified date, permissions, etc.)

   -- 'Files' are raw content streams - they are the actual bytes that 
make up the data within the file. Files do not have 'names' or 'dates' 
directly in and of themselves - only the inodes that describe them do.

Now, I don't insist that everyone in the world should classify things 
the way I do - I'm just describing how I see it. Were I to come up with 
my own path-related APIs, they would most likely be divided into 3 
sub-modules corresponding to the 3 subdivisions listed above. I would 
want to make it clear that when you are operating strictly at the 
locator level, you aren't touching inodes or files; when you are 
operating at the inode level, you aren't touching file content.
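
As a rough sketch of what such a three-way split might look like, the 
functions below are grouped by the level they operate on. All of the 
function names (locator_join, inode_size, file_read_bytes, and so on) are 
hypothetical and exist only for this illustration; they simply wrap 
standard library calls.

```python
import os
import posixpath
import tempfile

# --- Locator level: pure string manipulation, no I/O at all ---
def locator_parent(path):
    return posixpath.dirname(path)

def locator_join(base, *parts):
    # Joins and normalizes purely textually; ".." is collapsed
    # without consulting the filesystem.
    return posixpath.normpath(posixpath.join(base, *parts))

# --- Inode level: reads metadata, never the file content ---
def inode_size(path):
    return os.stat(path).st_size

def inode_modified_time(path):
    return os.stat(path).st_mtime

# --- File level: the raw byte stream ---
def file_read_bytes(path):
    with open(path, "rb") as f:
        return f.read()

# Locator operations work on paths that need not exist.
joined = locator_join("/srv/data", "..", "logs", "app.log")

# Inode and file operations need a real file, so create a temporary one.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "data.bin")
    with open(target, "wb") as f:
        f.write(b"hello")
    size = inode_size(target)
    content = file_read_bytes(target)
```

With this layout, a caller who only uses the locator_* functions can be 
sure that no disk access happens.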

Part 2: Should paths be objects?

I should mention that while I appreciate the power of OOP, I am also 
very much against the kind of OOP-absolutism that has been taught in 
many schools of software engineering in the last two decades. There are 
a lot of really good, formal, well-thought-out systems of program 
organization, and OOP is only one of many.

A classic example is relational algebra, which forms the basis for 
relational databases - the basic notion that all operations on tabular 
data can be "composed" or "chained" in exactly the way that mathematical 
formulas can be. In relational algebra, you can take a view of a view of 
a view, or a subquery of a query of a view of a table, and so on. Even 
single, scalar values - such as the count of the number of results of a 
query - are of the same data type as a 'relation', and can be operated 
on as such, or fed as input to a subsequent operation.

I bring up the example of relational algebra because it applies to paths 
as well: There is a kind of "path algebra", where an operation on a path 
results in another path, which can be operated on further.

Now, one way to achieve this kind of path algebra is to make paths an 
object, and to overload the various functions and operators so that 
they, too, return paths.

However, path algebra can be implemented just as easily in a functional 
style as in an object style. Properly done, a functional design 
shouldn't be significantly more bulky or wordy than an object design; 
the fact that the existing legacy API fails this test has more to do 
with history than any inherent advantages of OOP vs. functional style. 
(Actually, the OOP approach has a slight advantage in terms of the 
amount of syntactic sugar available, but that is [a] an artifact of the 
current Python feature set, and [b] not necessarily a good thing if it 
leads to gratuitous, Perl-ish cleverness.)
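
The following sketch shows the same small "path algebra" in both styles. 
The helper names (parent_of, child_of, with_ext) and the Locator class 
are hypothetical, invented for this comparison; neither is the API 
proposed in PEP 355.

```python
import posixpath

# Functional style: each function takes a string and returns a string,
# so any result can be fed straight into the next operation.
def parent_of(path):
    return posixpath.dirname(path)

def child_of(path, name):
    return posixpath.join(path, name)

def with_ext(path, ext):
    return posixpath.splitext(path)[0] + ext

functional = with_ext(
    child_of(parent_of("/srv/app/conf/main.ini"), "backup.ini"), ".bak")

# Object style: a hypothetical Locator wraps the string, and each
# method returns another Locator so that calls can be chained.
class Locator(str):
    def parent(self):
        return Locator(parent_of(self))

    def child(self, name):
        return Locator(child_of(self, name))

    def with_extension(self, ext):
        return Locator(with_ext(self, ext))

chained = (Locator("/srv/app/conf/main.ini")
           .parent().child("backup.ini").with_extension(".bak"))
```

Both expressions produce the same path. The object version reads left to 
right, while the functional version accepts plain strings without any 
conversion step.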

As a point of comparison, the Java Path API and the C# .Net Path API 
have similar capabilities; however, the former is object-based whereas 
the latter is functional and operates on strings. Having used both of 
them extensively, I find I prefer the C# style, mainly due to the ease 
of interconversion with regular strings - being able to read strings 
from configuration files, for example, and immediately operate on them 
without having to convert to path form. I don't find "p.GetParent()" 
much harder or easier to type than "Path.GetParent( p )"; but I do 
prefer "Path.GetParent( string )" over "Path( string ).GetParent()".

However, this is only a *mild* preference - I could go either way, and 
wouldn't put up much of a fight about it.

(I should note that the Java Path API does *not* follow my scheme of 
separation between locators and inodes, while the C# API does, which is 
another reason why I prefer the C# approach.)

Part 3: Does this mean that the current API cannot be improved?

Certainly not! I think everyone (well, almost) agrees that there is much 
room for improvement in the current APIs. They certainly need to be 
refactored and recategorized.

But I don't think that the solution is to take all of the path-related 
functions and drop them into a single class, or even a single module.

---

Anyway, I hope that this (a) answers your questions, and (b) isn't too 
divergent from most people's views about Path.

-- Talin