[Python-Dev] PEP 355 status
Talin
talin at acm.org
Wed Oct 25 05:42:59 CEST 2006
BJörn Lindqvist wrote:
> On 10/1/06, Guido van Rossum <guido at python.org> wrote:
>> On 9/30/06, Giovanni Bajo <rasky at develer.com> wrote:
>>> It would be terrific if you gave us some clue about what is wrong in PEP355, so
>>> that the next guy does not waste his time. For instance, I find PEP355
>>> incredibly good for my own path manipulation (much cleaner and concise than the
>>> awful os.path+os+shutil+stat mix), and I have trouble understanding what is
>>> *so* wrong with it.
>>>
>>> You said "it's an amalgam of unrelated functionality", but you didn't say what
>>> exactly is "unrelated" for you.
>> Sorry, no time. But others in this thread clearly agreed with me, so
>> they can guide you.
>
> I'd like to write a post mortem for PEP 355. But one important
> question that haven't been answered is if there is a possibility for a
> path-like PEP to succeed in the future? If so, does the path-object
> implementation have to prove itself in the wild before it can be
> included in Python? From earlier posts it seems like you don't like
> the concept of path objects, which others have found very interesting.
> If that is the case, then it would be nice to hear it explicitly. :)
Let me take a crack at it - I'm always good for spouting off an arrogant
opinion :)
Part 1: "Amalgam of Unrelated Functionality"
To me, the Path module felt very much like the "swiss army knife"
anti-pattern - a whole lot of functions that had little in common other
than the fact that paths were involved.
More specifically, I think it's important to separate the notion of paths
as abstract "reference" objects from filesystem manipulators. When I
call a function that operates on a path, I want to clearly distinguish
between a function that merely does a transformation on the path string,
vs. one that actually hits the disk. This goes along with the "principle
of least surprise" - it should never be the case that I cause an i/o
operation to occur when I wasn't expecting it.
For example, a function that computes the parent directory of a path
should not IMHO be a sibling of a function which tests for the existence
or readability of a file.
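
A minimal sketch of that distinction, using only standard-library calls. The function names `parent` and `exists` are made up for illustration; the first is a pure string transformation, the second has to ask the filesystem.

```python
import os
import posixpath
import tempfile


def parent(path: str) -> str:
    """Pure string transformation: never touches the disk."""
    return posixpath.dirname(path)


def exists(path: str) -> bool:
    """Dereferences the path: this performs a filesystem call."""
    return os.path.exists(path)


# The string-level call works even though nothing lives at this path.
print(parent("/no/such/dir/file.txt"))

# The filesystem-level call has to go and look.
print(exists(tempfile.gettempdir()))
```

Nothing in the two signatures tells the caller which one does i/o, which is the argument for keeping them in separate namespaces.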
I tend to think of paths and filesystems as broken down into 3 distinct
domains, which are locators, inodes, and files. I realize that not all
file systems on all platforms use the term 'inode', and have somewhat
different semantics, but they all have some object which fulfills that role.
-- A locator is an abstract description of how to "get to" a
resource. A file path is a "locator" in exactly the sense that a URL is.
Locators need not refer to 'real' resources in order to be valid. A
locator to a non-existent resource still maintains a consistent
structure, and can be manipulated and transformed without ever actually
dereferencing it. A locator does not, however, have any properties or
attributes - you cannot tell, for example, the creation date of a file
by looking at its locator.
-- An inode is a descriptor that points to some actual content. It
actually lives on the filesystem, and has attributes (such as creation
date, last modified date, permissions, etc.)
-- 'Files' are raw content streams - they are the actual bytes that
make up the data within the file. Files do not have 'names' or 'dates'
directly in and of themselves - only the inodes that describe them do.
Now, I don't insist that everyone in the world should classify things
the way I do - I'm just describing how I see it. Were I to come up with
my own path-related APIs, they would most likely be divided into 3
sub-modules corresponding to the 3 subdivisions listed above. I would
want to make it clear that when you are operating strictly at the
locator level, you aren't touching inodes or files; when you are
operating at the inode level, you aren't touching file content.
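
One way such a three-way split could look, sketched as plain functions with a prefix per layer. All of the names here (`locator_join`, `inode_size`, `file_read` and so on) are hypothetical; they only wrap existing `os` and `os.path` calls to show where each one would belong.

```python
import os
import tempfile


# --- locator level: string manipulation only, no i/o ---
def locator_join(base: str, *parts: str) -> str:
    return os.path.join(base, *parts)


def locator_parent(path: str) -> str:
    return os.path.dirname(path)


# --- inode level: reads metadata, never file content ---
def inode_size(path: str) -> int:
    return os.stat(path).st_size


def inode_mtime(path: str) -> float:
    return os.stat(path).st_mtime


# --- file level: the raw byte stream ---
def file_read(path: str) -> bytes:
    with open(path, "rb") as stream:
        return stream.read()


with tempfile.TemporaryDirectory() as tmp:
    target = locator_join(tmp, "demo.bin")  # still just a string
    with open(target, "wb") as stream:
        stream.write(b"hello")
    size = inode_size(target)    # metadata lookup
    content = file_read(target)  # content read
    print(size, content)
```

In a real design these would presumably be three sub-modules rather than three prefixes; the prefixes just keep the sketch in one file.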
Part 2: Should paths be objects?
I should mention that while I appreciate the power of OOP, I am also
very much against the kind of OOP-absolutism that has been taught in
many schools of software engineering in the last two decades. There are
a lot of really good, formal, well-thought-out systems of program
organization, and OOP is only one of many.
A classic example is relational algebra, which forms the basis for
relational databases - the basic notion that all operations on tabular
data can be "composed" or "chained" in exactly the way that mathematical
formulas can be. In relational algebra, you can take a view of a view of
a view, or a subquery of a query of a view of a table, and so on. Even
single, scalar values - such as the count of the number of results of a
query - are of the same data type as a 'relation', and can be operated
on as such, or fed as input to a subsequent operation.
I bring up the example of relational algebra because it applies to paths
as well: There is a kind of "path algebra", where an operation on a path
results in another path, which can be operated on further.
Now, one way to achieve this kind of path algebra is to make paths an
object, and to overload the various functions and operators so that
they, too, return paths.
However, path algebra can be implemented just as easily in a functional
style as in an object style. Properly done, a functional design
shouldn't be significantly more bulky or wordy than an object design;
the fact that the existing legacy API fails this test has more to do
with history than any inherent advantages of OOP vs. functional style.
(Actually, the OOP approach has a slight advantage in terms of the
amount of syntactic sugar available, but that is [a] an artifact of the
current Python feature set, and [b] not necessarily a good thing if it
leads to gratuitous, Perl-ish cleverness.)
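
A rough sketch of path algebra in the functional style, assuming paths stay plain strings. The helper names (`parent`, `child`, `with_extension`) are invented for this example; each takes a path and returns a path, so the results can be fed straight into the next call.

```python
import posixpath


def parent(path: str) -> str:
    """Path in, path out: drop the last component."""
    return posixpath.dirname(path)


def child(path: str, name: str) -> str:
    """Path in, path out: append one component."""
    return posixpath.join(path, name)


def with_extension(path: str, ext: str) -> str:
    """Path in, path out: swap the file extension."""
    root, _old_ext = posixpath.splitext(path)
    return root + ext


src = "/project/src/module.py"

# Every intermediate result is itself a path, so the operations chain.
project_root = parent(parent(src))
backup_dir = child(project_root, "backup")
backup = with_extension(child(backup_dir, posixpath.basename(src)), ".bak")
print(backup)
```

None of these calls touches the disk, and the result is an ordinary string that any other API accepts.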
As a point of comparison, the Java Path API and the C# .Net Path API
have similar capabilities; however, the former is object-based whereas
the latter is functional and operates on strings. Having used both of
them extensively, I find I prefer the C# style, mainly due to the ease
of interconversion with regular strings - being able to read strings
from configuration files, for example, and immediately operate on them
without having to convert to path form. I don't find "p.GetParent()"
much harder or easier to type than "Path.GetParent( p )"; but I do
prefer "Path.GetParent( string )" over "Path( string ).GetParent()".
However, this is only a *mild* preference - I could go either way, and
wouldn't put up much of a fight about it.
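
To make the comparison concrete in Python terms, here is a small sketch of the two call styles side by side. Both `get_parent` and the `PathObject` class are hypothetical stand-ins written for this example, not part of any existing library or of PEP 355.

```python
import posixpath


def get_parent(path: str) -> str:
    """Functional style: takes a string, returns a string."""
    return posixpath.dirname(path)


class PathObject:
    """Object style: strings must be wrapped on the way in
    and unwrapped on the way out."""

    def __init__(self, text: str) -> None:
        self.text = text

    def get_parent(self) -> "PathObject":
        return PathObject(posixpath.dirname(self.text))

    def __str__(self) -> str:
        return self.text


# A value as it might arrive from a configuration file.
config_value = "/etc/myapp/settings.ini"

functional_result = get_parent(config_value)
object_result = str(PathObject(config_value).get_parent())
print(functional_result, object_result)
```

The two results are identical; the only difference is the wrapping and unwrapping at the string boundary.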
(I should note that the Java Path API does *not* follow my scheme of
separation between locators and inodes, while the C# API does, which is
another reason why I prefer the C# approach.)
Part 3: Does this mean that the current API cannot be improved?
Certainly not! I think everyone (well, almost) agrees that there is much
room for improvement in the current APIs. They certainly need to be
refactored and recategorized.
But I don't think that the solution is to take all of the path-related
functions and drop them into a single class, or even a single module.
---
Anyway, I hope that this (a) answers your questions, and (b) isn't too
divergent from most people's views about Path.
-- Talin