pre-PEP: Object-oriented file module

Thu Aug 25 17:19:03 EDT 2005

I'd like to propose a new PEP [no, that isn't a redundant 'process'  
in there :-)--pre-PEP is a different process than PEP], for a  
standard library module that deals with files and file paths in an
object-oriented manner. I believe this module should be included as
part of the standard Python distribution.

Background
==========
Some time ago, I wrote such a module for myself, and have found it  
extremely useful. Recently, I found a reference to a similar module,
http://www.jorendorff.com/articles/python/path/ by Jason Orendorff.
There are of course differences (I think mine is more comprehensive
but probably less stable), but the similarities in thought are
striking. Both work by creating a class representing file paths, and
then using that class to unify functions from shutil, os.path, and some
built-in functions such as 'open' (and maybe some other stuff I can't
remember).

I haven't looked at Orendorff's code yet, but for my own, a major enabler
of the enhanced functionality has been the inclusion of generators in
Python. This allows, for example, a method which yields all of the
lines in a file and automatically closes that file afterwards. The
availability of attributes also makes certain things cleaner than was
the case in previous versions of Python.
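
A minimal sketch of that generator idea, written as a standalone
function rather than a method. It is an illustration only, not code
from either module; the name `iterlines` is borrowed from the examples
further down, and the temporary file exists only to make the snippet
self-contained.

```python
import os
import tempfile


def iterlines(path):
    """Yield each line of the file at `path`, then close the file."""
    f = open(path)
    try:
        for line in f:
            yield line
    finally:
        # Runs once the lines are exhausted, or when the generator is
        # closed or garbage-collected before reaching the end.
        f.close()


# Usage: write a small temporary file, then read it back line by line.
fd, name = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("first\nsecond\n")

lines = list(iterlines(name))
os.remove(name)
```

The caller never sees the file object, so there is nothing to forget
to close.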

Fit With Python Philosophy
==========================
One of the strengths of Python is that it is a highly object-oriented
language, but this is not true when it comes to handling files. As
far as Python is concerned, a file path is just a string. There are
plenty of things you can do with it, but they all have to be done
with function calls (not methods), since there is no concept of a
file path object. Even worse, these functions are spread out across
various modules, and often have cryptic names that hardly make it
obvious what they do.
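
To make the scattering concrete, here is a short sketch of a routine
task using only the existing standard library functions. The file
names and the temporary directory are made up for the illustration.

```python
import os
import os.path
import shutil
import tempfile

workdir = tempfile.mkdtemp()

src = os.path.join(workdir, "some.txt")   # joining paths: os.path
with open(src, "w") as f:                 # opening a file: built-in
    f.write("hello")

dst = os.path.join(workdir, "copy.txt")
shutil.copy(src, dst)                     # copying: shutil
size = os.path.getsize(dst)               # file size: os.path
os.remove(src)                            # deleting a file: os
names = os.listdir(workdir)               # listing a directory: os
shutil.rmtree(workdir)                    # deleting a tree: shutil
```

Every step takes the path as a plain string argument, and four
different namespaces are involved in eight lines of work.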

Given that two different people concluded that such a module was  
desirable, and independently implemented modules that are actually  
very similar, I suspect there is an 'object-oriented mindset' to
which this way of addressing files and file paths is natural, and
that mindset should be supported by Python.

Pragmatic Justification
=======================
I've been using my module for about a year and a half now. The ease
of use and uniformity make a huge (I'm tempted to say 'vast')
difference in dealing with files. I believe other users would
experience an increase in efficiency when dealing with files ranging
from 'significant' to 'very large' (in precise technical terms :-) ).
Also, I think this type of API would be much easier for new users to
learn and use.

Examples
========
A few examples are in order. Again, these are from my own library,
since I'm not too familiar with Orendorff's. Also, this is stuff I'm just
typing in right now as an illustration, so there may be syntactic
errors. (However, all of this functionality is present.) And these by
no means represent the full functionality that is already defined.

# define a new path object
mydir = filepath("#&*$directory")

# Note that special characters are automatically escaped
# by filepath, as necessary for the current OS. If a character
# is illegal in a file name no matter what (cannot be escaped),
# an exception will be raised.

# A file in that directory
f = mydir / "some.txt"

# Go through the lines in the file. When all lines are done,
# the file will be closed automatically. If the file does not
# actually exist, an appropriate exception will be raised.
for line in f.iterlines():
    pass  # ...do something with each line...

#The directory containing f is, of course, 'mydir'
assert f.parent == mydir

#Another path
aPath = filepath(....)

# In my module (not in Orendorff's), a file path is considered
# semantically as a sequence of directory names terminated
# by the name of a file or directory. This makes it easy to
# obtain the name of the file at the end of a path:
theFile = aPath[-1]

# or the directory leading to that file
parentDir = aPath[0:-1]

# Of course, these two common indexes/slices are also accessible
# through attributes
theFile = aPath.basename
parentDir = aPath.parent
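
The sequence view of a path can be sketched in a few lines. This is a
hypothetical stand-in, not the real class from either module: the name
`SketchPath` is invented, it handles only relative paths, and having an
integer index return a plain string is an assumption. (`__truediv__` is
the hook for `/` in current Python; in 2005 it would have been
`__div__`.)

```python
import os


class SketchPath(object):
    """Hypothetical path object stored as a tuple of component names."""

    def __init__(self, *parts):
        self.parts = tuple(parts)

    def __truediv__(self, name):
        # path / "name" returns a new path with one more component.
        return SketchPath(*(self.parts + (name,)))

    def __getitem__(self, index):
        # A slice yields a shorter path; an integer yields one name.
        if isinstance(index, slice):
            return SketchPath(*self.parts[index])
        return self.parts[index]

    def __eq__(self, other):
        return isinstance(other, SketchPath) and self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)

    def __str__(self):
        return os.sep.join(self.parts)

    @property
    def basename(self):
        return self[-1]

    @property
    def parent(self):
        return self[0:-1]


mydir = SketchPath("home", "user")
f = mydir / "some.txt"
```

With this in place, `f.parent == mydir` and `f[-1] == f.basename` both
hold, which is the relationship the snippets above rely on.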

# A more powerful 'walk'-type method is included. Below,
# the 'recursive' indicates that directories should be recursively
# walked, and the 'preorder' indicates that directories should
# be included in the iteration _before_ their contents are given.
# There is also a 'postorder' argument, and both may be used to
# yield directories both before and after their contents.
aPath.iterfiles(recursive=True, preorder=True)

# With the advent of the 'itertools' module in Python, there is no
# need to provide an argument taking a function that is applied
# during the walk process, so in that sense, iterfiles is actually
# simpler than walk.
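
A rough sketch of how such a walk could be written as a generator. It
is a standalone function rather than a method, and the details are
assumptions rather than the module's actual behaviour: entries are
visited in sorted order, and directories are yielded only when
`preorder` or `postorder` asks for them.

```python
import os
import shutil
import tempfile


def iterfiles(root, recursive=False, preorder=False, postorder=False):
    """Yield paths under `root`; a sketch, not the module's real method."""
    for name in sorted(os.listdir(root)):
        full = os.path.join(root, name)
        if os.path.isdir(full):
            if preorder:
                yield full          # directory before its contents
            if recursive:
                for item in iterfiles(full, recursive, preorder, postorder):
                    yield item
            if postorder:
                yield full          # directory after its contents
        else:
            yield full


# Usage: build a tiny tree (root/a/x.txt and root/b.txt) and walk it.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "a"))
for rel in (os.path.join("a", "x.txt"), "b.txt"):
    with open(os.path.join(root, rel), "w") as fh:
        fh.write("data")

pre = [os.path.relpath(p, root)
       for p in iterfiles(root, recursive=True, preorder=True)]
post = [os.path.relpath(p, root)
        for p in iterfiles(root, recursive=True, postorder=True)]
# Filtering happens outside the walk, so no callback argument is needed.
txt_only = [p for p in pre if p.endswith(".txt")]
shutil.rmtree(root)
```

Because the walk is just an iterator, any filtering or transformation
is done by the caller with ordinary iteration tools.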

# ...and more. All of the various file capabilities available in
# Python are provided in a unified package in this module.