PEP on path module for standard library

Bengt Richter bokr at oz.net
Sat Jul 23 18:14:41 EDT 2005


On Sat, 23 Jul 2005 07:05:05 +1000, John Machin <sjmachin at lexicon.net> wrote:

>Daniel Dittmar wrote:
>> Duncan Booth wrote:
>> 
>>>  I would have expected a path object to be a sequence of path elements 
>>> rather than a sequence of characters. 
>> 
>> 
>> Maybe it's nitpicking, but I don't think that a path object should be a 
>> 'sequence of path elements' in an iterator context.
>> 
>> This means that
>> 
>> for element in pathobject:
>> 
>> has no intuitive meaning for me, so it shouldn't be allowed.
>
>Try this:
>
>A file-system is a maze of twisty little passages, all alike. Junction 
>== directory. Cul-de-sac == file. Fortunately it is signposted. You are 
>dropped off at one of the entrance points ("current directory", say). 
>You are given a route (a "path") to your destination. The route consists 
>of a list of intermediate destinations.
>
>for element in pathobject:
>    follow_sign_post_to(element)
>
>Exception-handling strategy: Don't forget to pack a big ball of string. 
>Anecdotal evidence is that breadcrumbs are unreliable.
>
<indulging what="my penchant for seeking the general behind the specific ;-)" >

ISTM a path is essentially a representation of a script whose interpretation
by an orderly choice of interpreters finally leads to access to some entity,
typically a serial data representation, through an object, perhaps a local proxy,
that has standard methods for accessing the ultimate object's desired info.

IOW, a path sequence is like a script text that has been .splitlines()'d, with
the whole sequence fed to a local interpreter, which might chew through multiple
lines on its own, or might invoke interpreters on another network to deal with the
rest of the script, or might use local interpreters for various different kinds of
access (e.g., after seeing 'c:' vs 'http://' vs '/c' vs '//c' etc. on the platform
defining the interpretation of the head element).

Turning a single path string into a complete sequence of elements is not generally possible
unless you have local knowledge of the syntax of the entire tail beyond the prefix
you have to deal with. Therefore, a local platform-dependent Pathobject class should, I think,
only recognize prefixes that it knows how to process or delegate processing for, leaving
the interpretation of the tail to the next Pathobject instance, however selected and/or
located.
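
A minimal sketch of such a splitter, assuming a scheme-like prefix followed by
'://' or ':' (the name split_prefix and the regex are illustrative assumptions,
not part of any existing module). It recognizes only the head and leaves the
tail completely uninterpreted:

```python
import re

# Assumed prefix syntax: a letter, then letters/digits/_+.-, then '://' or ':'.
_PREFIX_RE = re.compile(r'^([A-Za-z][A-Za-z0-9_+.-]*)(://|:)(.*)$', re.DOTALL)


def split_prefix(pathstring):
    """Return (prefix, separator, tail); the tail is left unparsed."""
    match = _PREFIX_RE.match(pathstring)
    if match is None:
        # No recognizable prefix: the whole string is the tail.
        return '', '', pathstring
    return match.group(1), match.group(2), match.group(3)


print(split_prefix('http://example.org/a/b'))
print(split_prefix('c:\\temp\\x.txt'))
print(split_prefix('/usr/lib'))
```

Whether 'c:' and 'http://' should really share one regex is exactly the kind of
platform-dependent decision the local class would have to make.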

So say (this is just a sketch, mind ;-)

    po = Pathobject(<string representation of whole path>)

results in a po that splits out (perhaps by regex) a prefix, a first separator/delimiter,
and the remaining tail. E.g., in class Pathobject,
    def __init__(self, pathstring=None):
        if pathstring is None:
            pass  # do useful default??
        self.pathstring = pathstring
        self.prefix, self.sep, self.tail = self.splitter(pathstring)
        if self.prefix in self.registered_prefixes:
            # delegate the unparsed tail to the class registered for this prefix
            self.child = self.registered_prefixes[self.prefix](self.tail)
        else:
            self.child = None  # no registered handler for this prefix
        self.opened_obj = None

Then the logic inside a local pathobject's open method po.open()
might go something like

    def open(self, *mode, **kw):
        if self.child:
            # the child was constructed from self.tail, so it already has its path
            self.opened_obj = self.child.open(*mode, **kw)
        else:
            self.opened_obj = open(self.pathstring, *mode)
        return self

And closing would just go to the immediately apparent opened object, and
if that had complex closing to do, it would be its responsibility to deal
with itself and its child-derived objects.

    def close(self):
        self.opened_obj.close()
        

The point is that a given pathobject could produce a new or modified pathobject child
which might be parsing urls instead of windows file system path strings or could
yield an access object producing something entirely synthetic.
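
Pulling the pieces of the sketch together in present-day Python might look like
the following. This is only one possible reading: the 'echo' prefix, the
EchoPath handler, the read() pass-through, and the simple '://' splitter are
all assumptions made for the sake of a runnable example.

```python
import io


class Pathobject(object):
    # Assumed registry: prefix string -> class that interprets the tail.
    registered_prefixes = {}

    def __init__(self, pathstring):
        self.pathstring = pathstring
        self.prefix, self.sep, self.tail = self.splitter(pathstring)
        handler = self.registered_prefixes.get(self.prefix)
        # Hand the unparsed tail to the handler; only it knows the tail's syntax.
        self.child = handler(self.tail) if handler is not None else None
        self.opened_obj = None

    @staticmethod
    def splitter(pathstring):
        # Simplification: only 'prefix://tail' is recognized here.
        prefix, sep, tail = pathstring.partition('://')
        if not sep:
            return '', '', pathstring
        return prefix, sep, tail

    def open(self, *mode, **kw):
        if self.child is not None:
            self.opened_obj = self.child.open(*mode, **kw)
        else:
            self.opened_obj = open(self.pathstring, *mode, **kw)
        return self

    def read(self, *args):
        return self.opened_obj.read(*args)

    def close(self):
        self.opened_obj.close()


class EchoPath(object):
    """Hypothetical handler: 'echo://some text' reads back as 'some text'."""

    def __init__(self, tail):
        self.tail = tail

    def open(self, *mode, **kw):
        return io.StringIO(self.tail)


Pathobject.registered_prefixes['echo'] = EchoPath

po = Pathobject('echo://hello, maze').open()
print(po.read())
po.close()
```

A handler here only needs a constructor taking the tail and an open() method
returning something file-like, so it could just as well be a Pathobject
subclass that splits its own tail further.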

A synthetic capability could easily be introduced if the local element pathobject
instance looked for e.g., 'synthetic://' as a possible first element (prefix) string representation,
and then passed the tail to a subclass defining synthetic:// path interpretation.
E.g., 'synthetic://temp_free_diskspace' could be a platform-independent way to get such info as that.

Opening 'testdata:// ...' might be an interesting way to feed test suites, if pathobject subclasses
could be registered locally and found via the head element's string representation.
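
As a rough illustration of local registration, here is a sketch in which
handlers are plain functions registered under a prefix. The names REGISTRY,
register, open_path, and the TESTDATA table are hypothetical, and using
shutil.disk_usage on the temp directory is just one guess at what
'temp_free_diskspace' could mean.

```python
import io
import shutil
import tempfile

REGISTRY = {}  # hypothetical registry: prefix -> opener taking the tail


def register(prefix):
    """Decorator that registers an opener function for a prefix."""
    def decorator(func):
        REGISTRY[prefix] = func
        return func
    return decorator


def open_path(pathstring):
    prefix, sep, tail = pathstring.partition('://')
    if sep and prefix in REGISTRY:
        return REGISTRY[prefix](tail)
    return open(pathstring)  # fall back to an ordinary file open


@register('synthetic')
def open_synthetic(tail):
    if tail == 'temp_free_diskspace':
        # Free bytes on the volume holding the temp directory (an assumption).
        free = shutil.disk_usage(tempfile.gettempdir()).free
        return io.StringIO(str(free))
    raise LookupError('unknown synthetic path: %r' % tail)


# Canned data a test suite might register for itself.
TESTDATA = {'three_lines': 'alpha\nbeta\ngamma\n'}


@register('testdata')
def open_testdata(tail):
    return io.StringIO(TESTDATA[tail])


print(open_path('testdata://three_lines').read().splitlines())
```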

One point from this is that a path string represents an ordered sequence of elements, but the sequence
is heterogeneous, so the string tail may have syntax that should be interpreted differently from the
prefix syntax.

Each successive element of a path string effectively requires an interpreter for that stage of access
pursuit, and the chain of processing may result in different path entities/objects/representations
on different systems, with different interpretations going on, sharing only that they are part of the
process of getting access to something and providing access services, if it's not a one-shot access.
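
A toy sketch of that chain, where each stage interprets only its own head and
hands the rest to whichever stage comes next. The 'upper' and 'reversed'
prefixes and the HANDLERS table are made up purely for illustration:

```python
HANDLERS = {}  # hypothetical registry: prefix -> function taking (tail, trace)


def interpret(pathstring, trace=None):
    """Interpret one stage of the path and delegate the tail to the next."""
    prefix, sep, tail = pathstring.partition('://')
    if sep and prefix in HANDLERS:
        if trace is not None:
            trace.append(prefix)  # record which interpreter handled this stage
        return HANDLERS[prefix](tail, trace)
    # No recognized prefix: this stage treats the string as the final entity.
    return pathstring


def upper_stage(tail, trace):
    return interpret(tail, trace).upper()


def reversed_stage(tail, trace):
    return interpret(tail, trace)[::-1]


HANDLERS['upper'] = upper_stage
HANDLERS['reversed'] = reversed_stage

stages = []
result = interpret('upper://reversed://abc', stages)
print(result, stages)
```

The outer stage never looks inside the tail; it only post-processes whatever
the inner stages hand back.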

This would also be a potential way to create access to a foreign file system in pure python if desired,
so long as there was a way of accessing the raw data to build on, e.g. a raw stuffit floppy, or a raw
hard disk if you have the required privileges. Also 'zip://' or 'bzip2://' could be defined
and registered by a particular script or in an automatic startup script. 'encrypted://' might be interesting.
Or if polluting the top namespace was a problem, a general serialized data access header element
might work, e.g., 'py_sda://encrypted/...' 
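
For instance, the tail of a 'zip://' path could carry its own little syntax.
The sketch below assumes a tail of the form '<archive>!<member>'; that
separator and the function name open_zip_member are assumptions, while the
zipfile calls are the standard library's.

```python
import io
import zipfile


def open_zip_member(tail, separator='!'):
    """Interpret the tail of a hypothetical 'zip://<archive>!<member>' path."""
    archive_path, sep, member = tail.partition(separator)
    if not sep:
        raise ValueError('expected <archive>%s<member>, got %r' % (separator, tail))
    with zipfile.ZipFile(archive_path) as archive:
        data = archive.read(member)  # bytes of the named member
    # Return a file-like object so callers can use the usual read() protocol.
    return io.BytesIO(data)
```

A registered 'zip' handler would call this with whatever follows 'zip://',
and an 'encrypted://' handler could follow the same pattern with a decryption
step in place of the archive read.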

This is very H[ot]OTTOMH (though it reflects some thoughts I've had before, so be kind ;-)

For compatibility with the current way of doing things, you might want to do an automatic open
in the Pathobject constructor, but I don't really like that. It's easy enough to tack on ".open()"

    po = Pathobject('py_sda://encrypted/...')
    po.open() # plain read_only text default file open, apparently, but encrypted does binary behind the scenes
    print(po.read())
    po.close()


Say, how about

    if Pathobject('gui://message_box/yn/continue processing?').open().read().lower()!='y':
        raise SystemExit("Ok, really not continuing ;-)")

An appropriate registered subclass for the given platform, selected when the
Pathobject base class instantiates, looks at the first element, and delegates on open(),
would make that possible, spelled platform-independently as in the code above.

</indulging>

Regards,
Bengt Richter


