[path-PEP] Path inherits from basestring again

Andrew Dalke dalke at dalkescientific.com
Sun Jul 24 17:31:48 EDT 2005


Reinhold Birkenfeld wrote:
> Okay. While a path has its clear use cases and those don't need above methods,
> it may be that some brain-dead functions needs them.

"brain-dead"?

Consider this code, which I think is not atypical.

import sys

def _read_file(filename):
  if filename == "-":
    # Can use '-' to mean stdin
    return sys.stdin
  else:
    return open(filename, "rU")


def file_sum(filename):
  total = 0
  for line in _read_file(filename):
    total += int(line)
  return total

(Actually, I would probably write it as

def _read_file(file):
  if isinstance(file, basestring):
    if file == "-":
      # Can use '-' to mean stdin
      return sys.stdin
    else:
      return open(file, "rU")
  # Not a string: assume it is already a file-like object
  return file

)

Because the current sandbox Path doesn't support
equality tests against strings, the above function
won't work when filename = path.Path("-").  It
will instead raise an exception saying
  IOError: [Errno 2] No such file or directory: '-'
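To make the failure concrete, here is a small Python 3 sketch. `OpaquePath` and `StrPath` are hypothetical stand-ins, not the sandbox `path.Path`: one wraps a string without inheriting from `str`, the other subclasses `str` as this thread proposes. Only the second passes the `filename == "-"` test that `_read_file` relies on.

```python
class OpaquePath:
    """Hypothetical path type that wraps a string but does not subclass str."""

    def __init__(self, name):
        self._name = name

    def __str__(self):
        return self._name


class StrPath(str):
    """Hypothetical path type that subclasses str, as this thread proposes."""


def is_stdin_alias(filename):
    # The same test _read_file performs before choosing sys.stdin over open().
    return filename == "-"


print(is_stdin_alias("-"))              # True
print(is_stdin_alias(StrPath("-")))     # True: str equality compares contents
print(is_stdin_alias(OpaquePath("-")))  # False: _read_file would call open() instead
```

Calling `str()` on the `OpaquePath` first would make the test pass, but that puts the conversion burden on every caller.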

(Yes, the code as-is can't handle a file named '-'.
The usual workaround (and many programs support
'-' as an alias for stdin) is to use "./-":

% cat > './-'
This is a file
% cat ./-
This is a file
% cat -
I'm typing directly into stdin.
^D
I'm typing directly into stdin.
% 
)


If I start using path.Path, then in order to use
this function my upstream code must carefully
distinguish, on input, between names that really are
filenames and special-cased pseudo-filenames.

Often the code using the API doesn't even know which
names are special.  Even if the special names are documented,
the library developer may decide in the future to
extend the list of pseudo-filenames to include, say,
environment-variable-style expansion, as in
  $HOME/.config

Perhaps the library developer should have come up
with a new naming system to include both types of
file naming schemes, but that's rather overkill.

As a programmer calling the API, should I convert
all my path.Path objects to strings before passing them in?
Or to Unicode?  How do I know which filenames will
be treated specially over time?

Is there a method to turn a path.Path into the actual
string?  str() and unicode() don't work because I
want the result to be unicode if the OS and Python build
support it, and a byte string otherwise.
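For comparison, later Python 3 releases added one answer to this question: `os.fspath()` (PEP 519, Python 3.6) returns the `str` or `bytes` held by any object implementing `__fspath__`, and passes plain `str`/`bytes` through unchanged. It does not settle the Python 2 str/unicode choice as posed here; it simply returns whatever type the path object holds. A caller-side sketch, using `pathlib.PurePosixPath` as the path type and a hypothetical `as_filename` helper:

```python
import os
import pathlib


def as_filename(name):
    """Hypothetical helper: return the str/bytes form of a path-like object."""
    # os.fspath returns str and bytes unchanged and calls __fspath__ otherwise;
    # it raises TypeError for objects that are not path-like.
    return os.fspath(name)


p = pathlib.PurePosixPath("-")
print(p == "-")                 # False: a PurePosixPath never equals a str
print(as_filename(p) == "-")    # True
print(type(as_filename(b"-")))  # <class 'bytes'>
```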

Is that library example I mentioned "brain-dead"?
I don't think so.  Instead I think you are pushing
too much for purity and making changes that will
cause problems - and hard to fix problems - with
existing libraries.



Here's an example of code from an existing library
which will break in several ways if it's passed a
path object instead of a string.  It comes from
spambayes/mboxutils.py:

#################

This is mostly a wrapper around the various useful classes in the
standard mailbox module, to do some intelligent guessing of the
mailbox type given a mailbox argument.

+foo      -- MH mailbox +foo
+foo,bar  -- MH mailboxes +foo and +bar concatenated
+ALL      -- a shortcut for *all* MH mailboxes
/foo/bar  -- (existing file) a Unix-style mailbox
/foo/bar/ -- (existing directory) a directory full of .txt and .lorien
             files
/foo/bar/ -- (existing directory with a cur/ subdirectory)
             Maildir mailbox
/foo/Mail/bar/ -- (existing directory with /Mail/ in its path)
             alternative way of spelling an MH mailbox

  ....

def getmbox(name):
    """Return an mbox iterator given a file/directory/folder name."""

    if name == "-":
        return [get_message(sys.stdin)]

    if name.startswith("+"):
        # MH folder name: +folder, +f1,f2,f2, or +ALL
        name = name[1:]
        import mhlib
        mh = mhlib.MH()
        if name == "ALL":
            names = mh.listfolders()
        elif ',' in name:
            names = name.split(',')
        else:
            names = [name]
        mboxes = []
        mhpath = mh.getpath()
        for name in names:
            filename = os.path.join(mhpath, name)
            mbox = mailbox.MHMailbox(filename, get_message)
            mboxes.append(mbox)
        if len(mboxes) == 1:
            return iter(mboxes[0])
        else:
            return _cat(mboxes)

    if os.path.isdir(name):
        # XXX Bogus: use a Maildir if /cur is a subdirectory, else a MHMailbox
        # if the pathname contains /Mail/, else a DirOfTxtFileMailbox.
        if os.path.exists(os.path.join(name, 'cur')):
            mbox = mailbox.Maildir(name, get_message)
        elif name.find("/Mail/") >= 0:
            mbox = mailbox.MHMailbox(name, get_message)
        else:
            mbox = DirOfTxtFileMailbox(name, get_message)
    else:
        fp = open(name, "rb")
        mbox = mailbox.PortableUnixMailbox(fp, get_message)
    return iter(mbox)



It breaks with the current sandbox path because:
  - a path can't be compared to "-"
  - slicing isn't supported, as in "name = name[1:]"

Note that this example also uses __contains__ ("," in name).
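One way to see which string operations `getmbox` leans on is to try them against a path type that does not subclass `str`. The Python 3 sketch below uses `pathlib.PurePosixPath` as a stand-in for the sandbox `path.Path` (an assumption; the two classes differ in other ways), and `failing_operations` is a hypothetical helper that reports which expressions from `getmbox` raise.

```python
import pathlib

# Expressions copied from getmbox, each applied to an arbitrary `name` object.
OPERATIONS = {
    'name == "-"': lambda name: name == "-",
    'name.startswith("+")': lambda name: name.startswith("+"),
    "name[1:]": lambda name: name[1:],
    '"," in name': lambda name: "," in name,
    'name.find("/Mail/")': lambda name: name.find("/Mail/"),
}


def failing_operations(name):
    """Return the labels of getmbox expressions that raise for `name`."""
    failures = []
    for label, operation in OPERATIONS.items():
        try:
            operation(name)
        except (TypeError, AttributeError):
            failures.append(label)
    return failures


print(failing_operations("+foo,bar"))
print(failing_operations(pathlib.PurePosixPath("+foo,bar")))
```

A plain string passes every expression. The path object fails the `startswith`, slicing, `in` and `find` expressions; the equality test is not reported because it does not raise. It quietly returns False, which is how `_read_file` ends up calling `open('-')`.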


Is this function brain-dead?  Is it reasonable that people might
want to pass a path.Path() directly to it?  If not, what's
the way to convert the path.Path() into the correct string
object?

				Andrew
				dalke at dalkescientific.com

