Having trouble extracting useful directory details from ftplib.FTP

Stephen Horne steve at ninereeds.fsnet.co.uk
Sat Feb 21 08:31:13 EST 2004


Just recently I decided I want to make use of my ISP freebie webspace.
In order to make that easier, I'd like to be able to automatically
synchronise an FTP file/folder hierarchy with one on my local hard
drive. I figured this should be easily handled in Python, and broadly
speaking it is, but I am having a little difficulty.

You see, in order to handle the synchronisation correctly, I need to be
able to determine what is on the FTP server to start with. I need to
be able to recursively search the folders on the FTP server. But so
far as I can tell, I'm having to rely on some very flaky techniques
and some observations about my ISP's FTP server that could potentially
change.

I tried using FTP.nlst(pathname) but this wasn't very helpful...

1.  It created a bogus first entry in the list - a line saying
    "Found 1" (irrespective of the actual number of items found)
    IIRC.

2.  It provided no indication of whether each item was a file or
    a folder.

Therefore, I switched to using the following piece of code...

  filelist = []
  # retrlines calls filelist.append once per line of the LIST reply
  ftp.retrlines("LIST " + pathname, filelist.append)

I still get a bogus line at the top, but that's easily worked around.
The important thing is that by testing out the results, I found that
for the ftp server I am using I can look at the first character for
the directory flag, and use the slice [53:] to extract the filename
from each line.
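
As a rough illustration of that approach, here is a minimal sketch of
parsing the listing and walking the folders recursively. It assumes the
server returns Unix "ls -l" style lines with nine whitespace-separated
fields, and it splits on whitespace rather than using a fixed column
such as [53:]. The names parse_list_line and walk_ftp are made up for
this example, and symbolic links ("name -> target") are not handled.

```python
import posixpath


def parse_list_line(line):
    """Return (is_dir, name) for one Unix-style LIST line, or None.

    Assumption: the line has the common 'ls -l' layout (permissions,
    links, owner, group, size, month, day, time/year, name). Other
    servers may format their listings differently.
    """
    fields = line.split(None, 8)  # maxsplit keeps spaces inside the name
    if len(fields) < 9:
        return None  # header lines such as "Found 1" or "total 12"
    return fields[0].startswith("d"), fields[8]


def walk_ftp(ftp, top):
    """Yield (is_dir, path) for everything below top, depth first.

    ftp is expected to behave like ftplib.FTP, i.e. to provide
    retrlines(command, callback).
    """
    lines = []
    ftp.retrlines("LIST " + top, lines.append)
    for line in lines:
        entry = parse_list_line(line)
        if entry is None or entry[1] in (".", ".."):
            continue
        is_dir, name = entry
        path = posixpath.join(top, name)
        yield is_dir, path
        if is_dir:
            yield from walk_ftp(ftp, path)
```

With a connected ftplib.FTP instance, list(walk_ftp(ftp, "/")) would
collect every file and folder, but only for as long as the server keeps
to that listing layout.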

The problem is, however, that this seems like skating on very thin
ice. So far as I can tell, there is no standard for the format
reported by the LIST command.

This so far is enough - I can use this to find all files and folders
on the FTP server, and I can download backup copies to a hard drive
folder and then delete them all before uploading the replacement
files, all in one automatic process that (at the moment) just works.
But - apart from the fact that I'm very nervous about it - ideally,
I'd like to do a saner synchronisation - only delete files that need
to be deleted, and only upload files that have changed. To do that, I
need to get more details about the files on the FTP server - time and
date stamps in particular. Getting hold of this information looks like
being at least as much of a hack as what I've done already.
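
To make the goal concrete, here is a small sketch of the comparison
step only. It assumes the time stamps have already been obtained
somehow (which is exactly the hard part) and that local and remote
stamps are comparable numbers on the same clock. The function name
plan_sync and the dictionary layout are hypothetical, chosen just for
this example.

```python
def plan_sync(local, remote):
    """Compare two {relative_path: mtime} dictionaries.

    Returns (to_upload, to_delete) as sorted lists of paths:
    to_upload holds files that are missing on the server or newer
    locally; to_delete holds files that exist only on the server.
    """
    to_upload = sorted(
        path for path, mtime in local.items()
        if path not in remote or mtime > remote[path]
    )
    to_delete = sorted(path for path in remote if path not in local)
    return to_upload, to_delete


# Example with made-up time stamps (seconds since some epoch).
local_files = {"index.html": 200, "new.html": 50, "same.css": 100}
remote_files = {"index.html": 100, "same.css": 100, "old.html": 10}
to_upload, to_delete = plan_sync(local_files, remote_files)
```

Files whose stamps match are left alone, so only the changed and the
orphaned files would generate any FTP traffic.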

So the question is... Is there some better way of handling this that
I'm missing, or is this just the way it is? After all, there are a lot
of FTP client utilities out there that seem to get this information
perfectly reliably, which seems surprising if they have to rely on
parsing an inconsistent directory listing format.

Any hints?


-- 
Steve Horne

steve at ninereeds dot fsnet dot co dot uk


