[Tutor] Monitoring directories
Cameron Simpson
cs at cskk.id.au
Fri Feb 14 16:36:25 EST 2020
On 14Feb2020 12:02, Nathan D'Elboux <nathan.delboux at gmail.com> wrote:
>I have a little script below that copies files from a directory and
>all of its subdirectories over to a target dir. The target dir doesn't
>have any subdirs and is exactly how I want it. The copy checks if
>the file in the source dir exists in the target dir and then copies it if
>it doesn't.
>
>The problem is that I have another utility monitoring that target dir,
>moving the files out to process them further, so as it processes them
>the target dir will always be empty, thus copying every file from the
>source folder over and over, depending on the interval I run the script.
>
>What I would like to do is change my script below so that, instead of
>checking whether the sourcedir filename exists in the target dir, it
>monitors the source dir and copies the newest files over to the target
>dir.
Might I suggest that you rename the files in the source directory when
they have been copied? This might be as simple as moving them
(shutil.move) from the source directory to a parallel "processed"
directory.
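A minimal sketch of the "move it once copied" approach, assuming a flat target directory as in the original script. The function name and the `processeddir` parameter are hypothetical; `processeddir` should sit beside the source tree, not inside it, or the walk would descend into it.

```python
import os
import shutil

def copy_and_retire(sourcedir, targetdir, processeddir):
    """Copy every file under sourcedir (including subdirectories) into the
    flat targetdir, then move the source file into processeddir so the next
    run will not see it again. processeddir must NOT be inside sourcedir."""
    os.makedirs(processeddir, exist_ok=True)
    copied = []
    for dirpath, dirnames, filenames in os.walk(sourcedir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            shutil.copy(src, targetdir)                         # deliver the copy
            shutil.move(src, os.path.join(processeddir, name))  # retire the original
            copied.append(name)
    return copied
```

Because the source tree only ever holds files that have not been copied yet, repeated cron runs no longer depend on what the downstream utility has left in the target directory.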
If the source directory is your serious, read-only main repository of
source files and you do not want to modify it, perhaps you should
maintain a parallel "to process" source tree beside it: have the thing
which puts files into the main source directory _also_ hard link them
into the "to process" (or "spool") tree, which you are free to remove
things from after the copy.
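A sketch of the spool idea, assuming the main tree and the spool tree are on the same filesystem (hard links cannot cross filesystems). Both function names are hypothetical: `link_into_spool` would be called by whatever delivers files into the main tree, and `drain_spool` would be the cron job.

```python
import os
import shutil

def link_into_spool(main_path, spooldir):
    """Hard link a file already placed in the main tree into the spool tree.
    Both names refer to the same data, so nothing is duplicated on disk."""
    os.makedirs(spooldir, exist_ok=True)
    spool_path = os.path.join(spooldir, os.path.basename(main_path))
    os.link(main_path, spool_path)
    return spool_path

def drain_spool(spooldir, targetdir):
    """Copy each spooled file to targetdir, then remove the spool link.
    Removing the link leaves the file in the main tree untouched."""
    done = []
    for name in sorted(os.listdir(spooldir)):
        spool_path = os.path.join(spooldir, name)
        shutil.copy(spool_path, targetdir)
        os.remove(spool_path)
        done.append(name)
    return done
```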
Either of these removes your need to keep extra state or play guessing
games with file timestamps.
>As this script is currently invoked via a cron job, I'm not familiar
>with what I would need to do to make this Python script monitor in
>daemon mode, or how it would know the newest files and the fact it
>hasn't already copied them over.
Because cron runs things repeatedly, you tend not to run things in
"daemon" mode from it; daemons start once and continue indefinitely to
provide whatever service they perform.
>The script I have already is below. Just looking for feedback on how I
>should improve this. Would using something like Pyinotify work best
>in this situation?
Pyinotify would suit a daemon mode script (started not from cron but
maybe from the "at" command), as it relies on the kernel reporting
changes as they happen. That approach is perfectly valid and
reasonable; you just wouldn't start it from cron (unless the crontab
were to start it only if it was no longer running, which would require
an "is it running?" check).
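One common way to do the "is it running?" check is an exclusive, non-blocking `flock` on a lock file. This is a sketch assuming a Unix system; the function name and whatever lock path you pass in are hypothetical.

```python
import fcntl

def acquire_single_instance_lock(lockpath):
    """Return an open lock file if no other process holds the lock on
    lockpath, or None if one does. Keep the returned file open for as long
    as the daemon runs; the kernel drops the lock when the process exits,
    so a crashed daemon never leaves a stale lock behind."""
    lockfile = open(lockpath, "a")
    try:
        fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another instance holds the lock: it is already running.
        lockfile.close()
        return None
    return lockfile
```

The cron-started wrapper would call this first and exit quietly when it returns None; otherwise it holds on to the returned file and starts the long-running (e.g. Pyinotify-based) watcher.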
>import shutil
>import os
>
>targetdir = "/home/target_dir"
>sourcedir = "/home/source/"
>
>dirlist = os.listdir(sourcedir)
>for x in dirlist :
>
>    directory = sourcedir + x + '/'
Consider using os.path.join for the "+ x" bit.
>    filelist = os.listdir(directory)
>    for file in filelist :
>        if not os.path.exists(os.path.join(targetdir,file)):
>            shutil.copy(directory+file,targetdir)
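For reference, here is a sketch of the same loop using os.path.join throughout, wrapped in a hypothetical `copy_new_files` function so it can be tested. The isdir/isfile checks are an addition: the original assumed every entry in sourcedir is a directory and every entry below it is a file.

```python
import os
import shutil

def copy_new_files(sourcedir, targetdir):
    """Flatten files from each immediate subdirectory of sourcedir into
    targetdir, skipping names that already exist in targetdir."""
    copied = []
    for x in os.listdir(sourcedir):
        directory = os.path.join(sourcedir, x)
        if not os.path.isdir(directory):
            continue  # skip stray files at the top level
        for name in os.listdir(directory):
            src = os.path.join(directory, name)
            dst = os.path.join(targetdir, name)
            if os.path.isfile(src) and not os.path.exists(dst):
                shutil.copy(src, targetdir)
                copied.append(name)
    return copied
```

It would be called as copy_new_files("/home/source/", "/home/target_dir"). Note that it still has the original problem: once the downstream utility empties the target, the next run copies everything again.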
The other component missing here is that scripts like this are prone to
copying a source file before it is complete, i.e. as soon as its name
shows in the source tree, not _after_ all the data have been copied into
it. You will have a similar problem in the target directory with
whatever is processing things there.
Tools like rsync handle this by copying new files to a temporary name
in the target tree beginning with a '.', e.g. .tempfile-blah, and only
renaming the file to the real name when the copy of data is complete.
You should consider adapting your script to use a similar mode, both for
the copy to the target ("copy to a tempfile, then rename") and also for
the scan of the source (ignore filenames which start with a '.', thus
supporting such a scheme for the delivery to the source directory).
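A sketch of both halves, with hypothetical function names. The temporary file is created inside targetdir so the final os.replace is a rename within one filesystem, which is atomic on POSIX: the downstream utility sees either no file or the complete file.

```python
import os
import shutil
import tempfile

def atomic_copy(src, targetdir):
    """Copy src into targetdir under a temporary dot-name, then rename it
    to its real name only once all the data has been written."""
    final_path = os.path.join(targetdir, os.path.basename(src))
    fd, tmp_path = tempfile.mkstemp(prefix=".tempfile-", dir=targetdir)
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
        shutil.copymode(src, tmp_path)  # mkstemp creates files as 0600
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # don't leave half-written temp files
        raise
    return final_path

def visible_files(directory):
    """List regular files in directory, ignoring names starting with '.',
    i.e. files whose delivery by some other writer is still in progress."""
    return sorted(
        name for name in os.listdir(directory)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(directory, name))
    )
```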
Cheers,
Cameron Simpson <cs at cskk.id.au>