Determining when a file has finished copying

Larry Bates larry.bates at websafe.com
Sun Jul 13 01:16:13 EDT 2008


Sean DiZazzo wrote:
> On Jul 9, 5:34 pm, keith <ke... at keithperkins.net> wrote:
>>
>> Ethan Furman wrote:
>>> writeson wrote:
>>>> Guys,
>>>> Thanks for your replies, they are helpful. I should have included in
>>>> my initial question that I don't have as much control over the program
>>>> that writes (pgm-W) as I'd like. Otherwise, the write to a different
>>>> filename and then rename solution would work great. There's no way to
>>>> Is there no way to tell from os.stat() when the file has finished
>>>> being copied? I ran some test programs, one of which continuously
>>>> copies big files from one directory to another, and another that
>>>> continuously does a glob.glob("*.pdf") on those files and looks at the
>>>> st_atime and st_mtime parts of the return value of os.stat(filename).
>>>> From that experiment it looks like st_atime and st_mtime equal each
>>>> other until the file has finished being copied. Nothing in the
>>>> documentation about st_atime or st_mtime leads me to think this is
>>>> true, it's just my observations about the two test programs I've
>>>> described.
>>>> Any thoughts? Thanks!
>>>> Doug
>>> The solution my team has used is to monitor the file size.  If the file
>>> has stopped growing for x amount of time (we use 45 seconds) the file is
>>> done copying.  Not elegant, but it works.
>>> --
>>> Ethan
>> Also I think that matching the md5sums may work.  Just set up so that it
>> checks the copy's md5sum every couple of seconds (or whatever time
>> interval you want) and matches against the original's.  When they match
>> copying's done. I haven't actually tried this but think it may work.
>> Any more experienced programmers out there let me know if this is
>> unworkable please.
>> K
> 
> I use a combination of both the os.stat() on filesize, and md5.
> Checking md5s works, but it can take a long time on big files.  To fix
> that, I wrote a simple sparse md5 sum generator.  It takes a small
> number of bytes from various areas of the file, and creates an md5 by
> combining all the sections. This is, in fact, the only solution I have
> come up with for watching a folder for Windows copies.
> 
> The filesize solution doesn't work when a user copies into the watch
> folder using drag and drop on Windows, because Windows allocates all the
> attributes of the file before any data is written.  The filesize will
> always show the full size of the file.
> 
> ~Sean
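
A minimal sketch of the size-plus-sparse-md5 combination Sean describes. The sampling scheme, the names `sparse_md5` and `wait_until_stable`, and all default parameters are hypothetical choices for illustration, not Sean's actual code. The fingerprint is the file size plus an MD5 of a few evenly spaced chunks, and the file is treated as finished once that fingerprint stops changing for several consecutive polls, which also covers Ethan's "stopped growing for x seconds" rule.

```python
import hashlib
import os
import time


def sparse_md5(path, sample_size=4096, samples=8):
    """Hypothetical helper: MD5 of the file size plus a few evenly spaced chunks.

    Much cheaper than hashing a multi-gigabyte file in full, at the cost of
    missing changes that fall entirely between the sampled chunks.
    """
    size = os.path.getsize(path)
    md5 = hashlib.md5(str(size).encode("ascii"))  # the size is part of the fingerprint
    with open(path, "rb") as f:
        if size <= sample_size * samples:
            md5.update(f.read())  # small file: hashing all of it is cheap enough
        else:
            # Offsets run from the start of the file to the last full chunk at the end.
            step = (size - sample_size) // max(samples - 1, 1)
            for i in range(samples):
                f.seek(i * step)
                md5.update(f.read(sample_size))
    return md5.hexdigest()


def wait_until_stable(path, interval=2.0, checks=3):
    """Block until the sparse fingerprint is unchanged for `checks` polls in a row.

    The quiet period is roughly interval * checks seconds (hypothetical defaults).
    """
    previous = sparse_md5(path)
    unchanged = 0
    while unchanged < checks:
        time.sleep(interval)
        current = sparse_md5(path)
        # Any change restarts the count; only consecutive identical polls count.
        unchanged = unchanged + 1 if current == previous else 0
        previous = current
```

Because the sampled chunks reach the end of the file, a copy that preallocates the full size (the drag-and-drop case above) should keep changing the fingerprint as data reaches each sampled region, where a size-only check would report it finished at once. That only holds if the quiet period is longer than the writer needs to get from one sampled chunk to the next, so very large files or slow copies call for more samples or a longer interval.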

While a lot depends on HOW the copying program does its copy, I've recently been 
able to get pyinotify to watch folders.  By watching for IN_CLOSE_WRITE events I 
can see when files are closed by the writer and then process them instantly 
after they have been written.  Now if the writer does something like:

open
write
close
open (append)
write
close
...

This won't work as well.
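
A minimal sketch of the IN_CLOSE_WRITE approach, assuming the third-party pyinotify package on Linux. The function name `watch_for_closed_writes` and its callback argument are hypothetical. With the open/write/close/open-append pattern above, the callback would fire once per close, so on its own it cannot tell the final close from the earlier ones.

```python
def watch_for_closed_writes(directory, on_file_done):
    """Call on_file_done(path) each time a writer closes a file it had open for writing.

    Assumes the third-party pyinotify package (Linux only); the import is kept
    inside the function so this module still loads where pyinotify is absent.
    """
    import pyinotify

    class CloseWriteHandler(pyinotify.ProcessEvent):
        # pyinotify dispatches IN_CLOSE_WRITE events to process_IN_CLOSE_WRITE.
        def process_IN_CLOSE_WRITE(self, event):
            on_file_done(event.pathname)

    wm = pyinotify.WatchManager()
    notifier = pyinotify.Notifier(wm, CloseWriteHandler())
    wm.add_watch(directory, pyinotify.IN_CLOSE_WRITE)
    notifier.loop()  # blocks, dispatching events until interrupted
```

For example, `watch_for_closed_writes("/path/to/watch", print)` would print each path as its writer closes it.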

FYI,
Larry



More information about the Python-list mailing list