how to know if folder contents have changed

davisn90210 at gmail.com
Tue Nov 13 21:20:08 EST 2007


On Nov 12, 11:27 am, "Martin Marcher" <mar... at marcher.name> wrote:
> 2007/11/12, davisn90... at gmail.com <davisn90... at gmail.com>:
>
> > Why not use the file creation/modification timestamps?
>
> because you'd have to
>
> a) create a thread that pulls all the time for changes or

Given that it would only involve a check of one timestamp (the
directory the files are located in), I don't think polling "from time
to time" would be unreasonable.  The modification timestamp of the
directory should be sufficient given the use case.  Even if it's not,
tracking modification times for the files in the directory would not
be unreasonable.
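
A minimal sketch of that single-timestamp check, in Python. The function
names are made up for illustration. One caveat: on most filesystems a
directory's modification time is updated when entries are added, removed,
or renamed, but not when an existing file's contents are rewritten in
place, so this only covers the "files came or went" case.

```python
import os

def directory_mtime(path):
    """Return the directory's modification time (seconds since the epoch)."""
    return os.stat(path).st_mtime

def directory_changed(path, last_seen_mtime):
    """True if the directory's mtime differs from the value recorded earlier.

    Comparing with != rather than > also catches a clock that was set back.
    """
    return directory_mtime(path) != last_seen_mtime
```

A caller would record directory_mtime(path) once, then call
directory_changed(path, recorded) whenever it wants to poll.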

> b) test everytime for changes
>

Checking a timestamp should be a very quick operation.  Unless
"everytime" occurs *very* frequently, it's certainly not unreasonable.

> fam informs in a notification like way.
>

FAM would work too.  However,
1) According to http://oss.sgi.com/projects/fam/faq.html#what_os_fam,
FAM "should be fairly easy to port to ... Unix-like operating
systems ....".  If the original poster is a user of a "Unix-like
operating system" he/she may actually be able to use it.  Regardless,
it seems to me that you would lose a great deal of portability (i.e.,
is there a Windows port?), which may or may not be important to the
poster.
2) FAM undoubtedly uses some system resources.  Probably very little,
but it's still an overhead that must be taken into account.
3) You still need to use another method for maintaining state across
program invocations, do you not?

Using timestamps is:
1) Portable.  Can you name one OS that does not provide timestamps?
Last I checked, even Windows does :-)
2) Storage efficient.  I don't have to *store* a timestamp for every
file.  I only need the time of my last check, and can then see whether
a file/directory was modified after it.
3) Easy to maintain persistent state -- just store the timestamp!
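
To make points 2 and 3 concrete, here is a rough sketch that keeps one
timestamp in a small state file and reports the files modified since the
previous run. The state file name ".last_check" and all function names are
assumptions for the example, not anything from the original question. It
looks only at regular files directly inside the directory.

```python
import os
import time

STATE_FILE = ".last_check"  # hypothetical name for the state file

def load_last_check(state_path):
    """Read the stored timestamp; treat a missing or bad file as 'never'."""
    try:
        with open(state_path) as f:
            return float(f.read().strip())
    except (OSError, ValueError):
        return 0.0

def save_last_check(state_path, timestamp):
    """Persist the timestamp so the next program invocation can use it."""
    with open(state_path, "w") as f:
        f.write(repr(timestamp))

def modified_since(directory, timestamp, ignore=()):
    """Return sorted names of regular files with mtime later than timestamp."""
    changed = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name in ignore:
                continue
            if entry.is_file() and entry.stat().st_mtime > timestamp:
                changed.append(entry.name)
    return sorted(changed)

def check_directory(directory, state_name=STATE_FILE):
    """Report files changed since the last call, then record the new time."""
    state_path = os.path.join(directory, state_name)
    last_check = load_last_check(state_path)
    # Take the time before scanning, so a file modified during the scan
    # is reported again next time rather than missed.
    now = time.time()
    changed = modified_since(directory, last_check, ignore=(state_name,))
    save_last_check(state_path, now)
    return changed
```

The first call reports every file, since there is no stored timestamp yet.
Timestamp granularity varies by filesystem, so a change landing in the same
clock tick as a check could in principle be missed.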

> Personally I'd create a "hidden" cache file parsable by configparser
> and have filename = $favorite_checksum_algo - key value pairs in it if
> it's not a long running process.
>

What is your reasoning for this?  It seems to me that it is
inefficient and unreliable.  First of all you have to compute the
checksum (which undoubtedly would involve reading every byte of the file)
-- not just once, but "everytime" (or however often you perform the
check). Secondly, it is possible for the checksum to be the same even
if the file has changed.  Unlikely?  Perhaps (depends on checksum
algorithm used).  Impossible?  No.  So, in effect, you are using a
"slow" algorithm that is known to give incorrect results in certain
cases -- all to replace something as basic as timestamps?
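
For comparison, a sketch of what each approach costs per file. SHA-256 is
my own pick here, since the quoted suggestion leaves the algorithm open
("$favorite_checksum_algo"), and the function names are illustrative only.

```python
import hashlib
import os

def file_checksum(path, chunk_size=65536):
    """Hash the file's contents; this has to read every byte of the file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def file_mtime(path):
    """Fetch the modification time; a metadata lookup, contents are not read."""
    return os.stat(path).st_mtime
```

The checksum's cost grows with file size, while the stat call does not
depend on it.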

> Otherwise I'd probably go with fam (or hal i think that's the other
> thing that does that)
>
> hth
> martin
>
> -- http://noneisyours.marcher.name http://feeds.feedburner.com/NoneIsYours

Thanks for the critique -- feel free to punch holes.

--Nathan Davis



