Test for an empty directory that could be very large if it is not empty?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Aug 6 23:25:38 EDT 2014


Ben Finney wrote:

> Virgil Stokes <vs at it.uu.se> writes:
> 
>> Suppose I have a directory C:/Test that is either empty or contains
>> more than 2000000 files, all with the same extension (e.g. *.txt). How
>> can I determine if the directory is empty WITHOUT the generation of a
>> list of the file names in it (e.g. using os.listdir('C:/Test')) when
>> it is not empty?
> 
> What is your goal for that? Have you measured the performance difference
> and decided *based on objective observation* that it's too expensive?

Normally I would agree with you, but this is one case where there is no need
to measure: we can tell in advance that at least sometimes there will be a
severe performance hit, simply by considering the nature of file systems. In
particular, consider the case where the directory is on a remote file system
on the other side of the world, over a link with many dropped packets or
other noise. Waiting for two million file names to be transmitted, only to
throw them away, is surely going to be slower than waiting for (say) the
result of a single call to os.stat. (Assuming that gives the answer.)
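
One way to avoid building the full list, sketched here under the assumption
of Python 3.6 or later (where os.scandir can be used as a context manager),
is to ask for the first directory entry only and stop there. The function
name is_dir_empty is made up for illustration. The operating system may
still fetch entries in batches behind the scenes, and how much a remote file
system transfers depends on its protocol, so treat this as a sketch rather
than a guarantee:

```python
import os

def is_dir_empty(path):
    """Return True if the directory at `path` contains no entries.

    Hypothetical helper for illustration. os.scandir yields entries
    lazily, so Python never builds a list of every file name.
    """
    with os.scandir(path) as entries:
        # next() returns the first entry, or None if there are none.
        return next(entries, None) is None
```

Calling is_dir_empty('C:/Test') would then return as soon as one entry has
been seen. It raises OSError (for example FileNotFoundError) if the path
does not exist or is not a directory.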

The difficult question then becomes: is it reasonable to (potentially) slow
down the common case of local file systems by a tiny amount, in order to
protect against the (rare) case where it will speed things up a lot?



-- 
Steven



