reading directory entries one by one

holger krekel pyth at devel.trillke.net
Wed May 22 15:40:36 EDT 2002


[reposted to the list, because I assume you didn't mean this to be private?!]

Andrew Dalke wrote:
> > right, but if your application only *iterates* over entries then there
> > is nothing to understand from the user-side (except for ill cases :-).
> 
> No, if *all* applications only iterate over entries then there is
> nothing to understand.  If some want a list and some want iteration,
> then a list solves both problems.

Yes. But if the *problem* is that you need to iterate, then
an iterator is the next best thing :-)

> I know people more often want the
> list than the iteration.  And I've yet to see a real problem with
> returning a full list.

On my machine, with several thousand directories and 100K+ files,
I certainly use an iterator for a recursive directory walk.
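
To make that concrete, here is a minimal sketch of such a lazy recursive
walk. It assumes a Python with os.scandir usable as a context manager
(3.6 or later, which postdates this thread); iter_files is a hypothetical
name, not an existing library function.

```python
import os

def iter_files(root):
    """Lazily yield the path of every regular file below root.

    Hypothetical helper: only one directory is open at a time, and no
    complete list of all files is ever built in memory.
    """
    stack = [root]  # directories still waiting to be scanned
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        yield entry.path
        except OSError:
            # Unreadable or vanished directory: skip it and carry on.
            continue
```

A caller who does want the full list can still write list(iter_files(root)),
so the iterator covers both use cases.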

> >What I also appreciate about generators/iterators is that
> >they generally decrease the *latency* for getting the first entry.
> 
> I made 5,000 files in a directory.  Time to listdir the directory is
> 0.02 seconds.  I can't imagine needing lower latency.

I can and do. 
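
If you want to measure this on your own machine, the following is a rough
sketch, again assuming a Python with os.scandir (3.6 or later);
first_entry_latency is a hypothetical helper. It compares the time until
listdir returns the whole list with the time until an iterator hands over
its first entry. Whether the directory is already in the disk cache will
dominate the numbers.

```python
import os
import time

def first_entry_latency(path):
    """Return (listdir_seconds, first_entry_seconds) for one directory.

    Hypothetical helper: the first value is the time os.listdir needs to
    build the complete list, the second is the time until os.scandir
    yields a single entry (or reports that the directory is empty).
    """
    start = time.perf_counter()
    os.listdir(path)
    listdir_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with os.scandir(path) as entries:
        next(entries, None)  # fetch only the first entry, if any
    first_entry_seconds = time.perf_counter() - start

    return listdir_seconds, first_entry_seconds
```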

> Though since I had just created the directory the information is in cache.  If
> it wasn't in cache, well, things would be different all over the board.

exactly.
 
> For example, with iterators if you read a filename, do a lot of 
> work, read a filename, then you might have to reseek the directory
> information again as it's out of disk cache.  While with listdir it's all
> in memory.

If that memory was instead used for accelerating other (parts of) programs, then
it was good that we didn't preallocate it. Think of it at the scale
of 1000 processes running on a busy webserver.

premature-optimization-is-of-course-evil-ly yours, holger





More information about the Python-list mailing list