Iterating over files of a huge directory

Chris Angelico rosuav at gmail.com
Mon Dec 17 16:10:26 EST 2012


On Tue, Dec 18, 2012 at 5:29 AM, MRAB <python at mrabarnett.plus.com> wrote:
> <Off topic>
> Years ago I had to deal with an in-house application that was written
> using a certain database package. The package stored each predefined
> query in a separate file in the same directory.
>
> I found that if I packed all the predefined queries into a single file
> and then called an external utility to extract the desired query from
> the file every time it was needed into a file for the package to use,
> not only did it save a significant amount of disk space (hard disks
> were a lot smaller then), I also got a significant speed-up!
>
> It wasn't as bad as 100000 in one directory, but it was certainly too
> many...
> </Off topic>

Smart Cache, a web cache we used on our network a while
ago, could potentially create a ridiculous number of subdirectories (one
for each domain you visit). Its solution: hash the domain, then put it
into partitioning directories - by default, 4x4 of them, meaning that
there were four directories /0/ /1/ /2/ /3/ with the same four inside
each of them, so the "real content" was divided sixteen ways. I don't
know if PC file systems are better at this now than they were back in
the mid-90s, but back then, storing too much in one directory definitely
carried a pretty serious performance penalty.
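For the curious, here's a minimal sketch of that kind of hash-based
partitioning in Python. The function name, the md5-based bucketing and
the directory layout are just my own illustration, not Smart Cache's
actual scheme:

    import hashlib
    import os

    def partition_path(cache_root, domain, fanout=4):
        """Map a domain to a two-level bucket, e.g. root/2/3/example.com."""
        digest = hashlib.md5(domain.encode("utf-8")).hexdigest()
        # Take two hex digits of the digest to pick the two bucket levels.
        top = int(digest[0], 16) % fanout
        sub = int(digest[1], 16) % fanout
        return os.path.join(cache_root, str(top), str(sub), domain)

With fanout=4 that gives the same sixteen-way split described above;
raising the fanout spreads entries across more buckets so no single
directory ever gets huge.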

ChrisA
