walking a directory with very many files

Mike Kazantsev mk.fraggod at gmail.com
Fri Jun 19 03:40:15 EDT 2009


On Fri, 19 Jun 2009 17:53:40 +1200
Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> wrote:

> In message <20090618081423.2e0356b9 at coercion>, Mike Kazantsev wrote:
> 
> > On Thu, 18 Jun 2009 10:33:49 +1200
> > Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> wrote:
> > 
> >> In message <20090617214535.108667ca at coercion>, Mike Kazantsev
> >> wrote:
> >> 
> >>> On Wed, 17 Jun 2009 23:04:37 +1200
> >>> Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> wrote:
> >>> 
> >>>> In message <20090617142431.2b25faf5 at malediction>, Mike Kazantsev
> >>>> wrote:
> >>>> 
> >>>>> On Wed, 17 Jun 2009 17:53:33 +1200
> >>>>> Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> wrote:
> >>>>> 
> >>>>>>> Why not use hex representation of md5/sha1-hashed id as a
> >>>>>>> path, arranging them like /path/f/9/e/95ea4926a4 ?
> >>>>>>> 
> >>>>>>> That way, you won't have to deal with many-files-in-path
> >>>>>>> problem ...
> >>>>>> 
> >>>>>> Why is that a problem?
> >>>>> 
> >>>>> So you can os.listdir them?
> >>>> 
> >>>> Why should you have a problem os.listdir'ing lots of files?
> >>> 
> >>> I shouldn't, and I don't ;)
> >> 
> >> Then why did you suggest that there was a problem being able to
> >> os.listdir them?
> > 
> > I didn't, OP did ...
> 
> Then why did you reply to my question "Why is that a problem?" with
> "So that you can os.listdir them?", if you didn't think there was a
> problem (see above)?

Why do you think that if I didn't suggest there is a problem, I think
there is no problem?

I do think there might be such a problem, and I may even have to face
it someday. So, out of sheer curiosity about how much more ridiculous
this topic can get, I'll try to rephrase and extend what I wrote in the
first place:


Why would you want to listdir them?
I can imagine at least one simple scenario: you had some nasty crash
and you want to check that every file has a corresponding, valid db
record.
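
A minimal sketch of that check, assuming the db records can be reduced
to a plain collection of file names (the find_mismatches helper and its
arguments are made up for illustration, not taken from any real
project):

```python
import os

def find_mismatches(directory, db_names):
    """Compare names on disk with names recorded in the db.

    db_names is assumed to be an iterable of file names read from the db.
    Returns (files without a db record, db records without a file).
    """
    on_disk = set(os.listdir(directory))
    in_db = set(db_names)
    return on_disk - in_db, in_db - on_disk
```

Note that os.listdir builds the whole list in memory before returning,
so with 10^x names in one directory even this step is not free.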

What's the problem with listdir if there are 10^x of them?
Well, imagine that the db record also holds the file modification time
(say, the files are some kind of cache), so you need not only to compare
the listdir results with the db, but also to do os.stat on every file,
and some filesystems will do that very slowly with so many files in one
place.
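
Roughly, that means something like the following. It is only a sketch:
find_stale is a hypothetical helper, and db_mtimes is assumed to be a
dict mapping file name to the modification time stored in the db.

```python
import os

def find_stale(directory, db_mtimes):
    """Return names whose on-disk mtime differs from the recorded one.

    db_mtimes is assumed to be a dict: file name -> recorded mtime.
    Files with no db record at all are reported as stale too.
    """
    stale = []
    for name in os.listdir(directory):
        # One os.stat call per file: this is the part that can get slow
        # on some filesystems when the directory is huge.
        mtime = os.stat(os.path.join(directory, name)).st_mtime
        if db_mtimes.get(name) != mtime:
            stale.append(name)
    return stale
```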


Now, I think I made this point in the first answer, no?
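
For reference, the layout from that first answer
(/path/f/9/e/95ea4926a4) could be sketched as below. The hashed_path
name is made up, and taking the first three hex digits as directory
levels with the rest as the file name is just one reading of that
example path:

```python
import hashlib
import os

def hashed_path(root, file_id, levels=3):
    """Map an id to root/<h0>/<h1>/<h2>/<rest of the md5 hex digest>.

    levels=3 is an assumption based on the example path; each level
    holds at most 16 subdirectories, one per hex digit.
    """
    digest = hashlib.md5(file_id.encode("utf-8")).hexdigest()
    parts = list(digest[:levels]) + [digest[levels:]]
    return os.path.join(root, *parts)
```

With three levels the files are spread over 16^3 = 4096 leaf
directories, so each listdir and the os.stat calls that follow it deal
with a much smaller directory.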

Of course, you can make it even more ridiculous with your
I-can-talk-away-any-problem-I-can't-see-or-solve approach by asking "why
would you want to use such filesystems?", "why do you have to use
FreeBSD?", "why do you have to work for such an employer?", "why do you
have to eat?" etc., but you know, sometimes it's easier and better for
the project/work just to solve it than to talk everyone else out of it
just because you don't like an otherwise acceptable solution.

-- 
Mike Kazantsev // fraggod.net