Iterating over files of a huge directory

Evan Driscoll driscoll at cs.wisc.edu
Mon Dec 17 15:09:34 EST 2012


On 12/17/2012 01:50 PM, Oscar Benjamin wrote:
> On 17 December 2012 18:40, Evan Driscoll <driscoll at cs.wisc.edu> wrote:
>> On 12/17/2012 09:52 AM, Oscar Benjamin wrote:
>>> https://github.com/benhoyt/betterwalk
>>
>> This is very useful to know about; thanks.
>>
>> I actually wrote something very similar on my own (I wanted to get
>> information about whether each directory entry was a file, directory,
>> symlink, etc. without separate stat() calls).
> 
> The initial goal of betterwalk seemed to be the ability to do os.walk
> with fewer stat calls. I think the information you want is part of
> what betterwalk finds "for free" from the underlying OS iteration
> (without the need to call stat()) but I'm not sure.

Yes, that's my impression as well.
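
For anyone reading this in the archive: a minimal sketch of the idea, assuming a Python that has os.scandir (3.5 or later, where it was added to the standard library after growing out of the betterwalk work). This is not betterwalk's code or the Cython module mentioned in this message, and classify_entries is a hypothetical helper name. On most platforms the type checks can usually be answered from the directory listing itself, though some filesystems still force a stat() behind the scenes.

```python
import os

def classify_entries(path):
    """Yield (name, kind) for each entry directly inside path."""
    # os.scandir yields DirEntry objects that cache the type information
    # the OS returned while listing the directory, when it is available.
    with os.scandir(path) as entries:
        for entry in entries:
            # Check for symlinks first, then ask about the entry itself
            # (follow_symlinks=False) rather than about its target.
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kind = "directory"
            elif entry.is_file(follow_symlinks=False):
                kind = "file"
            else:
                kind = "other"  # e.g. a FIFO, socket, or device node
            yield entry.name, kind
```

Because this is a generator, entries are produced lazily, so it can be used on a huge directory without building the full listing first, e.g. `for name, kind in classify_entries("."): print(kind, name)`.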


>> (Also just for the record and anyone looking for other posts, I'd guess
>> said discussion was on Python-dev. I don't look at even remotely
>> everything on python-list (there's just too much), but I do skim most
>> subject lines and I haven't noticed any discussion on it before now.)
> 
> Actually, it was python-ideas:
> http://thread.gmane.org/gmane.comp.python.ideas/17932
> http://thread.gmane.org/gmane.comp.python.ideas/17757

Thanks again for the pointers; I'll have to go through that thread. It's
possible I can contribute something; it sounds like, at least at one
point, the implementation was ctypes-based and sometimes slower, and I
have both a (now-defunct) C implementation and my current Cython module.
Ironically, I haven't actually benchmarked mine. :-)

Evan
