[Python-Dev] Issue #11051: system calls per import

Rian Hunter rian at dropbox.com
Fri Feb 4 06:10:33 CET 2011


Hello

Speaking from experience from my observations on millions of machines the stat() call is *very slow* when compared to readdir(), FindNextFile(), getdirentriesattr(), etc. When we switched from a file system indexer that stat()ed every file to one that read directories we noticed an average speedup of about 10x.

You can probably attribute this to the fact that in file system indexing the raw system call volume is much lower (not having to stat() each file, just read the directories) but also due to the fact that there is much less HD seeking (stat() has to jump around the HD, usually all directory entries fit in one block). If you only need to test for the existence of multiple files and don't need the extra information that stat() gives you, it might make sense to avoid the context switch/IO overhead.  

Rian

On Jan 31, 2011, at 4:43 AM, Antoine Pitrou wrote:

> On Mon, 31 Jan 2011 00:08:25 -0800
> Guido van Rossum <guido at python.org> wrote:
>> 
>> (Basically I am biased to believe that stat() is a pretty slow system
>> call -- this may just be old NFS lore though.)
> 
> I don't know about NFS, but starting a Python interpreter located on a
> Samba share from a Windows VM is quite slow too.
> I think Martin is right for the common case: on a local filesystem on a
> modern Unix, stat() is certainly very fast. Remote or
> distributed filesystems seem to be more of a problem.
> 
> Regards
> 
> Antoine.
> 
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/rian%40dropbox.com



More information about the Python-Dev mailing list