[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

Ben Hoyt benhoyt at gmail.com
Tue May 14 00:41:01 CEST 2013


> I'd like to see the numbers for NFS or CIFS - stat() can be brutally slow
> over a network connection (that's why we added a caching mechanism to
> importlib).

How do I know what file system Windows networking is using? In any
case, here are some numbers on Windows -- it's looking pretty good! This
is with default DEPTH/NUM_DIRS/NUM_FILES on a LAN:

Benchmarking walks on \\anothermachine\docs\Ben\bigtree, repeat 3/3...
os.walk took 11.345s, scandir.walk took 0.340s -- 33.3x as fast

And this is on a VPN on a remote network with the benchmark.py values
cranked down to DEPTH = 3, NUM_DIRS = 3, NUM_FILES = 20 (because
otherwise it was taking far too long):

Benchmarking walks on \\ben1.titanmt.local\c$\dev\scandir\benchtree,
repeat 3/3...
os.walk took 122.310s, scandir.walk took 5.452s -- 22.4x as fast

If anyone can run benchmark.py on Linux / NFS or similar, that'd be
great. You'll probably have to lower DEPTH/NUM_DIRS/NUM_FILES first
and then move the "benchtree" to the network file system to run it
against that.
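
For anyone who wants to try this without the full benchmark.py, here is a
rough sketch of the same kind of measurement. It is not the actual
benchmark script: make_tree, scandir_walk and best_time are hypothetical
names, and it uses os.scandir as found in the standard library from
Python 3.5 on, which is an assumption relative to the scandir module
discussed in this thread. Note that on Python 3.5+ os.walk is itself built
on os.scandir, so the speedups quoted above would not be expected to
reproduce there.

```python
import os
import time


def make_tree(root, depth, num_dirs, num_files):
    """Create a small test tree under an existing directory `root`.

    The parameters loosely mirror DEPTH/NUM_DIRS/NUM_FILES (hypothetical helper).
    """
    for i in range(num_files):
        with open(os.path.join(root, "file%03d.txt" % i), "w") as f:
            f.write("x")
    if depth > 0:
        for i in range(num_dirs):
            sub = os.path.join(root, "dir%03d" % i)
            os.mkdir(sub)
            make_tree(sub, depth - 1, num_dirs, num_files)


def scandir_walk(top):
    """Minimal top-down walk built on os.scandir (Python 3.5+).

    Yields (dirpath, dirnames, filenames) like os.walk, without the
    topdown/onerror/followlinks options.
    """
    dirs, files, subdirs = [], [], []
    for entry in os.scandir(top):
        if entry.is_dir():
            dirs.append(entry.name)
            # Like os.walk's default, do not recurse into symlinked dirs.
            if not entry.is_symlink():
                subdirs.append(entry.path)
        else:
            files.append(entry.name)
    yield top, dirs, files
    for sub in subdirs:
        yield from scandir_walk(sub)


def best_time(walk_func, root, repeat=3):
    """Return the best wall-clock time (seconds) over `repeat` full walks."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in walk_func(root):
            pass
        best = min(best, time.perf_counter() - start)
    return best
```

To compare the two, build the tree once with make_tree, copy it to the
network share, and call best_time(os.walk, root) and
best_time(scandir_walk, root) on the same root.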

> I initially quite liked the idea of not offering any methods on
> DirEntry, only properties, to make it obvious that they don't touch
> the file system, but just report info from the scandir call. However,
> I think that it ends up reading strangely, and would be confusing
> relative to the os.path APIs.
>
> What you have now seems like a good, simple alternative.

Thanks. Yeah, I kinda liked the "DirEntry doesn't make any OS calls"
at first too, but then as I got into it I realized it made for a
really nasty API for most use cases. I like how it's ended up.
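
To illustrate the method-style API, here is a small sketch using
os.scandir and DirEntry as they exist in the standard library from Python
3.5 on. That is an assumption: the API at the time of this message may
have differed in its details. The function name summarize is hypothetical.

```python
import os


def summarize(path):
    """Return (num_files, num_dirs, total_bytes) for entries directly in `path`.

    is_dir() and is_file() are methods rather than properties because they
    may need a system call on some platforms or for some entries; the
    DirEntry object caches the result after the first call.
    """
    num_files = num_dirs = total_bytes = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            num_dirs += 1
        elif entry.is_file(follow_symlinks=False):
            num_files += 1
            # On Windows the size usually comes from the directory listing
            # itself; elsewhere stat() makes one call and caches it.
            total_bytes += entry.stat(follow_symlinks=False).st_size
    return num_files, num_dirs, total_bytes
```

The calling code reads much like the os.path equivalents (os.path.isdir,
os.path.isfile, os.stat), which is the consistency point raised above.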

-Ben

