[issue22524] PEP 471 implementation: os.scandir() directory scanning function

STINNER Victor report at bugs.python.org
Thu Oct 9 13:24:45 CEST 2014


STINNER Victor added the comment:

On Windows, I guess that "benchmark.py --size" is faster with scandir() than with os.walk(), because os.stat() is never called.

benchmark.py has a bug in do_os_walk() when the --size option is used: the attached do_os_walk_getsize.patch is needed to fix it.

The sizes returned by os.walk() and scandir.walk() are different. I guess that symbolic links to directories are handled differently. Because of that, I'm not sure that the benchmark timings are reliable, but they should at least give us an idea of the performance.

To compute the size of a tree, scandir.walk() using the builtin os.scandir() is about twice as fast as os.walk() (2.1x as fast): os.walk=1.435 sec, scandir.walk=0.675 sec.
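
For illustration, here is a minimal sketch of the two ways of computing a tree size that the benchmark compares. This is not the code of benchmark.py: the function names tree_size_walk() and tree_size_scandir() are hypothetical, and the sketch assumes Python 3.6 or newer (for the "with os.scandir()" form). Both functions count regular files and symbolic links by their lstat() size and never descend into symbolic links to directories, so they should agree on the same tree.

```python
import os


def tree_size_walk(top):
    """Sum file sizes with os.walk(): one explicit lstat() per file."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def tree_size_scandir(top):
    """Sum file sizes with os.scandir(): reuse data from the directory entry."""
    total = 0
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir():
                # Like os.walk(), classify symlinks to directories as
                # directories, but do not descend into them.
                if not entry.is_symlink():
                    total += tree_size_scandir(entry.path)
            else:
                # On Windows this usually needs no extra system call;
                # on Linux it still calls lstat() once per file.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Calling both functions on the same directory, for example tree_size_walk("/usr/share") and tree_size_scandir("/usr/share"), is a quick way to check whether a size mismatch comes from the walking code or from how symbolic links are counted.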

"os" takes 41% less time than "c" (1.7x as fast): c=1150 ms, os=675 ms.
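
The ratios quoted above can be rechecked from the timings in the output below. This is only a small sketch of the arithmetic; the helper names speedup() and time_saved_pct() are hypothetical and are not part of benchmark.py.

```python
def speedup(baseline_s, candidate_s):
    """How many times as fast the candidate is compared to the baseline."""
    return baseline_s / candidate_s


def time_saved_pct(baseline_s, candidate_s):
    """Percentage of the baseline time that the candidate saves."""
    return 100.0 * (baseline_s - candidate_s) / baseline_s


# os.walk() versus scandir.walk() with the builtin os.scandir()
print(round(speedup(1.435, 0.675), 1))      # 2.1

# "c" version versus "os" (builtin) version of scandir.walk()
print(round(time_saved_pct(1.150, 0.675)))  # 41
print(round(speedup(1.150, 0.675), 1))      # 1.7
```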


Results of "benchmark.py --size" on my Linux Fedora 20:

haypo at smithers$ ~/prog/python/default/python setup.py build && for scandir in generic python c os; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.5/ ~/prog/python/default/python benchmark.py -s /usr/share -c $scandir || break; done
running build
running build_py
running build_ext

=== generic ===
Using very slow generic version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
os.walk took 1.425s, scandir.walk took 1.147s -- 1.2x as fast

=== python ===
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
os.walk took 1.421s, scandir.walk took 1.651s -- 0.9x as fast

=== c ===
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
os.walk took 1.426s, scandir.walk took 1.150s -- 1.2x as fast

=== os ===
Using Python 3.5's builtin os.scandir()
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
os.walk took 1.435s, scandir.walk took 0.675s -- 2.1x as fast

----------
Added file: http://bugs.python.org/file36848/do_os_walk_getsize.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22524>
_______________________________________

