[issue22524] PEP 471 implementation: os.scandir() directory scanning function

STINNER Victor report at bugs.python.org
Fri Feb 13 10:08:49 CET 2015


STINNER Victor added the comment:

Note: bench_scandir2.py is a micro-benchmark. Ben's benchmark using walk() is more realistic, but I'm interested in the micro-benchmark results.
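
For readers who don't have bench_scandir2.py at hand, here is a minimal sketch of what this kind of micro-benchmark measures. It is not the actual script: the helper names (make_tree, count_dirs_listdir, count_dirs_scandir, best_of) and the tree size are assumptions made for illustration. It compares listdir() plus one lstat() per entry against scandir() with DirEntry.is_dir().

```python
import os
import stat
import tempfile
import time


def make_tree(root, nfiles=1000, ndirs=100):
    """Create a flat test directory (hypothetical sizes, not the real benchmark's)."""
    for i in range(nfiles):
        with open(os.path.join(root, "file%d" % i), "w"):
            pass
    for i in range(ndirs):
        os.mkdir(os.path.join(root, "dir%d" % i))


def count_dirs_listdir(path):
    """Count subdirectories with os.listdir() and one os.lstat() per entry."""
    count = 0
    for name in os.listdir(path):
        st = os.lstat(os.path.join(path, name))
        if stat.S_ISDIR(st.st_mode):
            count += 1
    return count


def count_dirs_scandir(path):
    """Count subdirectories with os.scandir(), using the entry type when available."""
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                count += 1
    return count


def best_of(func, path, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(path)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_tree(root)
        t_listdir = best_of(count_dirs_listdir, root)
        t_scandir = best_of(count_dirs_scandir, root)
        print("scandir: %.1f ms, listdir: %.1f ms"
              % (t_scandir * 1e3, t_listdir * 1e3))
```

This sketch only covers the cached case; the "nocache" variants would need the OS file cache to be flushed between runs, which is platform specific and left out here.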

scandir-2.patch is faster than scandir-6.patch, and much faster on Windows.

Result of bench (cached): scandir-6.patch => scandir-2.patch

* Windows 7 VM using NTFS: 14.0x faster => 44.6x faster
* laptop using NFS share: 1.3x faster => 5.2x faster   *** warning: unstable results ***
* desktop PC using /tmp: 1.3x faster => 3.8x faster
* laptop using SSD and ext4: 1.3x faster => 2.8x faster
* desktop PC using HDD and ext4: 1.4x faster => 1.4x faster


Benchmark using scandir-2.patch
-------------------------------


Benchmark results with the full C implementation, scandir-2.patch.

[ C implementation ] Results of bench_scandir2.py on my desktop PC using HDD and ext4:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.5x faster than listdir (scandir: 63.6 ms, listdir: 219.9 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 52.8 ms, listdir: 42.4 ms)
- bench_nocache: 1.4x faster than listdir (scandir: 3745.2 ms, listdir: 5217.6 ms)
- bench_nostat_nocache: 1.4x faster than listdir (scandir: 3834.1 ms, listdir: 5380.7 ms)
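
A note on reading these lines: the "Nx faster" figure appears to be the listdir time divided by the scandir time, so a value below 1.0 (such as 0.8x) means scandir was slower in that test. The sketch below reproduces the figures from the timings above under that assumption; speedup() and format_result() are hypothetical helpers, not functions from bench_scandir2.py.

```python
def speedup(scandir_ms, listdir_ms):
    """Ratio of listdir time to scandir time; below 1.0 means scandir is slower."""
    return listdir_ms / scandir_ms


def format_result(name, scandir_ms, listdir_ms):
    """Format one result line in the same style as the benchmark output."""
    return "%s: %.1fx faster than listdir (scandir: %.1f ms, listdir: %.1f ms)" % (
        name, speedup(scandir_ms, listdir_ms), scandir_ms, listdir_ms)


if __name__ == "__main__":
    # Timings taken from the desktop PC (HDD, ext4) results above.
    print(format_result("bench", 63.6, 219.9))
    print(format_result("bench_nostat", 52.8, 42.4))
```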

[ C implementation ] Results of bench_scandir2.py on my desktop PC using /tmp (tmpfs):

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.8x faster than listdir (scandir: 46.7 ms, listdir: 176.4 ms)
- bench_nostat: 0.7x faster than listdir (scandir: 38.6 ms, listdir: 28.6 ms)

[ C implementation ] Results of bench_scandir2.py on my Windows 7 VM using NTFS:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 44.6x faster than listdir (scandir: 125.0 ms, listdir: 5574.9 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 92.4 ms, listdir: 74.7 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using SSD and ext4:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.6x faster than listdir (scandir: 59.4 ms, listdir: 213.3 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 50.0 ms, listdir: 38.6 ms)
- bench_nocache: 2.8x faster than listdir (scandir: 377.5 ms, listdir: 1073.1 ms)
- bench_nostat_nocache: 2.8x faster than listdir (scandir: 370.9 ms, listdir: 1055.0 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using tmpfs:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 4.0x faster than listdir (scandir: 43.7 ms, listdir: 174.1 ms)
- bench_nostat: 0.7x faster than listdir (scandir: 35.2 ms, listdir: 24.5 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using NFS share and slow wifi:

- 11,010 entries (10,000 files, 10 symlinks, 1,000 directories)
- bench: 5.2x faster than listdir (scandir: 4.2 ms, listdir: 21.7 ms)
- bench_nostat: 0.6x faster than listdir (scandir: 3.3 ms, listdir: 1.9 ms)


*** Again, results with NFS are not reliable. Sometimes listing the content of a directory takes 40 seconds. Maybe it's a network issue. ***

It looks like d_type can be DT_UNKNOWN on NFS.
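
When d_type is DT_UNKNOWN, the entry type cannot be taken from the directory listing, so is_dir() and is_file() need a stat call for that entry, as described in PEP 471. The calling code stays the same either way. A minimal sketch, where classify() is a hypothetical helper used only for illustration:

```python
import os


def classify(path):
    """Count entries by type using only DirEntry methods.

    Where the file system reports the entry type, these checks usually need
    no extra system call; where it does not (d_type == DT_UNKNOWN), each
    check falls back to a stat call for that entry.
    """
    counts = {"files": 0, "dirs": 0, "symlinks": 0, "other": 0}
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                counts["symlinks"] += 1
            elif entry.is_dir(follow_symlinks=False):
                counts["dirs"] += 1
            elif entry.is_file(follow_symlinks=False):
                counts["files"] += 1
            else:
                counts["other"] += 1
    return counts


if __name__ == "__main__":
    print(classify("."))
```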


Benchmark using scandir-6.patch
-------------------------------

I reran the benchmark with scandir-6.patch, using more files, so the two sets of results can be compared.

[ C implementation ] Results of bench_scandir2.py on my Windows 7 VM using NTFS:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 14.0x faster than listdir (scandir: 399.0 ms, listdir: 5578.7 ms)
- bench_nostat: 0.3x faster than listdir (scandir: 279.2 ms, listdir: 76.1 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using NFS share and slow wifi:

- 11,010 entries (10,000 files, 10 symlinks, 1,000 directories)
- bench: 1.5x faster than listdir (scandir: 14.8 ms, listdir: 21.4 ms)
- bench_nostat: 0.2x faster than listdir (scandir: 10.6 ms, listdir: 2.2 ms)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22524>
_______________________________________

