[Python-Dev] PEP 471 (scandir): Poll to choose the implementation (full C or C+Python)

Ben Hoyt benhoyt at gmail.com
Fri Feb 13 14:35:00 CET 2015


> > * C implementation: scandir is at least 3.5x faster than listdir, up
> > to 44.6x faster on Windows
> > * C+Python implementation: scandir is not really faster than listdir,
> > between 1.3x and 1.4x faster
>
> So amusingly, the bottleneck is not so much the cost of system calls,
> but the cost of Python wrappers around system calls.

Yes, that's basically right. Or put another way, the cost of the extra
system calls is dwarfed by the cost of wrapping things in Python.

Victor's given a great summary of the issues at the top of this
thread, and I'm definitely for the all-C version -- otherwise we gain
a bunch of speed by not calling stat(), but then lose most of it again
with the Python wrapping. As Victor noted, the rationale for PEP 471
has always been about performance, and if we don't have much of that
(especially on Linux), it's not nearly as worthwhile.
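
To make the stat() point concrete, here is a minimal sketch. It assumes the os.scandir() API as it later shipped in the standard library (Python 3.5, with the "with" form needing 3.6). The helper names subdirs_listdir, subdirs_scandir and demo are hypothetical and used only for illustration. It shows only where the per-entry stat() call goes away, not the C versus C+Python timing difference discussed in this thread.

```python
import os
import tempfile


def subdirs_listdir(path):
    """listdir approach: os.path.isdir() issues a stat() call per entry."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )


def subdirs_scandir(path):
    """scandir approach: entry.is_dir() can usually reuse the file type the
    OS returned while reading the directory, so no extra stat() is needed."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


def demo():
    # Build a throwaway tree with two directories and one regular file.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        os.mkdir(os.path.join(root, "b"))
        with open(os.path.join(root, "file.txt"), "w") as f:
            f.write("x")
        return subdirs_listdir(root), subdirs_scandir(root)


if __name__ == "__main__":
    print(demo())
```

Both helpers return the same names. They differ only in how many system calls are made per entry.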

Re maintenance of the C code: yes, the pure C version is about twice
as many lines as the half Python version (~800 vs ~400), but I think
Nick makes a good point here: "This isn't code I'd expect us to have
to change very often, so the maintenance risks associated with the
pure C implementation seem low." We have to vet this code thoroughly
basically once, now. :-)

If we go ahead with the all-C approach, I'd be in favour of
refactoring a little and putting the new scandir code into a separate
C file. There are two ways to do this: a) sticking with a single
Python module and just referencing the non-static functions in
scandir.c from posixmodule.c, or b) sharing some functions but making
_scandir.c its own importable module. Option (a) is somewhat simpler
as the module setup doesn't have to be done twice, but I don't know if
there's a precedent for that way of doing things.

-Ben

