[issue23605] Use the new os.scandir() function in os.walk()

Ben Hoyt report at bugs.python.org
Thu Mar 12 03:53:49 CET 2015


Ben Hoyt added the comment:

To Victor and Serhiy:

1) Serhiy's point about not needing to build the symlinks set when followlinks is True is a good one, because it'll never get used. Just a "if not followlinks: ..." around that try/except would do it. Though this is a small optimization, as I expect 95% of people use os.walk() with followlinks=False, which is the default.

2) Much as I don't want to admit it :-), I think Serhiy has a point regarding the change in behaviour. Can we accept this tiny change? This happens because the previous implementation of os.walk() calls path.islink(), which does a real syscall, after the yield returns.

So if the caller modifies "dirnames" and adds a symlink to a directory it won't be in the symlinks set. Or if (as Serhiy's example shows) they change a symlink-to-a-directory to a directory the new implementation doesn't do another syscall to check, so differs from the old one -- in Serhiy's example, the old os.walk() will recurse into the changed symlink-to-a-directory-that's-now-a-directory, whereas the new os.walk() won't recurse.

Arguably the new behaviour is just as good here, but it is different. And Serhiy's function unsymlink() is a reasonable example of how this might play out.

The os.walk() docs explicitly allow modifying "dirnames" -- not just removing and reordering, but also adding entries, which could surface the difference in behaviour: "the caller can modify the dirnames list in-place ... even to inform walk() about directories the caller creates or renames before it resumes walk() again". See here:

https://docs.python.org/3.4/library/os.html#os.walk

So I do think it's a real issue. Will this really be an issue in practice? I don't know; I'm tempted to think not.

Obviously if we have to call stat/islink again, it defeats half of the purpose of scandir. But I don't see a way around it if we want to keep the exact old os.walk() behaviour. We could fix the behaviour if a directory/symlink was added to "dirnames" fairly easily by testing whether it changed, but I don't see a way to fix the unsymlinks() example without a syscall.

Thoughts?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23605>
_______________________________________


More information about the Python-bugs-list mailing list