[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

Tim Delaney timothy.c.delaney at gmail.com
Tue Jul 1 00:07:23 CEST 2014


On 1 July 2014 03:05, Ben Hoyt <benhoyt at gmail.com> wrote:

> > So, here's my alternative proposal: add an "ensure_lstat" flag to
> > scandir() itself, and don't have *any* methods on DirEntry, only
> > attributes.
> ...
> > Most importantly, *regardless of platform*, the cached stat result (if
> > not None) would reflect the state of the entry at the time the
> > directory was scanned, rather than at some arbitrary later point in
> > time when lstat() was first called on the DirEntry object.
>

I'm torn on whether the stat fields should be populated on Windows when
ensure_lstat=False. There are good arguments each way, but overall I'm
leaning towards keeping it consistent with POSIX: don't populate them
unless ensure_lstat=True.

+0 for stat fields to be None on all platforms unless ensure_lstat=True.
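
To make the proposed semantics concrete, here is a rough emulation built on
os.listdir() and os.lstat(). The names scandir_sketch, Entry and lstat_result
are hypothetical, chosen only for illustration; this is a sketch of the
behaviour under discussion, not the PEP's implementation.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for an attribute-only DirEntry (no methods).
Entry = namedtuple("Entry", ["name", "lstat_result"])


def scandir_sketch(path=".", ensure_lstat=False):
    """Yield Entry tuples; lstat_result stays None unless ensure_lstat=True."""
    for name in os.listdir(path):
        # With ensure_lstat=True the stat call happens during iteration,
        # so the cached result reflects scan time on every platform.
        st = os.lstat(os.path.join(path, name)) if ensure_lstat else None
        yield Entry(name, st)
```

With this shape, a caller that only needs names pays for no stat calls, and a
caller that passes ensure_lstat=True gets a non-None lstat_result everywhere.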


> Yeah, I quite like this. It does make the caching more explicit and
> consistent. It's slightly annoying that it's less like pathlib.Path
> now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't
> matter. The differences in naming may highlight the difference in
> caching, so maybe it's a good thing.
>

See my comments below on .fullname.


> Two further questions from me:
>
> 1) How does error handling work? Now os.stat() will/may be called
> during iteration, so in __next__. But it's hard to catch errors because
> you don't call __next__ explicitly. Is this a problem? How do other
> iterators that make system calls or raise errors handle this?
>

I think it just needs to be documented that iterating may throw the same
exceptions as os.lstat(). It's a little trickier if you don't want the
scope of your exception to be too broad, but you can always wrap the
iteration in a generator to catch and handle the exceptions you care about,
and allow the rest to propagate.

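import os

def scandir_accessible(path='.'):
    # Yield entries from os.scandir(), skipping any that raise PermissionError.
    gen = os.scandir(path)

    while True:
        try:
            entry = next(gen)
        except StopIteration:
            # PEP 479: StopIteration must not leak out of a generator.
            return
        except PermissionError:
            continue
        yield entry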
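
One caveat with this pattern: it only helps if the underlying iterator can be
resumed after raising (a plain generator cannot). Whether scandir() would
behave that way is an assumption here, so the sketch below uses a made-up
iterator class, FlakyEntries, to show the idea, plus a generic skip_errors
wrapper (also a hypothetical name) that takes the exception types to swallow.

```python
class FlakyEntries:
    """Hypothetical iterator that raises PermissionError for denied names."""

    def __init__(self, names, denied):
        self._names = iter(names)
        self._denied = set(denied)

    def __iter__(self):
        return self

    def __next__(self):
        name = next(self._names)  # StopIteration here ends iteration normally
        if name in self._denied:
            raise PermissionError(name)
        return name


def skip_errors(iterator, exc_types=(PermissionError,)):
    """Yield items, skipping those whose retrieval raises one of exc_types."""
    it = iter(iterator)
    while True:
        try:
            item = next(it)
        except StopIteration:
            return
        except exc_types:
            continue  # swallow only the listed exceptions; others propagate
        yield item
```

For example, list(skip_errors(FlakyEntries(["a", "b", "c"], ["b"]))) skips the
denied entry, while any exception type not listed in exc_types still reaches
the caller.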

> 2) There's still the open question in the PEP of whether to include a
> way to access the full path. This is cheap to build, it has to be
> built anyway on POSIX systems, and it's quite useful for further
> operations on the file. I think the best way to handle this is a
> .fullname or .full_name attribute as suggested elsewhere. Thoughts?
>

+1 for .fullname. The earlier suggestion to have __str__ return the name is
killed, I think, by the fact that .fullname could be bytes.

It would be nice if pathlib.Path objects were enhanced to take a DirEntry
and use the .fullname automatically, but you could always call
Path(direntry.fullname).
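
As a sketch of that conversion: in the os.scandir() that shipped in Python 3.5
the full-path attribute is spelled .path rather than .fullname, so the example
below uses that name. The helper entry_paths is hypothetical, and it assumes
Python 3.6+ for the with-statement support. os.fsdecode() covers the bytes
case mentioned above, since pathlib.Path does not accept bytes.

```python
import os
from pathlib import Path


def entry_paths(directory="."):
    """Return a pathlib.Path for each entry in directory (str or bytes)."""
    with os.scandir(directory) as entries:
        # entry.path is bytes when directory is bytes; fsdecode() turns it
        # into str and leaves str paths unchanged.
        return [Path(os.fsdecode(entry.path)) for entry in entries]
```

Calling entry_paths(os.fsencode(some_dir)) and entry_paths(some_dir) should
give the same Path objects.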

Tim Delaney