[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

Ben Hoyt benhoyt at gmail.com
Mon May 13 00:04:11 CEST 2013


> And if we're creating a custom object instead, why return a 2-tuple
> rather than making the entry's name an attribute of the custom object?
>
> To me, that suggests a more reasonable API for os.scandir() might be
> for it to be an iterator over "dir_entry" objects:
>
>     name (as a string)
>     is_file()
>     is_dir()
>     is_link()
>     stat()
>     cached_stat (None or a stat object)

Nice! I really like your basic idea of returning a custom object
instead of a 2-tuple. And I agree with Christian that .stat() would be
clearer if called .lstat(). I also like your later idea of simply
exposing .dirent (would be None on Windows).

One tweak I'd suggest is that is_file() etc. be called isfile() etc.,
without the underscore, to match the naming of the os.path.is*
functions.

> That would actually make sense at an implementation
> level anyway - is_file() etc would check self.cached_lstat first, and
> if that was None they would check self.dirent, and if that was also
> None they would raise an error.

Hmm, I'm not sure about this at all. Are you suggesting that the
DirEntry object's is* functions would raise an error if both
cached_lstat and dirent were None? Wouldn't it make for a much simpler
API to just call os.lstat() and populate cached_lstat instead? As far
as I'm concerned, that'd be the point of making DirEntry.lstat() a
function.
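A minimal sketch of that lazy fallback, assuming a hypothetical LazyEntry class that knows its directory and name. The first lstat() call goes to the filesystem, later calls return the cached result, and the caller never sees a "not present" error.

```python
import os
import tempfile


class LazyEntry:
    """Hypothetical entry that fetches its lstat on first use, then caches it."""

    def __init__(self, path, name):
        self.name = name
        self._path = path
        self._lstat = None  # nothing cached yet

    def lstat(self):
        # Hit the filesystem only when no result has been cached yet.
        if self._lstat is None:
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()  # empty file
    entry = LazyEntry(tmp, "a.txt")
    first = entry.lstat()   # real os.lstat() call
    second = entry.lstat()  # served from the cache
    print(first is second)  # True
```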

In fact, I don't think .cached_lstat should be exposed to the user.
They just call entry.lstat(), and it returns a cached stat or calls
os.lstat() to get the real stat if required (and populates the
internal cached stat value). And the entry.is* functions would call
entry.lstat() if dirent was None or d_type was DT_UNKNOWN. This would
change relatively nasty code like this:

files = []
dirs = []
for entry in os.scandir(path):
    try:
        isdir = entry.isdir()
    except NotPresentError:
        st = os.lstat(os.path.join(path, entry.name))
        isdir = stat.S_ISDIR(st.st_mode)
    if isdir:
        dirs.append(entry.name)
    else:
        files.append(entry.name)

Into nice clean code like this:

files = []
dirs = []
for entry in os.scandir(path):
    if entry.isdir():
        dirs.append(entry.name)
    else:
        files.append(entry.name)

This change would make scandir() usable by ordinary mortals, rather
than just hardcore library implementors.
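To try the clean loop today, here is a runnable sketch built on a hypothetical pure-Python stand-in, scandir_sketch(), layered over os.listdir(). It supplies no dirent data, so every isdir() call falls back to the cached lstat(), which is exactly the path the proposal takes when d_type is unavailable. The names Entry, scandir_sketch and split_entries are illustrative only.

```python
import os
import stat
import tempfile


class Entry:
    """Hypothetical stand-in for the proposed DirEntry, with no dirent data."""

    def __init__(self, path, name):
        self.name = name
        self._path = path
        self._lstat = None

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat

    def isdir(self):
        # No d_type available here, so always answer from the cached lstat.
        return stat.S_ISDIR(self.lstat().st_mode)


def scandir_sketch(path):
    """Hypothetical pure-Python stand-in for scandir() built on os.listdir()."""
    for name in os.listdir(path):
        yield Entry(path, name)


def split_entries(path):
    # The "nice clean code" from above, wrapped in a function.
    files, dirs = [], []
    for entry in scandir_sketch(path):
        if entry.isdir():
            dirs.append(entry.name)
        else:
            files.append(entry.name)
    return sorted(files), sorted(dirs)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "data.txt"), "w").close()
    files, dirs = split_entries(tmp)
    print(files, dirs)  # ['data.txt'] ['sub']
```

Because the check uses lstat() rather than stat(), a symlink to a directory ends up in files, which matches the lstat-based semantics proposed here.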

In other words, I'm proposing that the DirEntry objects yielded by
scandir() would have .name and .dirent attributes, and .isdir(),
.isfile(), .islink(), .lstat() methods, and look basically like this
(though presumably implemented in C):

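import os
import stat

# d_type values as defined in Linux's <dirent.h>. Python's os module does
# not export them, so they are assumptions for this sketch.
DT_UNKNOWN, DT_DIR, DT_REG, DT_LNK = 0, 4, 8, 10

class DirEntry: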
    def __init__(self, name, dirent, lstat, path='.'):
        # User shouldn't need to call this, but called internally by scandir()
        self.name = name
        self.dirent = dirent
        self._lstat = lstat  # non-public attributes
        self._path = path

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat

    def isdir(self):
        if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
            return self.dirent.d_type == DT_DIR
        else:
            return stat.S_ISDIR(self.lstat().st_mode)

    def isfile(self):
        if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
            return self.dirent.d_type == DT_REG
        else:
            return stat.S_ISREG(self.lstat().st_mode)

    def islink(self):
        if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
            return self.dirent.d_type == DT_LNK
        else:
            return stat.S_ISLNK(self.lstat().st_mode)

Oh, and the .dirent would either be None (Windows) or would have
.d_type and .d_ino attributes (Linux, OS X).
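The sketch below shows the d_type fast path. It models .dirent as a hypothetical Dirent namedtuple with the .d_type and .d_ino fields mentioned above, and it uses Linux's <dirent.h> values for the DT_* constants. Those values are an assumption, since Python's os module doesn't define them. When d_type is known, isdir() answers without calling os.lstat(). When it is DT_UNKNOWN, the method has to go to the filesystem.

```python
import os
import stat
from collections import namedtuple

# Assumed Linux values from <dirent.h>; Python's os module does not export them.
DT_UNKNOWN, DT_DIR, DT_REG, DT_LNK = 0, 4, 8, 10

# Hypothetical model of the .dirent attribute on Linux/OS X.
Dirent = namedtuple("Dirent", ["d_type", "d_ino"])


class DirEntry:
    def __init__(self, name, dirent, lstat=None, path="."):
        self.name = name
        self.dirent = dirent
        self._lstat = lstat
        self._path = path

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat

    def isdir(self):
        # Fast path: trust d_type when the filesystem reported one.
        if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
            return self.dirent.d_type == DT_DIR
        # Slow path: d_type missing or unknown, so fall back to lstat().
        return stat.S_ISDIR(self.lstat().st_mode)


# The path does not exist, yet isdir() still answers, because d_type is
# enough and os.lstat() is never called.
fast = DirEntry("no-such-dir", Dirent(d_type=DT_DIR, d_ino=1234), path="/nonexistent")
print(fast.isdir())  # True

# With DT_UNKNOWN the entry must fall back to os.lstat(), which fails here.
slow = DirEntry("no-such-dir", Dirent(d_type=DT_UNKNOWN, d_ino=1234), path="/nonexistent")
try:
    slow.isdir()
except FileNotFoundError:
    print("needed a real lstat()")
```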

This would make the scandir() API nice and simple to use for callers,
but still expose all the information the OS provides (both the
meaningful fields in dirent, and a full stat on Windows, nicely cached
in the DirEntry object).
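On Windows the directory listing already carries stat-like data. That means the entry can be created with its stat pre-filled, and the is* methods never touch the disk. The minimal sketch below keeps only the dirent-is-None branch. It uses a hand-built os.stat_result, with made-up values, to stand in for the data the Windows listing would supply.

```python
import os
import stat


class DirEntry:
    """Cut-down DirEntry keeping only the dirent-is-None (Windows) branch."""

    def __init__(self, name, dirent, lstat=None, path="."):
        self.name = name
        self.dirent = dirent  # None on Windows under this proposal
        self._lstat = lstat   # pre-filled from the directory listing
        self._path = path

    def lstat(self):
        # Only reaches the filesystem when nothing was cached.
        if self._lstat is None:
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat

    def isdir(self):
        return stat.S_ISDIR(self.lstat().st_mode)

    def isfile(self):
        return stat.S_ISREG(self.lstat().st_mode)


# Hypothetical stat data as a Windows directory listing might supply it:
# a directory mode, all other fields zeroed.
fake = os.stat_result((stat.S_IFDIR | 0o755, 0, 0, 0, 0, 0, 0, 0, 0, 0))
entry = DirEntry("Program Files", None, lstat=fake, path=r"Z:\not-mounted")
print(entry.isdir(), entry.isfile())  # True False
```

The path given here is never joined or looked up, because the cached stat answers every question.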

Thoughts?

-Ben

