[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

Gregory P. Smith greg at krypto.org
Tue May 14 08:39:21 CEST 2013


On Sun, May 12, 2013 at 3:04 PM, Ben Hoyt <benhoyt at gmail.com> wrote:

> > And if we're creating a custom object instead, why return a 2-tuple
> > rather than making the entry's name an attribute of the custom object?
> >
> > To me, that suggests a more reasonable API for os.scandir() might be
> > for it to be an iterator over "dir_entry" objects:
> >
> >     name (as a string)
> >     is_file()
> >     is_dir()
> >     is_link()
> >     stat()
> >     cached_stat (None or a stat object)
>
> Nice! I really like your basic idea of returning a custom object
> instead of a 2-tuple. And I agree with Christian that .stat() would be
> clearer called .lstat(). I also like your later idea of simply
> exposing .dirent (would be None on Windows).
>
> One tweak I'd suggest is that is_file() etc be called isfile() etc
> without the underscore, to match the naming of the os.path.is*
> functions.
>
> > That would actually make sense at an implementation
> > level anyway - is_file() etc would check self.cached_lstat first, and
> > if that was None they would check self.dirent, and if that was also
> > None they would raise an error.
>
> Hmm, I'm not sure about this at all. Are you suggesting that the
> DirEntry object's is* functions would raise an error if both
> cached_lstat and dirent were None? Wouldn't it make for a much simpler
> API to just call os.lstat() and populate cached_lstat instead? As far
> as I'm concerned, that'd be the point of making DirEntry.lstat() a
> function.
>
> In fact, I don't think .cached_lstat should be exposed to the user.
> They just call entry.lstat(), and it returns a cached stat or calls
> os.lstat() to get the real stat if required (and populates the
> internal cached stat value). And the entry.is* functions would call
> entry.lstat() if dirent was None or d_type was DT_UNKNOWN. This would
> change relatively nasty code like this:
>
> files = []
> dirs = []
> for entry in os.scandir(path):
>     try:
>         isdir = entry.isdir()
>     except NotPresentError:
>         st = os.lstat(os.path.join(path, entry.name))
>         isdir = stat.S_ISDIR(st.st_mode)
>     if isdir:
>         dirs.append(entry.name)
>     else:
>         files.append(entry.name)
>
> Into nice clean code like this:
>
> files = []
> dirs = []
> for entry in os.scandir(path):
>     if entry.isdir():
>         dirs.append(entry.name)
>     else:
>         files.append(entry.name)
>
> This change would make scandir() usable by ordinary mortals, rather
> than just hardcore library implementors.
>
> In other words, I'm proposing that the DirEntry objects yielded by
> scandir() would have .name and .dirent attributes, and .isdir(),
> .isfile(), .islink(), .lstat() methods, and look basically like this
> (though presumably implemented in C):
>
> class DirEntry:
>     def __init__(self, name, dirent, lstat, path='.'):
>         # User shouldn't need to call this, but called internally by
>         # scandir()
>         self.name = name
>         self.dirent = dirent
>         self._lstat = lstat  # non-public attributes
>         self._path = path
>
>     def lstat(self):
>         if self._lstat is None:
>             self._lstat = os.lstat(os.path.join(self._path, self.name))
>         return self._lstat
>
>     def isdir(self):
>         if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
>             return self.dirent.d_type == DT_DIR
>         else:
>             return stat.S_ISDIR(self.lstat().st_mode)
>
>     def isfile(self):
>         if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
>             return self.dirent.d_type == DT_REG
>         else:
>             return stat.S_ISREG(self.lstat().st_mode)
>
>     def islink(self):
>         if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
>             return self.dirent.d_type == DT_LNK
>         else:
>             return stat.S_ISLNK(self.lstat().st_mode)
>
> Oh, and the .dirent would either be None (Windows) or would have
> .d_type and .d_ino attributes (Linux, OS X).
>
> This would make the scandir() API nice and simple to use for callers,
> but still expose all the information the OS provides (both the
> meaningful fields in dirent, and a full stat on Windows, nicely cached
> in the DirEntry object).
>
> Thoughts?
>

I like the sound of this (which sounds like what you've implemented now
though I haven't looked at your code).
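
For anyone who wants to try the idea out, here is a minimal pure-Python
sketch of the DirEntry proposed above. It is only an illustration under
stated assumptions, not the C implementation: scandir_sketch(),
split_entries(), the Dirent namedtuple and the lstat_calls counter are
hypothetical names made up for this example, and the DT_* constants are
hard-coded to the values Linux uses in <dirent.h>. Because plain Python
cannot see the real dirent, scandir_sketch() passes dirent=None (the
Windows case described above), so every is*() call falls back to the
cached lstat().

```python
import collections
import os
import stat

# Assumption: d_type values as defined in Linux's <dirent.h>.
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8
DT_LNK = 10

# Hypothetical stand-in for the OS dirent, with the two fields named above.
Dirent = collections.namedtuple('Dirent', ['d_type', 'd_ino'])


class DirEntry:
    """Sketch of the proposed entry object (illustrative only)."""

    def __init__(self, name, dirent, lstat, path='.'):
        self.name = name
        self.dirent = dirent      # None when the OS provides no dirent
        self._lstat = lstat       # cached stat result, or None
        self._path = path
        self.lstat_calls = 0      # instrumentation for this example only

    def lstat(self):
        # Call os.lstat() at most once and cache the result.
        if self._lstat is None:
            self.lstat_calls += 1
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat

    def _is_type(self, dt_value, mode_check):
        # Prefer d_type when it is known; otherwise fall back to lstat().
        if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
            return self.dirent.d_type == dt_value
        return mode_check(self.lstat().st_mode)

    def isdir(self):
        return self._is_type(DT_DIR, stat.S_ISDIR)

    def isfile(self):
        return self._is_type(DT_REG, stat.S_ISREG)

    def islink(self):
        return self._is_type(DT_LNK, stat.S_ISLNK)


def scandir_sketch(path):
    """Hypothetical scandir() substitute built on os.listdir()."""
    for name in os.listdir(path):
        yield DirEntry(name, None, None, path)


def split_entries(path):
    """Split a directory listing into (files, dirs), both sorted by name."""
    files = []
    dirs = []
    for entry in scandir_sketch(path):
        if entry.isdir():
            dirs.append(entry.name)
        else:
            files.append(entry.name)
    return sorted(files), sorted(dirs)
```

Calling split_entries('.') returns the file names and directory names of
the current directory. An entry built with a known d_type, for example
DirEntry('x', Dirent(DT_REG, 0), None), answers isfile() without touching
the filesystem at all, which is where the speedup on Linux and OS X
would come from.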

-gps


>
> -Ben