[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

Ben Hoyt benhoyt at gmail.com
Sat May 11 06:24:48 CEST 2013


> Have you actually tried the code? It can't give you correct answers. The
> struct dirent.d_type member as returned by readdir() has different
> values than stat.st_mode's file type.

Yes, I'm quite aware of that. The first version of BetterWalk did it
exactly that way, and the approach worked fine.
However...

> Or are you proposing to map d_type to st_mode?

Yes, that's exactly what I was proposing -- sorry if that wasn't clear.

> Hence I'm +1 on the general idea but -1 on something stat like. IMHO
> os.scandir() should yield four objects:
>
>  * name
>  * inode
>  * file type or DT_UNKNOWN
>  * stat_result or None

This feels quite heavy to me. And I don't like how, for the normal
case (checking whether something is a file or directory), you'd have
to check file_type against DT_UNKNOWN as well as stat_result against
None before doing anything with it:

for item in os.scandir():
    if item.file_type == DT_UNKNOWN and item.stat_result is None:
        # neither field was filled in, so fall back to a real stat() call
        st = os.stat(item.name)

I guess that's not *too* bad.

> That's also problematic because st_mode would only have file type
> bits, not permission bits.

You're right. However, given that scandir() is intended as a
low-level, OS-specific function, couldn't we just document this and
move on? Keep the API nice and simple and still cover 95% of the use
cases. How often does anyone actually iterate through a directory
doing stuff with the permission bits?

The nice thing about having it return a stat-like object is that in
almost all cases you don't have to have two different code paths
(d_type and st_mode), you just deal with st_mode. And we already have
the stat module for dealing with st_mode stuff, so we wouldn't need
another bunch of code/constants for dealing with d_type.
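
To make the mapping concrete, here is a rough sketch of turning d_type
into the file type bits of st_mode. The DT_* values are hard-coded from
Linux's <dirent.h> as an assumption (the stdlib doesn't expose them),
and d_type_to_mode is a hypothetical helper name, not part of any
proposed API:

```python
import stat

# d_type values as defined in Linux's <dirent.h> (assumed here;
# Python's standard library does not expose these constants).
DT_UNKNOWN = 0
DT_FIFO = 1
DT_CHR = 2
DT_DIR = 4
DT_BLK = 6
DT_REG = 8
DT_LNK = 10
DT_SOCK = 12

# Map each known d_type to the matching file type bits of st_mode.
_DT_TO_IFMT = {
    DT_FIFO: stat.S_IFIFO,
    DT_CHR: stat.S_IFCHR,
    DT_DIR: stat.S_IFDIR,
    DT_BLK: stat.S_IFBLK,
    DT_REG: stat.S_IFREG,
    DT_LNK: stat.S_IFLNK,
    DT_SOCK: stat.S_IFSOCK,
}


def d_type_to_mode(d_type):
    """Return st_mode file type bits for d_type, or None if unknown."""
    return _DT_TO_IFMT.get(d_type)
```

With that in place, stat.S_ISDIR(d_type_to_mode(DT_DIR)) is true, and
the permission bits of the result are all zero, which is the limitation
that would need documenting.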

The documentation could just say something like:

"The exact information returned in st_mode is OS-specific. In
practice, on Windows it returns all the information that stat() does.
On Linux and OS X, it's either None or it includes the file type bits
(but not the permission bits)."

Antoine said: "But what if some systems return more than the file type
and less than a full stat result?"

Again, I just think that debating the very fine points like this to
get that last 5% of use cases will mean we never have this very useful
function in the library.

In all the *practical* examples I've seen (and written myself), I
iterate over a directory and I just need to know whether it's a file
or directory (or maybe a link). Occasionally you need the size as
well, but that would just mean a similar check "if st.st_size is None:
st = os.stat(...)", which on Linux/OS X would call stat(), but it'd
still be free and fast on Windows.
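
As a sketch of that fallback, assuming a stat-like object whose missing
fields are None (PartialStat and size_of are made-up names for
illustration, not a proposed API):

```python
import os
from collections import namedtuple

# Hypothetical stand-in for the partial, stat-like result discussed
# above; fields the OS did not supply for free are None.
PartialStat = namedtuple("PartialStat", ["st_mode", "st_size"])


def size_of(path, st):
    """Return the size, calling os.stat() only when st lacks it."""
    if st.st_size is None:
        # The directory listing didn't include the size: pay for stat().
        st = os.stat(path)
    return st.st_size
```

When the size was already filled in, no system call is made; only the
None case falls through to os.stat().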

-Ben

