[Python-Dev] Updates to PEP 471, the os.scandir() proposal

Ben Hoyt benhoyt at gmail.com
Wed Jul 9 15:22:41 CEST 2014


> Option 2:
> def log_err(exc):
>     logger.warn("Cannot stat {}".format(exc.filename))
>
> def get_tree_size(path):
>     total = 0
>     for entry in os.scandir(path, info='lstat', onerror=log_err):
>         if entry.is_dir:
>             total += get_tree_size(entry.full_name)
>         else:
>             total += entry.lstat.st_size
>     return total
>
> On this basis, #2 wins.

That's a pretty nice comparison, and you're right, onerror handling is
nicer here.
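For comparison, here is a sketch of the same function written against os.scandir() as it eventually shipped in Python 3.5. That version has neither the `info` nor the `onerror` parameter discussed here. `is_dir()` and `stat()` are methods, `full_name` became `path`, and errors are caught explicitly. The sketch assumes a module-level `logger` and that skipping an unreadable entry is acceptable. The `with` form needs Python 3.6 or later.

```python
import logging
import os

logger = logging.getLogger(__name__)


def get_tree_size(path):
    """Sum the lstat sizes of all non-directory entries under path."""
    total = 0
    try:
        entries = os.scandir(path)
    except OSError as exc:
        # The directory itself could not be listed; log it and count nothing.
        logger.warning("Cannot list %s", exc.filename)
        return 0
    with entries:
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    total += get_tree_size(entry.path)
                else:
                    # lstat-style result: symlinks are not followed.
                    total += entry.stat(follow_symlinks=False).st_size
            except OSError as exc:
                logger.warning("Cannot stat %s", exc.filename)
    return total
```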

> However, I'm slightly uncomfortable using the
> filename attribute of the exception in the logging, as there is
> nothing in the docs saying that this will give a full pathname. I'd
> hate to see "Unable to stat __init__.py"!!!

Huh, you're right. This should be documented for os.walk() too, and I
think it should be the full path name (is it currently?).
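One quick way to check current behaviour is to look at what os.walk() passes to its `onerror` callback. The callback receives the OSError raised while listing a directory, and that error's `.filename` is the path os.walk() tried to list. Because os.walk() builds that path with os.path.join() from `top`, it is a full path only if `top` was absolute. `walk_errors` is a hypothetical helper name used only for illustration.

```python
import os


def walk_errors(top):
    """Walk top and return the OSErrors reported through onerror."""
    errors = []
    # onerror is called with the OSError instead of it being raised.
    for _dirpath, _dirnames, _filenames in os.walk(top, onerror=errors.append):
        pass
    return errors
```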

> So maybe the onerror function should also receive the DirEntry object
> - which will only have the name and full_name attributes, but that's
> all that is needed.

That's an interesting idea -- though enough of a deviation from
os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just
document that the onerror exception .filename is the full path name.

One issue with option #2 that I just realized: does scandir yield
the entry at all if there's a stat error? It can't really, because the
caller will expect the .lstat attribute to be set (assuming he asked
for info='lstat'), but it won't be. Is it a problem that these entries
are effectively dropped just because the stat failed? I kind of think it is.
If so, is there a way to solve it with option #2?
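One possible answer is sketched below. It builds on the os.scandir() that later shipped, not the proposed `info`/`onerror` signature. The idea is to yield every entry, paired with its lstat result or `None` when the stat failed, and to report the failure through the callback instead of dropping the entry. `scandir_lstat` is a hypothetical name and not a real os function. It returns a tuple rather than attaching an attribute to the entry.

```python
import os


def scandir_lstat(path, onerror=None):
    """Hypothetical helper: yield (entry, lstat_result_or_None) for every entry.

    A failed lstat is passed to onerror (if given), but the entry is still
    yielded, with None in place of the stat result.
    """
    with os.scandir(path) as it:
        for entry in it:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError as exc:
                if onerror is not None:
                    onerror(exc)
                st = None
            yield entry, st
```

Callers then check `if st is None` rather than losing the entry silently.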

> OK, looks like option #2 is now my preferred option. My gut instinct
> still rebels over an API that deliberately throws information away in
> the default case, even though there is now an option to ask it to keep
> that information, but I see the logic and can learn to live with it.

In terms of throwing away info "in the default case" -- it's simply a
case of getting what you ask for. :-) Worst case, you'll write and test
your code, it'll fail hard on every system, you'll fix it immediately,
and then it'll work on every system.
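The "you get what you ask for" argument can be shown with a toy model. This is not the PEP's actual DirEntry, and the `Entry` class is hypothetical. In it, `lstat` is available only if it was requested up front. The check does not depend on whether the platform could have supplied stat data for free, so code that forgot to request it fails the first time it runs, on every OS.

```python
import os


class Entry:
    """Toy stand-in for a scandir entry; hypothetical, for illustration only."""

    def __init__(self, dirpath, name, info=None):
        self.name = name
        self.full_name = os.path.join(dirpath, name)
        # Stat data is fetched only when the caller asked for it.
        self._lstat = os.lstat(self.full_name) if info == "lstat" else None

    @property
    def lstat(self):
        if self._lstat is None:
            # Fail the same way on every platform, even where the OS
            # could have returned this data without an extra call.
            raise AttributeError("lstat not requested; pass info='lstat'")
        return self._lstat
```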

-Ben

