[Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)

Antoine Pitrou solipsis at pitrou.net
Mon Nov 25 00:13:05 CET 2013


On Mon, 25 Nov 2013 12:04:28 +1300
Ben Hoyt <benhoyt at gmail.com> wrote:
> > Right now, pathlib doesn't cache. Guido decided it was safer to start
> > off like that, and perhaps later we can add some optional caching.
> >
> > One reason caching didn't go in is that it's not clear which API is
> > best. Working on pluggin scandir() into pathlib would actually help
> > choosing a stat-caching API.
> >
> > (or, rather, lstat-caching...)
> >
> >> The other related thing is that DirEntry only provides .lstat(),
> >> because it's providing stat-like info without following links.
> >
> > Path.is_dir() and friends use stat(), i.e. they inform you about
> > whether a symlink's target is a directory (not the symlink itself).  Of
> > course, if the DirEntry says the path is a symlink, Path.is_dir() could
> > then run stat() to find out about the target.
> >
> > Do you plan to propose scandir() for inclusion in the stdlib?
> 
> Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry
> objects" for inclusion into the stdlib, and also speed up os.walk() as
> a result.
> 
> However, pathlib's API with .is_dir() and .lstat() etc are so close to
> DirEntry, I'd be much keener to roll up the scandir functionality into
> pathlib's iterdir(), as that's already going in the standard library,
> and iterdir() already returns Path objects.

We could still expose scandir() as a low-level API, *and* call it in
pathlib for optimizations.

> We could do Path.lstat(cached=True), but we'd also really want
> is_dir(cached=True), so that API kinda sucks. Alternatively you could
> have iterdir(cached=True) return PathWithCachedStat style objects --
> probably better, but kinda messy.

Perhaps Path.enable_caching()? It would enable caching not only on this
path objects, but all objects constructed from it.

Regards

Antoine.


More information about the Python-Dev mailing list