[Python-Dev] PEP 428: Pathlib -> stat caching

Nick Coghlan ncoghlan at gmail.com
Wed Sep 18 03:32:14 CEST 2013


On 18 September 2013 11:10, Philip Jenvey <pjenvey at underboss.org> wrote:
>
> On Sep 16, 2013, at 1:05 PM, Antoine Pitrou wrote:
>
>> On Mon, 16 Sep 2013 15:48:54 -0400
>> Brett Cannon <brett at python.org> wrote:
>>>>
>>>> So I would like to propose the following API change:
>>>>
>>>> - Path.stat() (and stat-accessing methods such as get_mtime()...)
>>>>  returns an uncached stat object by default
>>>>
>>>> - Path.cache_stat() can be called to return the stat() *and* cache it
>>>>  for future use, such that any future call to stat(), cache_stat() or
>>>>  a stat-accessing function reuses that cached stat
>>>>
>>>> In other words, only if you use cache_stat() at least once is the
>>>> stat() value cached and reused by the Path object.
>>>> (also, it's a per-Path decision)
>>>>
>>>
>>> Any reason why stat() can't get a keyword-only cached=True argument
>>> instead? Or have stat() never cache but stat_cache() always cache, so that
>>> people can choose if they want fresh or cached based on API and not whether
>>> some library happened to make a decision for them?
>>
>> 1. Because you also want the helper functions (get_mtime(), etc.) to
>> cache the value too. It's not only about stat().
>
> With the proposed rich stat object the convenience methods living on Path wouldn't result in much added convenience:
>
> p.is_dir() vs p.stat().is_dir()
>
> Why not move these methods from Path to a rich stat obj and not cache stat results at all? It's easy enough for users to cache them themselves and much more explicit.

Because that doesn't help iterator-based, os.walk-inspired APIs like
walkdir, which would benefit greatly from a path type with implicit
caching, but would have to complicate their APIs significantly to pass
around separate stat objects.

Rewriting walkdir to depend on pathlib has been on my todo list for a
while, as it solves a potentially serious walkdir performance problem
where chained iterators have to make repeated stat calls to answer
questions that were already asked by earlier iterators in the
pipeline.
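
To make the repeated-stat problem concrete, here is a minimal sketch.
It assumes a hypothetical CachedStatPath wrapper and two hypothetical
filter stages (only_files, non_empty); none of these names come from
pathlib or walkdir, and the class-level counter exists only to make
the saving visible. Each stage asks a stat-based question, but because
the path object remembers its first os.stat() result, the underlying
system call should happen only once per file:

```python
import os
import stat
import tempfile


class CachedStatPath:
    """Hypothetical path wrapper that remembers its first os.stat() result."""

    stat_calls = 0  # illustrative counter of real os.stat() calls

    def __init__(self, path):
        self.path = path
        self._stat = None

    def stat(self):
        # Only hit the filesystem the first time; reuse the result afterwards.
        if self._stat is None:
            CachedStatPath.stat_calls += 1
            self._stat = os.stat(self.path)
        return self._stat

    def is_file(self):
        return stat.S_ISREG(self.stat().st_mode)

    def size(self):
        return self.stat().st_size


def iter_paths(root):
    # Source stage: yield one path object per file found by os.walk().
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield CachedStatPath(os.path.join(dirpath, name))


def only_files(paths):
    # First filter: needs stat information.
    return (p for p in paths if p.is_file())


def non_empty(paths):
    # Second filter: needs stat information again, served from the cache.
    return (p for p in paths if p.size() > 0)


def demo():
    with tempfile.TemporaryDirectory() as root:
        for name, data in [("a.txt", b"x"), ("b.txt", b""), ("c.txt", b"yz")]:
            with open(os.path.join(root, name), "wb") as f:
                f.write(data)
        CachedStatPath.stat_calls = 0
        pipeline = non_empty(only_files(iter_paths(root)))
        names = sorted(os.path.basename(p.path) for p in pipeline)
        return names, CachedStatPath.stat_calls
```

Without the cache, each filter stage would issue its own os.stat()
call per file, so the cost grows with the length of the pipeline. The
alternative of passing separate stat objects between stages would mean
every stage has to accept and yield (path, stat) pairs instead of
plain paths.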

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

