[Python-ideas] PEP: Extended stat_result (First Draft)

Pieter Nagel pieter at nagel.co.za
Tue May 7 08:48:55 CEST 2013


On Mon, 2013-05-06 at 18:09 -0400, Jim Jewett wrote:

> Another alternative would be to modify os.path.isfile, os.path.isdir,
> etc so that they can accept a stat_result in place of a filename (or
> open-file handle).

Interesting proposal.

The downside is that it would place even more pressure on os.path to
accumulate all kinds of platform-specific things like os.path.isdoor
(for Solaris). Or, alternatively, that Python will never support these
things in order not to pollute os.path. With stat_result, one can
conceptually have platform-specific types of stat_result, and place
is_door only on solaris_stat_result.
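
To make the platform-specific idea concrete, here is a rough sketch.
The class names StatResult and SolarisStatResult are hypothetical and
are not taken from the PEP draft; only stat.S_ISREG, stat.S_ISDIR and
stat.S_ISDOOR (in the stat module since Python 3.4) are real stdlib
calls.

```python
import os
import stat


class StatResult:
    """Hypothetical wrapper around os.stat_result (illustration only)."""

    def __init__(self, st):
        self._st = st

    def __getattr__(self, name):
        # Delegate st_mode, st_size, etc. to the wrapped os.stat_result.
        return getattr(self._st, name)

    def is_file(self):
        return stat.S_ISREG(self._st.st_mode)

    def is_dir(self):
        return stat.S_ISDIR(self._st.st_mode)


class SolarisStatResult(StatResult):
    """Hypothetical Solaris-only subclass that carries the door test."""

    def is_door(self):
        # stat.S_ISDOOR returns False on platforms without door files.
        return stat.S_ISDOOR(self._st.st_mode)
```

The point of the split is that is_door() never appears on the generic
type, so code written against StatResult stays portable.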

The other downside is it overloads the meaning of the parameters even
more.

> Yet another (albeit more complicated, with questionable
> backwards-compatibility) alternative would be to have the os.path.*
> functions maintain a very short-duration cache, so that if the same
> file is queried multiple times within a second or so, the stat_result
> could be reused.

In a recent discussion on python-dev regarding similar proposed behaviour
in PEP 428, Guido pronounced that in general, he wants APIs that cache
to also expose their uncached variants, because both use cases are
usually needed.

So this caching would need to be optional on os.path.*, implying yet
another parameter.

I'm -1 on this.
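
For illustration only, the optional-cache variant discussed above might
look roughly like the sketch below. The function name cached_isfile and
the use_cache and ttl parameters are assumptions made up for this
example; nothing like them exists in os.path.

```python
import os
import stat
import time

# path -> (time of lookup, os.stat_result or None if stat() failed)
_cache = {}


def cached_isfile(path, use_cache=True, ttl=1.0):
    """Hypothetical os.path.isfile variant with an opt-out cache."""
    now = time.monotonic()
    entry = _cache.get(path)
    if use_cache and entry is not None and now - entry[0] < ttl:
        st = entry[1]          # reuse the recent result
    else:
        try:
            st = os.stat(path)
        except OSError:
            st = None
        _cache[path] = (now, st)
    return st is not None and stat.S_ISREG(st.st_mode)
```

Within the ttl window the cached call keeps reporting a file that has
already been deleted, which is why the uncached spelling has to stay
available, and why the extra parameter is hard to avoid.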

Also note that you might end up getting something like this anyway with
PEP 428. At the moment it caches essentially forever; I'm not sure how
that will change after Guido's statement.

> Out of curiosity, is it common to call more than one function, except in
> the following cases:

What about, for example, admin script code that walks a bunch of files
and wants to do different things to different types of files?

I.e.

  if os.path.isfile(f):
    pass  # do something
  elif os.path.isdir(f):
    pass  # do something
  elif os.path.islink(f):
    pass  # do something
  # etc.
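
For comparison, a sketch of the same dispatch done with one system call
instead of up to three. The function name classify and its string
labels are made up for this example. Note that it uses os.lstat() and
tests for a symlink first, so unlike the os.path chain above it reports
a link as a link rather than following it.

```python
import os
import stat


def classify(path):
    """Return a short label for path using a single lstat() call."""
    mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
    if stat.S_ISLNK(mode):
        return "link"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISDIR(mode):
        return "dir"
    return "other"
```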

> 
> (1) stat.S_ISREG(st.st_mode) or stat.S_ISDIR(st.st_mode)
> (2) try all the type functions until successful
> 
> If those are the only real use cases, it might make sense to just add
> a pair of functions for those two specific cases.
> 
>     > if os.path.isfile_or_dir(filename)

I don't think this will cover all use cases.

> Or maybe just for the latter, with the first spelled either
> 
>     > from os.path import filekind
>     > if filekind(filename) in (filekind.REGULAR, filekind.DIR) #symlinks?
> 
> or
> 
>     > from os.path import filekind
>     > if filekind(filename) isinstance (filekind.REGULAR, filekind.DIR)

The basic notion has been proposed before.

I'm holding off on it, because I don't think it'll ever be the *only*
mechanism for interrogating file types (we need to retain os.path.isfile
for backwards compatibility). I expect naive and newbie code to still
prefer os.path, and so this filekind notion will be just another way in
which performant code that calls stat() only once will look totally
different from naive code.

I'm in favour of it being a potential *additional* way to interrogate
file types, but I'm not going to champion it for now. This proposal will
need a lot of work to get the modelling of the filekinds correct, taking
into account questions like "is a fifo filekind a kind of file
filekind", and will need a survey of platform-specific stat flags that
Python may want to support in the near future.
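
As a rough idea of the shape such an API could take, here is a sketch
using the enum module (added in Python 3.4). FileKind, its members and
the filekind() function are all hypothetical. The enum is deliberately
flat, so it sidesteps rather than answers the question of whether a
fifo is "a kind of file".

```python
import enum
import os
import stat


class FileKind(enum.Enum):
    """Hypothetical flat set of file kinds (illustration only)."""
    REGULAR = "regular"
    DIR = "dir"
    SYMLINK = "symlink"
    FIFO = "fifo"
    OTHER = "other"


# Order matters only in that each mode matches at most one test.
_CHECKS = [
    (stat.S_ISLNK, FileKind.SYMLINK),
    (stat.S_ISREG, FileKind.REGULAR),
    (stat.S_ISDIR, FileKind.DIR),
    (stat.S_ISFIFO, FileKind.FIFO),
]


def filekind(path):
    """Classify path with a single lstat() call."""
    mode = os.lstat(path).st_mode
    for check, kind in _CHECKS:
        if check(mode):
            return kind
    return FileKind.OTHER
```

A caller could then write
"if filekind(name) in (FileKind.REGULAR, FileKind.DIR):", which is
close to the first spelling quoted above.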

> Even assuming these are added individually (as opposed to a single
> filekind), is there a reason not to make them properties?  I
> understand that a property normally shouldn't hide something as
> expensive as a system call, but in this case the system call is
> already complete before the caller has a stat_return with attributes.

I'll want to follow PEP 428's lead here: both it and os.path currently
spell these as calls (methods and functions respectively), not as
properties.

My next draft will make it clear why I consider PEP 428 relevant here.

> > same_stat(other)
> >     Equivalent to ``os.path.samestat(self, other)``.
> 
> Why is this not just an equality test?
> 
> Is there just too much  of a backward-compatibility problem for
> stat_result objects that refer to the same device/inode, but have
> differences in the way other attributes are set?

Equality, to my mind, implies that all visible state is being compared,
so if I were to add __eq__ to stat_result, it would compare st_size,
st_mtime and the whole lot too. Anything else would be confusing.

Plus, imagine you call stat() on a file now, and store the result for
some reason. Later, you call stat() again, and want to see if any of the
old stat_results you stored refer to the same file. But in the meantime
st_size etc. have changed, even though the file itself is still "the
same file".

So this is a totally different operation than equality.
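
A small demonstration of the difference, using only existing stdlib
calls (the file name data.txt is arbitrary): the file grows between the
two stat() calls, so the two results differ in st_size, yet
os.path.samestat() still identifies them as the same file because it
compares only the device and inode numbers.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as fh:
        fh.write("first")
    old = os.stat(path)

    # Append to the file so its size changes before the second stat().
    with open(path, "a") as fh:
        fh.write(" and more")
    new = os.stat(path)

    same_file = os.path.samestat(old, new)  # compares st_dev and st_ino
    same_size = old.st_size == new.st_size
```

Here same_file is true while same_size is false, which is the case an
all-fields __eq__ would get "wrong" for this use.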

I agree that the name same_stat() is not ideal. I think I should relax
my desire to use only names that echo os.path, in cases where a
different name makes more sense.

> 
> > format()
> >     This shall return ``stat.S_IFMT(self.st_mode)``.
> 
> I don't think this is important enough to justify the confusion with "".format

The next draft of the PEP will likely omit it entirely.


-- 
Pieter Nagel




