[Python-ideas] PEP: Extended stat_result (First Draft)

Pieter Nagel pieter at nagel.co.za
Mon May 6 10:30:04 CEST 2013


Following our discussion last week, here is a first draft of the PEP:

PEP: XXX
Title: Extended stat_result
Version: $Revision$
Last-Modified: $Date$
Author: Pieter Nagel <pieter at nagel.co.za>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 03-May-2013
Python-Version: 3.4


Abstract
========

This PEP proposes extending the result of ``os.stat()``, ``os.fstat()`` and
``os.lstat()`` calls with added methods such as ``is_file()``.  These added
methods will obviate the need to use the ``stat`` module to interpret the
result of these calls.


Motivation
==========

Currently, there are two different mechanisms for interrogating the file
types of filesystem paths, each with distinctly different appearance and
performance characteristics.

The first mechanism is a set of functions in the module ``os.path``, such
as ``os.path.isfile()`` and ``os.path.isdir()``.  These functions’ names
express their semantics relatively directly, but performance-wise each call
entails an ``os.stat()`` call, which could potentially be redundant if
another ``os.*stat()`` call had been done earlier for the same path in
order to query other similar properties of the path.

The second mechanism is to first call ``os.stat()``, ``os.fstat()`` or
``os.lstat()`` (henceforth collectively referred to as just
“``os.*stat()``”) for a particular path, and then to interpret the result
using functions in the ``stat`` module.  Performance-wise, these only
require a single ``os.*stat()`` call, no matter how many times different
properties of the result object are interrogated.  But the downside is that
the names of the functions needed to interrogate the result object, such as
``stat.S_ISREG()``, are relatively opaque, and motivated more by a desire
to conform to standards for the names of the underlying C macros than by a
desire to be semantically meaningful in English or to be Pythonic.

There are situations where the performance penalty of ``os.*stat()`` calls
can be significant enough to take into consideration.  For example, on some
networked filesystems they can be quite slow.  Another consideration is
that each call releases the GIL, which can also have negative performance
effects, especially on multi-threaded code.

The end result of all this is that performance-agnostic code can be written
in a relatively straightforward way::

    if os.path.isfile(f) or os.path.isdir(f):
        pass  # do something

In contrast, similar code that wishes to avoid the penalty of two
potential calls to ``os.stat()`` looks radically different::

    import stat

    st = os.stat(f)
    if stat.S_ISREG(st.st_mode) or stat.S_ISDIR(st.st_mode):
        pass  # do something

The contrast is even starker once one takes into account that the second
code fragment must still handle the nonexistence of ``f`` in order to be
completely semantically equivalent to the first, and must also import the
``stat`` module.
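To make that cost concrete, here is a minimal sketch of what a fully equivalent version of the second fragment might look like, wrapped in a hypothetical helper named ``is_file_or_dir()`` (it is not part of any library).  It assumes, as the existing ``os.path`` predicates do, that any ``OSError`` raised by ``os.stat()`` should simply mean "no":

```python
import os
import stat


def is_file_or_dir(path):
    """Hypothetical helper: same result as
    ``os.path.isfile(path) or os.path.isdir(path)``, but with one stat call."""
    try:
        st = os.stat(path)
    except OSError:
        # os.path.isfile() and os.path.isdir() also swallow OSError
        # (FileNotFoundError included) and return False, so an
        # equivalent fragment has to do the same.
        return False
    return stat.S_ISREG(st.st_mode) or stat.S_ISDIR(st.st_mode)
```

Only one ``os.stat()`` call is made, but the error handling and the ``stat`` import are now the caller's responsibility.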

This PEP proposes ameliorating the situation by adding higher-level
predicates such as ``is_file()`` and ``is_dir()`` directly to the
``stat_result`` object, so that (assuming the file ``f`` exists) the second
code example can become::

    st = os.stat(f)
    if st.is_file() or st.is_dir():
        pass  # do something


Specification
=============


Added methods on ``stat_result``
--------------------------------

is_dir()
    Equivalent to ``bool(stat.S_ISDIR(self.st_mode))``.

is_character_device()
    Equivalent to ``bool(stat.S_ISCHR(self.st_mode))``.

is_block_device()
    Equivalent to ``bool(stat.S_ISBLK(self.st_mode))``.

is_file()
    Equivalent to ``bool(stat.S_ISREG(self.st_mode))``.

is_fifo()
    Equivalent to ``bool(stat.S_ISFIFO(self.st_mode))``.

is_symbolic_link()
    Equivalent to ``bool(stat.S_ISLNK(self.st_mode))``.

is_socket()
    Equivalent to ``bool(stat.S_ISSOCK(self.st_mode))``.

same_stat(other)
    Equivalent to ``os.path.samestat(self, other)``.

file_mode()
    This shall return ``stat.filemode(stat.S_IMODE(self.st_mode))``, i.e. a
    string of the form ‘-rwxrwxrwx’.

permission_bits()
    This shall return ``stat.S_IMODE(self.st_mode)``.

format()
    This shall return ``stat.S_IFMT(self.st_mode)``.
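Since the real ``stat_result`` type cannot easily be extended from Python code, the following is only a sketch of how the proposed methods could be emulated today, assuming a hypothetical wrapper class ``ExtendedStat`` around an existing ``stat_result``.  Each method follows the corresponding specification entry above literally:

```python
import os
import stat


class ExtendedStat:
    """Hypothetical wrapper emulating the proposed stat_result methods."""

    def __init__(self, st):
        self._st = st  # an os.stat_result from os.stat/fstat/lstat

    def __getattr__(self, name):
        # Existing fields such as st_mode and st_size pass through.
        return getattr(self._st, name)

    def is_dir(self):
        return bool(stat.S_ISDIR(self._st.st_mode))

    def is_character_device(self):
        return bool(stat.S_ISCHR(self._st.st_mode))

    def is_block_device(self):
        return bool(stat.S_ISBLK(self._st.st_mode))

    def is_file(self):
        return bool(stat.S_ISREG(self._st.st_mode))

    def is_fifo(self):
        return bool(stat.S_ISFIFO(self._st.st_mode))

    def is_symbolic_link(self):
        return bool(stat.S_ISLNK(self._st.st_mode))

    def is_socket(self):
        return bool(stat.S_ISSOCK(self._st.st_mode))

    def same_stat(self, other):
        # Accept either another wrapper or a plain stat_result.
        if isinstance(other, ExtendedStat):
            other = other._st
        return os.path.samestat(self._st, other)

    def file_mode(self):
        return stat.filemode(stat.S_IMODE(self._st.st_mode))

    def permission_bits(self):
        return stat.S_IMODE(self._st.st_mode)

    def format(self):
        return stat.S_IFMT(self._st.st_mode)
```

Attributes the wrapper does not define, such as ``st_size``, are forwarded to the wrapped ``stat_result``, so the wrapper can stand in for the original object in most read-only uses.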


Added functions in ``os.path``
------------------------------

is_dir(f)
    This shall be an alias for the existing ``isdir(f)``.

is_character_device(f)
    This shall return ``os.stat(f).is_character_device()``, or ``False`` if
    ``f`` does not exist.

is_block_device(f)
    This shall return ``os.stat(f).is_block_device()``, or ``False`` if
    ``f`` does not exist.

is_file(f)
    This shall be an alias for the existing ``isfile(f)``.

is_fifo(f)
    This shall return ``os.stat(f).is_fifo()``, or ``False`` if
    ``f`` does not exist.

is_symbolic_link(f)
    This shall return ``os.lstat(f).is_symbolic_link()``, or ``False`` if
    ``f`` does not exist.  (``os.lstat()`` is required here, as in the
    existing ``islink(f)``, because ``os.stat()`` follows symbolic links.)

is_socket(f)
    This shall return ``os.stat(f).is_socket()``, or ``False`` if
    ``f`` does not exist.
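The ``os.path`` side of the proposal can likewise be approximated with the ``stat`` module.  The sketch below defines the proposed names as plain module-level functions; they are hypothetical, and none of the new names exist in ``os.path`` today.  It assumes that "does not exist" should be read the way the existing predicates read it, i.e. any ``OSError`` yields ``False``, and it checks symbolic links with ``follow_symlinks=False``, because a link-following ``os.stat()`` would never report a link:

```python
import os
import os.path
import stat


def _mode(path, follow_symlinks=True):
    """Return st_mode for path, or None if it cannot be stat'ed.

    Like the existing os.path predicates, any OSError (including
    FileNotFoundError) is treated as "does not exist".
    """
    try:
        return os.stat(path, follow_symlinks=follow_symlinks).st_mode
    except OSError:
        return None


# The two aliases the specification asks for.
is_dir = os.path.isdir
is_file = os.path.isfile


def is_character_device(path):
    mode = _mode(path)
    return mode is not None and stat.S_ISCHR(mode)


def is_block_device(path):
    mode = _mode(path)
    return mode is not None and stat.S_ISBLK(mode)


def is_fifo(path):
    mode = _mode(path)
    return mode is not None and stat.S_ISFIFO(mode)


def is_symbolic_link(path):
    # os.stat() follows links, so the link itself must be examined
    # without following it, just as os.path.islink() uses os.lstat().
    mode = _mode(path, follow_symlinks=False)
    return mode is not None and stat.S_ISLNK(mode)


def is_socket(path):
    mode = _mode(path)
    return mode is not None and stat.S_ISSOCK(mode)
```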


Rationale
=========

This PEP is strongly motivated by a desire for symmetry between functions in
``os.path`` and methods on ``stat_result``.

Therefore, for each predicate function in ``os.path`` that is essentially
just an interrogation of ``os.*stat()``, given an existing path, the
similarly-named predicate method on ``stat_result`` should have the exact
same semantics.

This definition does not cover the case where the path being interrogated
does not exist.  In that case, predicate functions in ``os.path``, such as
``os.path.isfile()``, return ``False``, whereas ``os.*stat()`` raises
``FileNotFoundError`` before any ``stat_result`` exists that could be
interrogated.  Since there is no ``stat_result`` in that case, the question
of whether its predicates would have been symmetrical with the ``os.path``
functions is moot, and this PEP does not propose doing anything about the
situation (but see `Open Issues`_ below).

Secondly, this definition refers to ‘similarly-named’ predicates instead of
‘identically-named’ predicates, because the names in ``os.path`` pre-date
PEP 8 [#PEP-8]_ and are not compliant with it.  This PEP takes the
position that it is better for the new predicate methods on
``stat_result`` to be named in compliance with PEP 8 [#PEP-8]_ (i.e.
``is_file()``) than for them to be precisely identical to the names in
``os.path`` (i.e. ``isfile()``).  Note also that PEP 428 [#PEP-428]_
specifies PEP 8 compliant names such as ``is_file()`` for the exact same
concepts, so if PEP 428 [#PEP-428]_ is accepted, the issue becomes even
more pertinent.

Lastly, this PEP takes the notion of symmetry as far as adding functions and
aliases to the existing ``os.path`` module in order to be symmetrical with
the added behaviour on ``stat_result``.  But the author is least strongly
convinced of this latter point, and may be persuaded to abandon it.


Backwards Compatibility
=======================

This PEP neither removes any current behavior of ``stat_result`` nor changes
the semantics of any current behavior.  Likewise, it adds functions and
aliases for functions to ``os.path``, but does not remove or change any
existing ones.

Therefore, this PEP should not cause any backwards incompatibilities,
except in the rare and esoteric cases where code is dependent on the
*nonexistence* of the proposed new names.  It is not deemed important to
remain compatible with code that mistakenly holds the Python Standard
Library to be closed for new additions.


Open Issues
===========

Whether it is more desirable for the proposed added methods’ names to
follow PEP 8 [#PEP-8]_ (i.e. ``is_file()`` etc.), or to mirror the
pre-existing names in ``os.path`` (i.e. ``isfile()`` etc.) is still open
for debate.

The existing attributes on ``stat_result`` follow the pattern ``st_*`` in
conformance to the relevant POSIX names for the fields of the C-level
``stat`` structure.  The new names for the behaviours proposed here do not
contain such an ``st_`` prefix (nor could they, for that would suggest a
conformance with ``stat`` structure names which do not exist in POSIX). 
But the resulting asymmetry of names is annoying.  Should aliases for the
existing ``st_*`` names be added that omit the ``st_`` prefix?

This PEP does not address a higher-level mechanism for exposing the
owner/group/other read/write/execute permissions.  Is there a need for
this?

This PEP does not address a higher-level mechanism for exposing the
contents of the underlying ``st_flags`` field.  Is there a need for this?

This PEP proposes aliases and functions to make ``os.path`` conform more
closely to the added ``stat_result`` methods proposed here.  But is the
impedance mismatch between ``isfile`` and ``is_file`` really enough of an
issue to warrant this?

As it stands, this PEP does not address the asymmetry between the existing
``os.path.isfile()`` etc. functions and the new proposed mechanism in the
case where the underlying file does not exist.  There is a way to handle
this, though: an optional flag could be added to ``os.*stat()`` that would
return a null object implementation of ``stat_result`` whenever the file
does not exist.  That null object would then return ``False`` from
``is_file()`` etc.  This means that the following two fragments would
behave identically, even when the file ``f`` does not exist::

    if os.path.isfile(f) or os.path.isdir(f):
        pass  # do something

    st = os.stat(f, null_if_missing=True)
    if st.is_file() or st.is_dir():
        pass  # do something

Would this be a useful mechanism?
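The ``null_if_missing`` flag does not exist, but the null-object idea can be tried out with a hypothetical helper, ``stat_or_null()``, sketched here with only the two predicates used in the example above.  It returns a small wrapper for an existing path, and a null object whose predicates all answer ``False`` when ``os.stat()`` raises ``FileNotFoundError``:

```python
import os
import stat


class _ExistingStat:
    """Wraps a real stat_result and answers the proposed predicates."""

    def __init__(self, st):
        self._st = st

    def is_file(self):
        return stat.S_ISREG(self._st.st_mode)

    def is_dir(self):
        return stat.S_ISDIR(self._st.st_mode)


class _NullStat:
    """Null object standing in for a missing file: every predicate is False."""

    def is_file(self):
        return False

    def is_dir(self):
        return False


def stat_or_null(path):
    """Hypothetical stand-in for the proposed os.stat(path, null_if_missing=True)."""
    try:
        return _ExistingStat(os.stat(path))
    except FileNotFoundError:
        return _NullStat()
```

One design question this surfaces: catching only ``FileNotFoundError`` is narrower than what ``os.path.isfile()`` does, since the latter also returns ``False`` on other ``OSError`` subclasses such as ``PermissionError``.  A real implementation would have to decide which errors count as "missing".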


Rejected Proposals
==================

It has been proposed [#filetype]_ that a mechanism be added whereby
``stat_result`` could return some sort of type code identifying the file
type.  Originally these type codes were proposed as strings such as 'reg',
'dir', and the like, but others suggested enumerations instead.  The author
rejected that proposal to keep the current PEP focused on ameliorating
existing asymmetries rather than adding new behavior, but is not opposed to
the notion in principle (assuming enums are used instead of strings). 
Experience with creating the reference implementation for this PEP may yet
change the author's mind.

Concerns have been raised [#isdoor]_ about platform-specific stat flags
(such as ``S_ISDOOR`` on Solaris) that Python does not currently support,
and which could be added as part of this proposal.  The author has rejected
such proposals, again in order to keep the PEP focused, but may yet be
persuaded otherwise.


References
==========

.. [#PEP-8] PEP 8, "Style Guide for Python Code", van Rossum, Warsaw
   (http://www.python.org/dev/peps/pep-0008)

.. [#PEP-428] PEP 428, "The pathlib module -- object-oriented filesystem paths",
   Pitrou (http://www.python.org/dev/peps/pep-0428)

.. [#filetype] http://mail.python.org/pipermail/python-ideas/2013-May/020378.html

.. [#isdoor] http://mail.python.org/pipermail/python-ideas/2013-May/020378.html


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

-- 
Pieter Nagel




