[Python-checkins] peps: Push PEP 428 - object-oriented filesystem paths

antoine.pitrou python-checkins at python.org
Fri Oct 5 20:21:23 CEST 2012


http://hg.python.org/peps/rev/cd9ddbed7c8d
changeset:   4535:cd9ddbed7c8d
user:        Antoine Pitrou <solipsis at pitrou.net>
date:        Fri Oct 05 20:19:40 2012 +0200
summary:
  Push PEP 428 - object-oriented filesystem paths

files:
  pep-0428.txt |  568 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 568 insertions(+), 0 deletions(-)


diff --git a/pep-0428.txt b/pep-0428.txt
new file mode 100644
--- /dev/null
+++ b/pep-0428.txt
@@ -0,0 +1,568 @@
+PEP: 428
+Title: The pathlib module -- object-oriented filesystem paths
+Version: $Revision$
+Last-Modified: $Date$
+Author: Antoine Pitrou <solipsis at pitrou.net>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 30-July-2012
+Python-Version: 3.4
+Post-History: 
+
+
+Abstract
+========
+
+This PEP proposes the inclusion of a third-party module, `pathlib`_, in
+the standard library.  The inclusion is proposed under the provisional
+label, as described in :pep:`411`.  Therefore, API changes can be made,
+either as part of the PEP process, or after acceptance in the standard
+library (and until the provisional label is removed).
+
+The aim of this library is to provide a simple hierarchy of classes to
+handle filesystem paths and the common operations users perform on them.
+
+.. _`pathlib`: http://pypi.python.org/pypi/pathlib/
+
+
+Related work
+============
+
+An object-oriented API for filesystem paths has already been proposed
+and rejected in :pep:`355`.  Several third-party implementations of the
+idea of object-oriented filesystem paths exist in the wild:
+
+* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
+  and others, which provides a ``str``-subclassing ``Path`` class;
+
+* Twisted's slightly specialized `FilePath class`_;
+
+* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
+  ``str``;
+
+* `Unipath`_, a variation on the str-subclassing approach with two public
+  classes, an ``AbstractPath`` class for operations which don't do I/O and a
+  ``Path`` class for all common operations.
+
+This proposal attempts to learn from these previous attempts and the
+rejection of :pep:`355`.
+
+
+.. _`path.py module`: https://github.com/jaraco/path.py
+.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
+.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
+.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
+
+
+Why an object-oriented API
+==========================
+
+The rationale for representing filesystem paths using dedicated classes is the
+same as for other kinds of stateless objects, such as dates, times or IP
+addresses.  Python has been slowly moving away from strictly replicating
+the C language's APIs to providing better, more helpful abstractions around
+all kinds of common functionality.  Even if this PEP isn't accepted, it is
+likely that another form of filesystem handling abstraction will be adopted
+one day into the standard library.
+
+Indeed, many people will prefer handling dates and times using the high-level
+objects provided by the ``datetime`` module, rather than using numeric
+timestamps and the ``time`` module API.  Moreover, using a dedicated class
+makes it possible to enable desirable behaviours by default, such as the
+case insensitivity of Windows paths.
+
+
+Proposal
+========
+
+Class hierarchy
+---------------
+
+The `pathlib`_ module implements a simple hierarchy of classes::
+
+                           +----------+
+                           |          |
+                  ---------| PurePath |--------
+                  |        |          |       |
+                  |        +----------+       |
+                  |             |             |
+                  |             |             |
+                  v             |             v
+           +---------------+    |     +------------+
+           |               |    |     |            |
+           | PurePosixPath |    |     | PureNTPath |
+           |               |    |     |            |
+           +---------------+    |     +------------+
+                  |             v             |
+                  |          +------+         |
+                  |          |      |         |
+                  |   -------| Path |------   |
+                  |   |      |      |     |   |
+                  |   |      +------+     |   |
+                  |   |                   |   |
+                  |   |                   |   |
+                  v   v                   v   v
+             +-----------+              +--------+
+             |           |              |        |
+             | PosixPath |              | NTPath |
+             |           |              |        |
+             +-----------+              +--------+
+
+
+This hierarchy divides path classes along two dimensions:
+
+* a path class can be either pure or concrete: pure classes support only
+  operations that don't need to do any actual I/O, which are most path
+  manipulation operations; concrete classes support all the operations
+  of pure classes, plus operations that do I/O.
+
+* a path class is of a given flavour according to the kind of operating
+  system paths it represents.  `pathlib`_ implements two flavours: NT paths
+  for the filesystem semantics embodied in Windows systems, POSIX paths for
+  other systems (``os.name``'s terminology is re-used here).
+
+Any pure class can be instantiated on any system: for example, you can
+manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
+under Unix, and so on.  However, concrete classes can only be instantiated
+on a matching system: indeed, it would be error-prone to start doing I/O
+with ``NTPath`` objects under Unix, or vice-versa.
+
+Furthermore, there are two base classes which also act as system-dependent
+factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
+``PureNTPath`` depending on the operating system.  Similarly, ``Path``
+will instantiate either a ``PosixPath`` or an ``NTPath``.
+
+It is expected that, in most uses, using the ``Path`` class is adequate,
+which is why it has the shortest name of all.
+
+
+No confusion with builtins
+--------------------------
+
+In this proposal, the path classes do not derive from a builtin type.  This
+contrasts with some other Path class proposals which were derived from
+``str``.  They also do not pretend to implement the sequence protocol:
+if you want a path to act as a sequence, you have to look up a dedicated
+attribute (the ``parts`` attribute).
+
+By not posing as builtin types, the path classes minimize the potential
+for confusion if they are combined by accident with genuine builtin types.
+
+
+Immutability
+------------
+
+Path objects are immutable, which makes them hashable and also prevents a
+class of programming errors.
+
+
+Sane behaviour
+--------------
+
+Little of the functionality from os.path is reused.  Many os.path functions
+are tied by backwards compatibility to confusing or plain wrong behaviour
+(for example, the fact that ``os.path.abspath()`` simplifies ".." path
+components without resolving symlinks first).
+
+Also, using classes instead of plain strings helps make system-dependent
+behaviours natural.  For example, comparing and ordering Windows path
+objects is case-insensitive, and path separators are automatically converted
+to the platform default.
+
+
+Useful notations
+----------------
+
+The API tries to provide useful notations while avoiding magic.
+Some examples::
+
+    >>> p = Path('/home/antoine/pathlib/setup.py')
+    >>> p.name
+    'setup.py'
+    >>> p.ext
+    '.py'
+    >>> p.root
+    '/'
+    >>> p.parts
+    <PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
+    >>> list(p.parents())
+    [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
+    >>> p.exists()
+    True
+    >>> p.st_size
+    928
+
+
+Pure paths API
+==============
+
+The philosophy of the ``PurePath`` API is to provide a consistent array of
+useful path manipulation operations, without exposing a hodge-podge of
+functions like ``os.path`` does.
+
+
+Definitions
+-----------
+
+First, a few conventions:
+
+* All paths can have a drive and a root.  For POSIX paths, the drive is
+  always empty.
+
+* A relative path has neither drive nor root.
+
+* A POSIX path is absolute if it has a root.  A Windows path is absolute if
+  it has both a drive *and* a root.  A Windows UNC path (e.g.
+  ``\\some\share\myfile.txt``) always has a drive and a root
+  (here, ``\\some\share`` and ``\``, respectively).
+
+* A path which has either a drive *or* a root is said to be anchored.
+  Its anchor is the concatenation of the drive and root.  Under POSIX,
+  "anchored" is the same as "absolute".
+
+
+Construction and joining
+------------------------
+
+We will present construction and joining together since they expose
+similar semantics.
+
+The simplest way to construct a path is to pass it its string representation::
+
+    >>> PurePath('setup.py')
+    PurePosixPath('setup.py')
+
+Extraneous path separators and ``"."`` components are eliminated::
+
+    >>> PurePath('a///b/c/./d/')
+    PurePosixPath('a/b/c/d')
+
+If you pass several arguments, they will be automatically joined::
+
+    >>> PurePath('docs', 'Makefile')
+    PurePosixPath('docs/Makefile')
+
+Joining semantics are similar to os.path.join, in that anchored paths ignore
+the information from the previously joined components::
+
+    >>> PurePath('/etc', '/usr', 'bin')
+    PurePosixPath('/usr/bin')
+
+However, with Windows paths, the drive is retained as necessary::
+
+    >>> PureNTPath('c:/foo', '/Windows')
+    PureNTPath('c:\\Windows')
+    >>> PureNTPath('c:/foo', 'd:')
+    PureNTPath('d:')
+
+Calling the constructor without any argument creates a path object pointing
+to the logical "current directory"::
+
+    >>> PurePosixPath()
+    PurePosixPath('.')
+
+A path can be joined with another using the ``__getitem__`` operator::
+
+    >>> p = PurePosixPath('foo')
+    >>> p['bar']
+    PurePosixPath('foo/bar')
+    >>> p[PurePosixPath('bar')]
+    PurePosixPath('foo/bar')
+
+As with constructing, multiple path components can be specified at once::
+
+    >>> p['bar/xyzzy']
+    PurePosixPath('foo/bar/xyzzy')
+
+A join() method is also provided, with the same behaviour.  It can serve
+as a factory function::
+
+    >>> path_factory = p.join
+    >>> path_factory('bar')
+    PurePosixPath('foo/bar')
+
+
+Representing
+------------
+
+To represent a path (e.g. to pass it to third-party libraries), just call
+``str()`` on it::
+
+    >>> p = PurePath('/home/antoine/pathlib/setup.py')
+    >>> str(p)
+    '/home/antoine/pathlib/setup.py'
+    >>> p = PureNTPath('c:/windows')
+    >>> str(p)
+    'c:\\windows'
+
+To force the string representation with forward slashes, use the ``as_posix()``
+method::
+
+    >>> p.as_posix()
+    'c:/windows'
+
+To get the bytes representation (which might be useful under Unix systems),
+call ``bytes()`` on it, or use the ``as_bytes()`` method::
+
+    >>> bytes(PurePosixPath('/home/antoine/pathlib/setup.py'))
+    b'/home/antoine/pathlib/setup.py'
+
+
+Properties
+----------
+
+Five simple properties are provided on every path (each can be empty)::
+
+    >>> p = PureNTPath('c:/pathlib/setup.py')
+    >>> p.drive
+    'c:'
+    >>> p.root
+    '\\'
+    >>> p.anchor
+    'c:\\'
+    >>> p.name
+    'setup.py'
+    >>> p.ext
+    '.py'
+
+
+Sequence-like access
+--------------------
+
+The ``parts`` property provides read-only sequence access to a path object::
+
+    >>> p = PurePosixPath('/etc/init.d')
+    >>> p.parts
+    <PurePosixPath.parts: ['/', 'etc', 'init.d']>
+
+Simple indexing returns the individual path component as a string, while
+slicing returns a new path object constructed from the selected components::
+
+    >>> p.parts[-1]
+    'init.d'
+    >>> p.parts[:-1]
+    PurePosixPath('/etc')
+
+Windows paths handle the drive and the root as a single path component::
+
+    >>> p = PureNTPath('c:/setup.py')
+    >>> p.parts
+    <PureNTPath.parts: ['c:\\', 'setup.py']>
+    >>> p.root
+    '\\'
+    >>> p.parts[0]
+    'c:\\'
+
+(separating them would be wrong, since ``C:`` is not the parent of ``C:\``).
+
+The ``parent()`` method returns an ancestor of the path::
+
+    >>> p = PureNTPath('c:/python33/bin/python.exe')
+    >>> p.parent()
+    PureNTPath('c:\\python33\\bin')
+    >>> p.parent(2)
+    PureNTPath('c:\\python33')
+    >>> p.parent(3)
+    PureNTPath('c:\\')
+
+The ``parents()`` method automates repeated invocations of ``parent()``, until
+the anchor is reached::
+
+    >>> for parent in p.parents(): parent
+    ...
+    PureNTPath('c:\\python33\\bin')
+    PureNTPath('c:\\python33')
+    PureNTPath('c:\\')
+
+
+Querying
+--------
+
+``is_relative()`` returns True if the path is relative (see definition
+above), False otherwise.
+
+``is_reserved()`` returns True if a Windows path is a reserved path such
+as ``CON`` or ``NUL``.  It always returns False for POSIX paths.
+
+``match()`` matches the path against a glob pattern::
+
+    >>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
+    True
+
+``relative()`` returns a new relative path by stripping the drive and root::
+
+    >>> PurePosixPath('setup.py').relative()
+    PurePosixPath('setup.py')
+    >>> PurePosixPath('/setup.py').relative()
+    PurePosixPath('setup.py')
+
+``relative_to()`` computes the relative difference of a path to another::
+
+    >>> PurePosixPath('/usr/bin/python').relative_to('/usr')
+    PurePosixPath('bin/python')
+
+``normcase()`` returns a case-folded version of the path for NT paths::
+
+    >>> PurePosixPath('CAPS').normcase()
+    PurePosixPath('CAPS')
+    >>> PureNTPath('CAPS').normcase()
+    PureNTPath('caps')
+
+
+Concrete paths API
+==================
+
+In addition to the operations of the pure API, concrete paths provide
+additional methods which actually access the filesystem to query or mutate
+information.
+
+
+Constructing
+------------
+
+The classmethod ``cwd()`` creates a path object pointing to the current
+working directory in absolute form::
+
+    >>> Path.cwd()
+    PosixPath('/home/antoine/pathlib')
+
+
+File metadata
+-------------
+
+The ``stat()`` method caches and returns the file's stat() result;
+``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
+but doesn't have any caching behaviour::
+
+    >>> p.stat()
+    posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
+
+For ease of use, direct attribute access to the fields of the stat structure
+is provided over the path object itself::
+
+    >>> p.st_size
+    928
+    >>> p.st_mtime
+    1328287308.889562
+
+Higher-level methods help examine the kind of the file::
+
+    >>> p.exists()
+    True
+    >>> p.is_file()
+    True
+    >>> p.is_dir()
+    False
+    >>> p.is_symlink()
+    False
+
+The file owner and group names (rather than numeric ids) are queried
+through matching properties::
+
+    >>> p = Path('/etc/shadow')
+    >>> p.owner
+    'root'
+    >>> p.group
+    'shadow'
+
+
+Path resolution
+---------------
+
+The ``resolve()`` method makes a path absolute, resolving any symlink on
+the way.  It is the only operation which will remove "``..``" path components.
+
+
+Directory walking
+-----------------
+
+Simple (non-recursive) directory access is done by iteration::
+
+    >>> p = Path('docs')
+    >>> for child in p: child
+    ...
+    PosixPath('docs/conf.py')
+    PosixPath('docs/_templates')
+    PosixPath('docs/make.bat')
+    PosixPath('docs/index.rst')
+    PosixPath('docs/_build')
+    PosixPath('docs/_static')
+    PosixPath('docs/Makefile')
+
+This allows simple filtering through list comprehensions::
+
+    >>> p = Path('.')
+    >>> [child for child in p if child.is_dir()]
+    [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
+
+Simple and recursive globbing is also provided::
+
+    >>> for child in p.glob('**/*.py'): child
+    ...
+    PosixPath('test_pathlib.py')
+    PosixPath('setup.py')
+    PosixPath('pathlib.py')
+    PosixPath('docs/conf.py')
+    PosixPath('build/lib/pathlib.py')
+
+
+File opening
+------------
+
+The ``open()`` method provides a file opening API similar to the builtin
+``open()`` function::
+
+    >>> p = Path('setup.py')
+    >>> with p.open() as f: f.readline()
+    ...
+    '#!/usr/bin/env python3\n'
+
+The ``raw_open()`` method, on the other hand, is similar to ``os.open``::
+
+    >>> fd = p.raw_open(os.O_RDONLY)
+    >>> os.read(fd, 15)
+    b'#!/usr/bin/env '
+
+
+Filesystem alteration
+---------------------
+
+Several common filesystem operations are provided as methods: ``touch()``,
+``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
+``chmod()``, ``lchmod()``, ``symlink_to()``.  More operations could be
+provided, for example some of the functionality of the shutil module.
+
+
+Experimental openat() support
+-----------------------------
+
+On compatible POSIX systems, the concrete PosixPath class can take advantage
+of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
+open file descriptors as necessary.  Support is enabled by passing the
+*use_openat* argument to the constructor::
+
+    >>> p = Path(".", use_openat=True)
+
+Then all paths constructed by navigating this path (either by iteration or
+indexing) will also use the openat() family of functions.  The point of using
+these functions is to avoid race conditions whereby a given directory is
+silently replaced with another (often a symbolic link to a sensitive system
+location) between two accesses.
+
+.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
+
+
+Copyright
+=========
+
+This document has been placed into the public domain.
+
+
+..
+    Local Variables:
+    mode: indented-text
+    indent-tabs-mode: nil
+    sentence-end-double-space: t
+    fill-column: 70
+    coding: utf-8
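
The following sketch is not part of the changeset. It shows one way to try
the joining, property and comparison semantics described in the PEP text
above. It assumes a Python 3.5+ interpreter and the `pathlib` module as it
later shipped in the standard library, where several names differ from this
draft: `PureWindowsPath` instead of `PureNTPath`, the `/` operator instead of
indexing, `suffix` instead of `ext`, and `iterdir()` instead of iterating
over the path itself. The temporary directory and its file names are
illustrative only.

```python
import tempfile
from pathlib import Path, PurePosixPath, PureWindowsPath

# Construction normalises separators and drops "." components.
normalised = PurePosixPath('a///b/c/./d/')

# Joining: an anchored component discards everything before it ...
joined = PurePosixPath('/etc', '/usr', 'bin')
# ... except that a Windows drive survives a root-only component.
win_joined = PureWindowsPath('c:/foo', '/Windows')

# The draft's p['bar'] indexing shipped as the "/" operator.
child = PurePosixPath('foo') / 'bar' / 'xyzzy'

# Properties: the draft's "ext" shipped as "suffix".
setup = PureWindowsPath('c:/pathlib/setup.py')
props = (setup.drive, setup.root, setup.anchor, setup.name, setup.suffix)

# Windows paths compare case-insensitively; POSIX paths do not.
same_on_windows = PureWindowsPath('C:/PATHLIB') == PureWindowsPath('c:/pathlib')
same_on_posix = PurePosixPath('/PATHLIB') == PurePosixPath('/pathlib')

relative = PurePosixPath('/usr/bin/python').relative_to('/usr')

# Concrete paths do I/O: iterdir() replaced direct iteration, and stat
# fields are read from stat() rather than as attributes of the path.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'setup.py').write_text('print("hi")\n')
    (root / 'docs').mkdir()
    subdirs = sorted(c.name for c in root.iterdir() if c.is_dir())
    scripts = sorted(c.name for c in root.glob('**/*.py'))
    size = (root / 'setup.py').stat().st_size
```

Pure classes are used for everything that needs no I/O, so the first half
runs identically on any platform; only the final block touches the
filesystem.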

-- 
Repository URL: http://hg.python.org/peps

