[Python-checkins] peps: Push PEP 428 - object-oriented filesystem paths
antoine.pitrou
python-checkins at python.org
Fri Oct 5 20:21:23 CEST 2012
http://hg.python.org/peps/rev/cd9ddbed7c8d
changeset: 4535:cd9ddbed7c8d
user: Antoine Pitrou <solipsis at pitrou.net>
date: Fri Oct 05 20:19:40 2012 +0200
summary:
Push PEP 428 - object-oriented filesystem paths
files:
pep-0428.txt | 568 +++++++++++++++++++++++++++++++++++++++
1 files changed, 568 insertions(+), 0 deletions(-)
diff --git a/pep-0428.txt b/pep-0428.txt
new file mode 100644
--- /dev/null
+++ b/pep-0428.txt
@@ -0,0 +1,568 @@
+PEP: 428
+Title: The pathlib module -- object-oriented filesystem paths
+Version: $Revision$
+Last-Modified: $Date$
+Author: Antoine Pitrou <solipsis at pitrou.net>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 30-July-2012
+Python-Version: 3.4
+Post-History:
+
+
+Abstract
+========
+
+This PEP proposes the inclusion of a third-party module, `pathlib`_, in
+the standard library. The inclusion is proposed under the provisional
+label, as described in :pep:`411`. Therefore, API changes can be made,
+either as part of the PEP process, or after acceptance in the standard
+library (and until the provisional label is removed).
+
+The aim of this library is to provide a simple hierarchy of classes to
+handle filesystem paths and the common operations users do over them.
+
+.. _`pathlib`: http://pypi.python.org/pypi/pathlib/
+
+
+Related work
+============
+
+An object-oriented API for filesystem paths has already been proposed
+and rejected in :pep:`355`. Several third-party implementations of the
+idea of object-oriented filesystem paths exist in the wild:
+
+* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
+ and others, which provides a ``str``-subclassing ``Path`` class;
+
+* Twisted's slightly specialized `FilePath class`_;
+
+* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
+ ``str``;
+
+* `Unipath`_, a variation on the str-subclassing approach with two public
+ classes, an ``AbstractPath`` class for operations which don't do I/O and a
+ ``Path`` class for all common operations.
+
+This proposal attempts to learn from these previous attempts and the
+rejection of :pep:`355`.
+
+
+.. _`path.py module`: https://github.com/jaraco/path.py
+.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
+.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
+.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
+
+
+Why an object-oriented API
+==========================
+
+The rationale for representing filesystem paths with dedicated classes is the
+same as for other kinds of stateless objects, such as dates, times or IP
+addresses. Python has been slowly moving away from strictly replicating
+the C language's APIs to providing better, more helpful abstractions around
+all kinds of common functionality. Even if this PEP isn't accepted, it is
+likely that another form of filesystem handling abstraction will be adopted
+one day into the standard library.
+
+Indeed, many people will prefer handling dates and times using the high-level
+objects provided by the ``datetime`` module, rather than using numeric
+timestamps and the ``time`` module API. Moreover, using a dedicated class
+makes it possible to enable desirable behaviours by default, for example the
+case insensitivity of Windows paths.
+
+
+Proposal
+========
+
+Class hierarchy
+---------------
+
+The `pathlib`_ module implements a simple hierarchy of classes::
+
+ +----------+
+ | |
+ ---------| PurePath |--------
+ | | | |
+ | +----------+ |
+ | | |
+ | | |
+ v | v
+ +---------------+ | +------------+
+ | | | | |
+ | PurePosixPath | | | PureNTPath |
+ | | | | |
+ +---------------+ | +------------+
+ | v |
+ | +------+ |
+ | | | |
+ | -------| Path |------ |
+ | | | | | |
+ | | +------+ | |
+ | | | |
+ | | | |
+ v v v v
+ +-----------+ +--------+
+ | | | |
+ | PosixPath | | NTPath |
+ | | | |
+ +-----------+ +--------+
+
+
+This hierarchy divides path classes along two dimensions:
+
+* a path class can be either pure or concrete: pure classes support only
+ operations that don't need to do any actual I/O, which are most path
+ manipulation operations; concrete classes support all the operations
+ of pure classes, plus operations that do I/O.
+
+* a path class is of a given flavour according to the kind of operating
+ system paths it represents. `pathlib`_ implements two flavours: NT paths
+ for the filesystem semantics embodied in Windows systems, POSIX paths for
+ other systems (``os.name``'s terminology is re-used here).
+
+Any pure class can be instantiated on any system: for example, you can
+manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
+under Unix, and so on. However, concrete classes can only be instantiated
+on a matching system: indeed, it would be error-prone to start doing I/O
+with ``NTPath`` objects under Unix, or vice-versa.
+
+Furthermore, there are two base classes which also act as system-dependent
+factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
+``PureNTPath`` depending on the operating system. Similarly, ``Path``
+will instantiate either a ``PosixPath`` or a ``NTPath``.
+
+It is expected that, in most uses, using the ``Path`` class is adequate,
+which is why it has the shortest name of all.
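+
+As a rough illustration of the two dimensions, the sketch below assumes the
+``pathlib`` module as it ships with current Python 3 releases, where the NT
+flavour is spelled ``PureWindowsPath`` rather than this draft's
+``PureNTPath``. The paths themselves are made up for the example::
+
+    from pathlib import PurePath, PurePosixPath, PureWindowsPath
+
+    # Pure classes do no I/O, so both flavours work on any host system.
+    posix = PurePosixPath('/usr/lib/python3')
+    nt = PureWindowsPath('c:/Python33/Lib')   # the draft's PureNTPath
+
+    # PurePath acts as a factory: it returns the flavour matching the host.
+    native = PurePath('setup.py')
+    assert isinstance(native, (PurePosixPath, PureWindowsPath))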
+
+
+No confusion with builtins
+--------------------------
+
+In this proposal, the path classes do not derive from a builtin type. This
+contrasts with some other Path class proposals which were derived from
+``str``. They also do not pretend to implement the sequence protocol:
+if you want a path to act as a sequence, you have to look up a dedicated
+attribute (the ``parts`` attribute).
+
+By not passing themselves off as builtin types, the path classes minimize the
+potential for confusion if they are combined by accident with genuine builtin
+types.
+
+
+Immutability
+------------
+
+Path objects are immutable, which makes them hashable and also prevents a
+class of programming errors.
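+
+A minimal sketch of what hashability buys, assuming the ``pathlib`` module
+as shipped with current Python 3 releases (the file names are
+placeholders)::
+
+    from pathlib import PurePosixPath
+
+    a = PurePosixPath('docs/Makefile')
+    b = PurePosixPath('docs//Makefile')   # normalised to the same path
+
+    # Equal paths hash equally, so they collapse in a set and both work
+    # as the same dictionary key.
+    unique = {a, b}
+    sizes = {a: 1024}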
+
+
+Sane behaviour
+--------------
+
+Little of the functionality from ``os.path`` is reused. Many ``os.path`` functions
+are tied by backwards compatibility to confusing or plain wrong behaviour
+(for example, the fact that ``os.path.abspath()`` simplifies ".." path
+components without resolving symlinks first).
+
+Also, using classes instead of plain strings helps make system-dependent
+behaviours natural. For example, comparing and ordering Windows path
+objects is case-insensitive, and path separators are automatically converted
+to the platform default.
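+
+The two behaviours just mentioned can be sketched as follows, assuming the
+``pathlib`` module of current Python 3 releases, in which the Windows
+flavour is named ``PureWindowsPath`` (not the ``PureNTPath`` of this
+draft)::
+
+    from pathlib import PureWindowsPath
+
+    # Comparison under the Windows flavour ignores case ...
+    same = PureWindowsPath('C:/Foo/Bar') == PureWindowsPath('c:/foo/bar')
+
+    # ... and forward slashes are converted to the flavour's separator.
+    text = str(PureWindowsPath('c:/foo/bar'))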
+
+
+Useful notations
+----------------
+
+The API tries to provide useful notations while avoiding magic.
+Some examples::
+
+ >>> p = Path('/home/antoine/pathlib/setup.py')
+ >>> p.name
+ 'setup.py'
+ >>> p.ext
+ '.py'
+ >>> p.root
+ '/'
+ >>> p.parts
+ <PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
+ >>> list(p.parents())
+ [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
+ >>> p.exists()
+ True
+ >>> p.st_size
+ 928
+
+
+Pure paths API
+==============
+
+The philosophy of the ``PurePath`` API is to provide a consistent array of
+useful path manipulation operations, without exposing a hodge-podge of
+functions like ``os.path`` does.
+
+
+Definitions
+-----------
+
+First a couple of conventions:
+
+* All paths can have a drive and a root. For POSIX paths, the drive is
+ always empty.
+
+* A relative path has neither drive nor root.
+
+* A POSIX path is absolute if it has a root. A Windows path is absolute if
+ it has both a drive *and* a root. A Windows UNC path (e.g.
+ ``\\some\share\myfile.txt``) always has a drive and a root
+ (here, ``\\some\share`` and ``\``, respectively).
+
+* A path which has either a drive *or* a root is said to be anchored.
+ Its anchor is the concatenation of the drive and root. Under POSIX,
+ "anchored" is the same as "absolute".
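+
+These conventions can be checked with a short sketch. It assumes the
+``pathlib`` module of current Python 3 releases, where the query is spelled
+``is_absolute()`` and the Windows flavour ``PureWindowsPath``; neither name
+comes from this draft::
+
+    from pathlib import PurePosixPath, PureWindowsPath
+
+    p = PureWindowsPath('c:/pathlib/setup.py')
+    pieces = (p.drive, p.root, p.anchor)
+
+    # A root without a drive is anchored but not absolute on Windows.
+    rooted_only = PureWindowsPath('/Windows').is_absolute()
+    # On POSIX the drive is always empty, so a root is enough.
+    posix_abs = PurePosixPath('/etc').is_absolute()
+
+    # A UNC path carries both a drive (the share) and a root.
+    unc = PureWindowsPath('//some/share/myfile.txt')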
+
+
+Construction and joining
+------------------------
+
+We will present construction and joining together since they expose
+similar semantics.
+
+The simplest way to construct a path is to pass it its string representation::
+
+ >>> PurePath('setup.py')
+ PurePosixPath('setup.py')
+
+Extraneous path separators and ``"."`` components are eliminated::
+
+ >>> PurePath('a///b/c/./d/')
+ PurePosixPath('a/b/c/d')
+
+If you pass several arguments, they will be automatically joined::
+
+ >>> PurePath('docs', 'Makefile')
+ PurePosixPath('docs/Makefile')
+
+Joining semantics are similar to ``os.path.join``, in that anchored paths ignore
+the information from the previously joined components::
+
+ >>> PurePath('/etc', '/usr', 'bin')
+ PurePosixPath('/usr/bin')
+
+However, with Windows paths, the drive is retained as necessary::
+
+ >>> PureNTPath('c:/foo', '/Windows')
+ PureNTPath('c:\\Windows')
+ >>> PureNTPath('c:/foo', 'd:')
+ PureNTPath('d:')
+
+Calling the constructor without any argument creates a path object pointing
+to the logical "current directory"::
+
+ >>> PurePosixPath()
+ PurePosixPath('.')
+
+A path can be joined with another using the ``__getitem__`` operator::
+
+ >>> p = PurePosixPath('foo')
+ >>> p['bar']
+ PurePosixPath('foo/bar')
+ >>> p[PurePosixPath('bar')]
+ PurePosixPath('foo/bar')
+
+As with constructing, multiple path components can be specified at once::
+
+ >>> p['bar/xyzzy']
+ PurePosixPath('foo/bar/xyzzy')
+
+A ``join()`` method is also provided, with the same behaviour. It can serve
+as a factory function::
+
+ >>> path_factory = p.join
+ >>> path_factory('bar')
+ PurePosixPath('foo/bar')
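+
+For comparison, here is a sketch of the same joining rules in the
+``pathlib`` module of current Python 3 releases. That module joins with the
+``/`` operator and a ``joinpath()`` method instead of the indexing and
+``join()`` spellings described here, so treat it as an illustration of the
+semantics rather than of this draft's API::
+
+    from pathlib import PurePosixPath, PureWindowsPath
+
+    # Several constructor arguments are joined; an anchored argument
+    # discards what came before it.
+    usr_bin = PurePosixPath('/etc', '/usr', 'bin')
+
+    # Under the Windows flavour a rooted argument keeps the earlier drive.
+    windows = PureWindowsPath('c:/foo', '/Windows')
+
+    # Joining an existing path with further components.
+    base = PurePosixPath('foo')
+    joined = base / 'bar' / 'xyzzy'
+    also_joined = base.joinpath('bar', 'xyzzy')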
+
+
+Representing
+------------
+
+To represent a path (e.g. to pass it to third-party libraries), just call
+``str()`` on it::
+
+ >>> p = PurePath('/home/antoine/pathlib/setup.py')
+ >>> str(p)
+ '/home/antoine/pathlib/setup.py'
+ >>> p = PureNTPath('c:/windows')
+ >>> str(p)
+ 'c:\\windows'
+
+To force the string representation with forward slashes, use the ``as_posix()``
+method::
+
+ >>> p.as_posix()
+ 'c:/windows'
+
+To get the bytes representation (which might be useful under Unix systems),
+call ``bytes()`` on it, or use the ``as_bytes()`` method::
+
+ >>> p = PurePath('/home/antoine/pathlib/setup.py')
+ >>> bytes(p)
+ b'/home/antoine/pathlib/setup.py'
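+
+The three representations side by side, as a sketch against the ``pathlib``
+module of current Python 3 releases (``PureWindowsPath`` stands in for this
+draft's ``PureNTPath``; only ``str()``, ``as_posix()`` and ``bytes()`` are
+used)::
+
+    from pathlib import PurePosixPath, PureWindowsPath
+
+    nt = PureWindowsPath('c:/windows')
+    native_text = str(nt)          # uses the flavour's own separator
+    posix_text = nt.as_posix()     # always forward slashes
+
+    # bytes() encodes the path with the filesystem encoding.
+    raw = bytes(PurePosixPath('/home/antoine/pathlib/setup.py'))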
+
+
+Properties
+----------
+
+Five simple properties are provided on every path (each can be empty)::
+
+ >>> p = PureNTPath('c:/pathlib/setup.py')
+ >>> p.drive
+ 'c:'
+ >>> p.root
+ '\\'
+ >>> p.anchor
+ 'c:\\'
+ >>> p.name
+ 'setup.py'
+ >>> p.ext
+ '.py'
+
+
+Sequence-like access
+--------------------
+
+The ``parts`` property provides read-only sequence access to a path object::
+
+ >>> p = PurePosixPath('/etc/init.d')
+ >>> p.parts
+ <PurePosixPath.parts: ['/', 'etc', 'init.d']>
+
+Simple indexing returns the individual path component as a string, while
+slicing returns a new path object constructed from the selected components::
+
+ >>> p.parts[-1]
+ 'init.d'
+ >>> p.parts[:-1]
+ PurePosixPath('/etc')
+
+Windows paths handle the drive and the root as a single path component::
+
+ >>> p = PureNTPath('c:/setup.py')
+ >>> p.parts
+ <PureNTPath.parts: ['c:\\', 'setup.py']>
+ >>> p.root
+ '\\'
+ >>> p.parts[0]
+ 'c:\\'
+
+(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
+
+The ``parent()`` method returns an ancestor of the path::
+
+ >>> p = PureNTPath('c:/python33/bin/python.exe')
+ >>> p.parent()
+ PureNTPath('c:\\python33\\bin')
+ >>> p.parent(2)
+ PureNTPath('c:\\python33')
+ >>> p.parent(3)
+ PureNTPath('c:\\')
+
+The ``parents()`` method automates repeated invocations of ``parent()``, until
+the anchor is reached::
+
+ >>> p = PureNTPath('c:/python33/bin/python.exe')
+ >>> for parent in p.parents(): parent
+ ...
+ PureNTPath('c:\\python33\\bin')
+ PureNTPath('c:\\python33')
+ PureNTPath('c:\\')
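+
+A comparable sketch with the ``pathlib`` module of current Python 3
+releases, which differs from this draft in that ``parts`` is a plain tuple
+and ``parent`` / ``parents`` are properties rather than methods::
+
+    from pathlib import PurePosixPath, PureWindowsPath
+
+    p = PurePosixPath('/etc/init.d')
+    components = p.parts
+
+    exe = PureWindowsPath('c:/python33/bin/python.exe')
+    first_part = exe.parts[0]        # drive and root stay together
+    ancestors = list(exe.parents)    # stops once the anchor is reached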
+
+
+Querying
+--------
+
+``is_relative()`` returns True if the path is relative (see definition
+above), False otherwise.
+
+``is_reserved()`` returns True if a Windows path is a reserved path such
+as ``CON`` or ``NUL``. It always returns False for POSIX paths.
+
+``match()`` matches the path against a glob pattern::
+
+ >>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
+ True
+
+``relative()`` returns a new relative path by stripping the drive and root::
+
+ >>> PurePosixPath('setup.py').relative()
+ PurePosixPath('setup.py')
+ >>> PurePosixPath('/setup.py').relative()
+ PurePosixPath('setup.py')
+
+``relative_to()`` computes the relative difference of a path to another::
+
+ >>> PurePosixPath('/usr/bin/python').relative_to('/usr')
+ PurePosixPath('bin/python')
+
+``normcase()`` returns a case-folded version of the path for NT paths::
+
+ >>> PurePosixPath('CAPS').normcase()
+ PurePosixPath('CAPS')
+ >>> PureNTPath('CAPS').normcase()
+ PureNTPath('caps')
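+
+Two of these queries, ``relative_to()`` and ``match()``, exist under the
+same names in the ``pathlib`` module of current Python 3 releases. The
+sketch below uses that module (with ``PureWindowsPath`` for the NT flavour)
+and shows what happens when the path is not located under the given base::
+
+    from pathlib import PurePosixPath, PureWindowsPath
+
+    rel = PurePosixPath('/usr/bin/python').relative_to('/usr')
+
+    # relative_to() refuses paths that do not lie under the given base.
+    try:
+        PurePosixPath('/etc/passwd').relative_to('/usr')
+        outside_ok = True
+    except ValueError:
+        outside_ok = False
+
+    # Glob matching follows the case rules of each flavour.
+    nt_match = PureWindowsPath('b.py').match('*.PY')
+    posix_match = PurePosixPath('b.py').match('*.PY')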
+
+
+Concrete paths API
+==================
+
+In addition to the operations of the pure API, concrete paths provide
+additional methods which actually access the filesystem to query or mutate
+information.
+
+
+Constructing
+------------
+
+The classmethod ``cwd()`` creates a path object pointing to the current
+working directory in absolute form::
+
+ >>> Path.cwd()
+ PosixPath('/home/antoine/pathlib')
+
+
+File metadata
+-------------
+
+The ``stat()`` method caches and returns the file's stat() result;
+``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
+but doesn't have any caching behaviour::
+
+ >>> p.stat()
+ posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
+
+For ease of use, direct attribute access to the fields of the stat structure
+is provided over the path object itself::
+
+ >>> p.st_size
+ 928
+ >>> p.st_mtime
+ 1328287308.889562
+
+Higher-level methods help examine the kind of the file::
+
+ >>> p.exists()
+ True
+ >>> p.is_file()
+ True
+ >>> p.is_dir()
+ False
+ >>> p.is_symlink()
+ False
+
+The file owner and group names (rather than numeric ids) are queried
+through matching properties::
+
+ >>> p = Path('/etc/shadow')
+ >>> p.owner
+ 'root'
+ >>> p.group
+ 'shadow'
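+
+A self-contained sketch of the metadata queries, written against the
+``pathlib`` module of current Python 3 releases, where stat fields are read
+from the result of ``stat()``. The temporary directory and the file name
+are assumptions made for the example::
+
+    import tempfile
+    from pathlib import Path
+
+    with tempfile.TemporaryDirectory() as tmp:
+        p = Path(tmp) / 'setup.py'          # hypothetical file
+        p.write_bytes(b'pass\n')
+
+        size = p.stat().st_size
+        kinds = (p.exists(), p.is_file(), p.is_dir(), p.is_symlink())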
+
+
+Path resolution
+---------------
+
+The ``resolve()`` method makes a path absolute, resolving any symlink on
+the way. It is the only operation which will remove "``..``" path components.
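+
+The difference between pure joining and ``resolve()`` can be sketched as
+follows, assuming the ``pathlib`` module of current Python 3 releases and a
+throwaway directory layout invented for the example::
+
+    import tempfile
+    from pathlib import Path
+
+    with tempfile.TemporaryDirectory() as tmp:
+        base = Path(tmp).resolve()
+        (base / 'docs').mkdir()
+        target = base / 'setup.py'
+        target.touch()
+
+        # Pure operations keep ".." exactly as written ...
+        roundabout = base / 'docs' / '..' / 'setup.py'
+        kept = '..' in roundabout.parts
+        # ... while resolve() consults the filesystem and removes it.
+        resolved = roundabout.resolve()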
+
+
+Directory walking
+-----------------
+
+Simple (non-recursive) directory access is done by iteration::
+
+ >>> p = Path('docs')
+ >>> for child in p: child
+ ...
+ PosixPath('docs/conf.py')
+ PosixPath('docs/_templates')
+ PosixPath('docs/make.bat')
+ PosixPath('docs/index.rst')
+ PosixPath('docs/_build')
+ PosixPath('docs/_static')
+ PosixPath('docs/Makefile')
+
+This allows simple filtering through list comprehensions::
+
+ >>> p = Path('.')
+ >>> [child for child in p if child.is_dir()]
+ [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
+
+Simple and recursive globbing is also provided::
+
+ >>> for child in p.glob('**/*.py'): child
+ ...
+ PosixPath('test_pathlib.py')
+ PosixPath('setup.py')
+ PosixPath('pathlib.py')
+ PosixPath('docs/conf.py')
+ PosixPath('build/lib/pathlib.py')
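+
+A runnable variant of the above, as a sketch against the ``pathlib`` module
+of current Python 3 releases. There, listing a directory is spelled
+``iterdir()`` rather than direct iteration; the directory layout is created
+only for the example::
+
+    import tempfile
+    from pathlib import Path
+
+    with tempfile.TemporaryDirectory() as tmp:
+        root = Path(tmp)
+        (root / 'docs').mkdir()
+        for name in ('setup.py', 'pathlib.py', 'docs/conf.py', 'docs/Makefile'):
+            (root / name).touch()
+
+        # Non-recursive listing and filtering.
+        children = sorted(child.name for child in root.iterdir())
+        subdirs = [child.name for child in root.iterdir() if child.is_dir()]
+
+        # "**" makes the glob recursive.
+        scripts = sorted(
+            found.relative_to(root).as_posix()
+            for found in root.glob('**/*.py')
+        )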
+
+
+File opening
+------------
+
+The ``open()`` method provides a file opening API similar to the builtin
+``open()`` function::
+
+ >>> p = Path('setup.py')
+ >>> with p.open() as f: f.readline()
+ ...
+ '#!/usr/bin/env python3\n'
+
+The ``raw_open()`` method, on the other hand, is similar to ``os.open``::
+
+ >>> fd = p.raw_open(os.O_RDONLY)
+ >>> os.read(fd, 15)
+ b'#!/usr/bin/env '
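+
+Both styles of opening can be tried with the sketch below, which assumes
+the ``pathlib`` module of current Python 3 releases. That module has no
+``raw_open()``; the sketch passes the path object to ``os.open()`` instead.
+The file and its contents are invented for the example::
+
+    import os
+    import tempfile
+    from pathlib import Path
+
+    with tempfile.TemporaryDirectory() as tmp:
+        p = Path(tmp) / 'setup.py'
+        p.write_bytes(b'#!/usr/bin/env python3\nprint("hi")\n')
+
+        with p.open() as f:             # text mode, like builtin open()
+            first_line = f.readline()
+
+        # os.open() accepts path objects and returns a raw file descriptor.
+        fd = os.open(p, os.O_RDONLY)
+        try:
+            head = os.read(fd, 15)
+        finally:
+            os.close(fd)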
+
+
+Filesystem alteration
+---------------------
+
+Several common filesystem operations are provided as methods: ``touch()``,
+``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
+``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
+provided, for example some of the functionality of the shutil module.
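+
+A few of these operations chained together, as a sketch against the
+``pathlib`` module of current Python 3 releases (the directory and file
+names are hypothetical and live in a temporary directory)::
+
+    import tempfile
+    from pathlib import Path
+
+    with tempfile.TemporaryDirectory() as tmp:
+        build = Path(tmp) / 'build'
+        build.mkdir()
+
+        draft = build / 'notes.txt'
+        draft.touch()                   # create an empty file
+
+        final = build / 'README.txt'
+        draft.rename(final)
+        renamed = final.exists() and not draft.exists()
+
+        final.unlink()                  # remove the file ...
+        build.rmdir()                   # ... then the now-empty directory
+        cleaned = not build.exists()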
+
+
+Experimental openat() support
+-----------------------------
+
+On compatible POSIX systems, the concrete PosixPath class can take advantage
+of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
+open file descriptors as necessary. Support is enabled by passing the
+*use_openat* argument to the constructor::
+
+ >>> p = Path(".", use_openat=True)
+
+Then all paths constructed by navigating this path (either by iteration or
+indexing) will also use the openat() family of functions. The point of using
+these functions is to avoid race conditions whereby a given directory is
+silently replaced with another (often a symbolic link to a sensitive system
+location) between two accesses.
+
+.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
+
+
+Copyright
+=========
+
+This document has been placed into the public domain.
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ coding: utf-8
--
Repository URL: http://hg.python.org/peps