[Python-checkins] peps: Add PEP 402

nick.coghlan python-checkins at python.org
Wed Jul 20 10:43:53 CEST 2011


http://hg.python.org/peps/rev/f1032cec47ce
changeset:   3902:f1032cec47ce
user:        Nick Coghlan <ncoghlan at gmail.com>
date:        Wed Jul 20 18:43:40 2011 +1000
summary:
  Add PEP 402

files:
  pep-0402.txt |  585 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 585 insertions(+), 0 deletions(-)


diff --git a/pep-0402.txt b/pep-0402.txt
new file mode 100644
--- /dev/null
+++ b/pep-0402.txt
@@ -0,0 +1,585 @@
+PEP: 402
+Title: Simplified Package Layout and Partitioning
+Version: $Revision$
+Last-Modified: $Date$
+Author: P.J. Eby
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 12-Jul-2011
+Python-Version: 3.3
+Post-History: 20-Jul-2011
+Replaces: 382
+
+Abstract
+========
+
+This PEP proposes an enhancement to Python's package importing
+to:
+
+* Surprise users of other languages less,
+* Make it easier to convert a module into a package, and
+* Support dividing packages into separately installed components
+  (à la "namespace packages", as described in PEP 382)
+
+The proposed enhancements do not change the semantics of any
+currently-importable directory layouts, but make it possible for
+packages to use a simplified directory layout (that is not importable
+currently).
+
+Moreover, the proposed changes do NOT add any performance overhead to
+the importing of existing modules or packages, and performance for the
+new directory layout should be about the same as that of previous
+"namespace package" solutions (such as ``pkgutil.extend_path()``).
+
+
+The Problem
+===========
+
+.. epigraph::
+
+   "Most packages are like modules.  Their contents are highly
+   interdependent and can't be pulled apart.  [However,] some
+   packages exist to provide a separate namespace. ...  It should
+   be possible to distribute sub-packages or submodules of these
+   [namespace packages] independently."
+
+   -- Jim Fulton, shortly before the release of Python 2.3 [1]_
+
+
+When new users come to Python from other languages, they are often
+confused by Python's packaging semantics.  At Google, for example,
+Guido received complaints from "a large crowd with pitchforks" [2]_
+that the requirement for packages to contain an ``__init__`` module
+was a "misfeature", and should be dropped.
+
+In addition, users coming from languages like Java or Perl are
+sometimes confused by a difference in Python's import path searching.
+
+In most other languages that have a similar path mechanism to Python's
+``sys.path``, a package is merely a namespace that contains modules
+or classes, and can thus be spread across multiple directories in
+the language's path.  In Perl, for instance, a ``Foo::Bar`` module
+will be searched for in ``Foo/`` subdirectories all along the module
+include path, not just in the first such subdirectory found.
+
+Worse, this is not just a problem for new users: it prevents *anyone*
+from easily splitting a package into separately-installable
+components.  In Perl terms, it would be as if every possible ``Net::``
+module on CPAN had to be bundled up and shipped in a single tarball!
+
+For that reason, various workarounds for this latter limitation exist,
+circulated under the term "namespace packages".  The Python standard
+library has provided one such workaround since Python 2.3 (via the
+``pkgutil.extend_path()`` function), and the "setuptools" package
+provides another (via ``pkg_resources.declare_namespace()``).
+
+The workarounds themselves, however, fall prey to a *third* issue with
+Python's way of laying out packages in the filesystem.
+
+Because a package *must* contain an ``__init__`` module, any attempt
+to distribute modules for that package must necessarily include that
+``__init__`` module, if those modules are to be importable.
+
+However, the very fact that each distribution of modules for a package
+must contain this (duplicated) ``__init__`` module, means that OS
+vendors who package up these module distributions must somehow handle
+the conflict caused by several module distributions installing that
+``__init__`` module to the same location in the filesystem.
+
+This led to the proposal of PEP 382 ("Namespace Packages") - a way
+to signal to Python's import machinery that a directory was
+importable, using unique filenames per module distribution.
+
+However, there was more than one downside to this approach.
+Performance for all import operations would be affected, and the
+process of designating a package became even more complex.  New
+terminology had to be invented to explain the solution, and so on.
+
+As terminology discussions continued on the Import-SIG, it soon became
+apparent that the main reason it was so difficult to explain the
+concepts related to "namespace packages" was that Python's
+current way of handling packages is somewhat underpowered, when
+compared to other languages.
+
+That is, in other popular languages with package systems, no special
+term is needed to describe "namespace packages", because *all*
+packages generally behave in the desired fashion.
+
+Rather than being an isolated single directory with a special marker
+module (as in Python), packages in other languages are typically just
+the *union* of appropriately-named directories across the *entire*
+import or inclusion path.
+
+In Perl, for example, the module ``Foo`` is always found in a
+``Foo.pm`` file, and a module ``Foo::Bar`` is always found in a
+``Foo/Bar.pm`` file.  (In other words, there is One Obvious Way to
+find the location of a particular module.)
+
+This is because Perl considers a module to be *different* from a
+package: the package is purely a *namespace* in which other modules
+may reside, and is only *coincidentally* the name of a module as well.
+
+In current versions of Python, however, the module and the package are
+more tightly bound together.  ``Foo`` is always a module -- whether it
+is found in ``Foo.py`` or ``Foo/__init__.py`` -- and it is tightly
+linked to its submodules (if any), which *must* reside in the exact
+same directory where the ``__init__.py`` was found.
+
+On the positive side, this design choice means that a package is quite
+self-contained, and can be installed, copied, etc. as a unit just by
+performing an operation on the package's root directory.
+
+On the negative side, however, it is non-intuitive for beginners, and
+requires a more complex step to turn a module into a package.  If
+``Foo`` begins its life as ``Foo.py``, then it must be moved and
+renamed to ``Foo/__init__.py``.
+
+Conversely, if you intend to create a ``Foo.Bar`` module from the
+start, but have no particular module contents to put in ``Foo``
+itself, then you have to create an empty and seemingly-irrelevant
+``Foo/__init__.py`` file, just so that ``Foo.Bar`` can be imported.
+
+(And these issues don't just confuse newcomers to the language,
+either: they annoy many experienced developers as well.)
+
+So, after some discussion on the Import-SIG, this PEP was created
+as an alternative to PEP \382, in an attempt to solve *all* of the
+above problems, not just the "namespace package" use cases.
+
+And, as a delightful side effect, the solution proposed in this PEP
+does not affect the import performance of ordinary modules or
+self-contained (i.e. ``__init__``-based) packages.
+
+
+The Solution
+============
+
+In the past, various proposals have been made to allow more intuitive
+approaches to package directory layout.  However, most of them failed
+because of an apparent backward-compatibility problem.
+
+That is, if the requirement for an ``__init__`` module were simply
+dropped, it would open up the possibility for a directory named, say,
+``string`` on ``sys.path``, to block importing of the standard library
+``string`` module.
+
+Paradoxically, however, the failure of this approach does *not* arise
+from the elimination of the ``__init__`` requirement!
+
+Rather, the failure arises because the underlying approach takes for
+granted that a package is just ONE thing, instead of two.
+
+In truth, a package comprises two separate, but related entities: a
+module (with its own, optional contents), and a *namespace* where
+*other* modules or packages can be found.
+
+In current versions of Python, however, the module part (found in
+``__init__``) and the namespace for submodule imports (represented
+by the ``__path__`` attribute) are both initialized at the same time,
+when the package is first imported.
+
+And, if you assume this is the *only* way to initialize these two
+things, then there is no way to drop the need for an ``__init__``
+module, while still being backwards-compatible with existing directory
+layouts.
+
+After all, as soon as you encounter a directory on ``sys.path``
+matching the desired name, that means you've "found" the package, and
+must stop searching, right?
+
+Well, not quite.
+
+
+A Thought Experiment
+--------------------
+
+Let's hop into the time machine for a moment, and pretend we're back
+in the early 1990s, shortly before Python packages and ``__init__.py``
+have been invented.  But, imagine that we *are* familiar with
+Perl-like package imports, and we want to implement a similar system
+in Python.
+
+We'd still have Python's *module* imports to build on, so we could
+certainly conceive of having ``Foo.py`` as a parent ``Foo`` module
+for a ``Foo`` package.  But how would we implement submodule and
+subpackage imports?
+
+Well, if we didn't have the idea of ``__path__`` attributes yet,
+we'd probably just search ``sys.path`` looking for ``Foo/Bar.py``.
+
+But we'd *only* do it when someone actually tried to *import*
+``Foo.Bar``.
+
+NOT when they imported ``Foo``.
+
+And *that* lets us get rid of the backwards-compatibility problem
+of dropping the ``__init__`` requirement, back here in 2011.
+
+How?
+
+Well, when we ``import Foo``, we're not even *looking* for ``Foo/``
+directories on ``sys.path``, because we don't *care* yet.  The only
+point at which we care, is the point when somebody tries to actually
+import a submodule or subpackage of ``Foo``.
+
+That means that if ``Foo`` is a standard library module (for example),
+and I happen to have a ``Foo`` directory on ``sys.path`` (without
+an ``__init__.py``, of course), then *nothing breaks*.  The ``Foo``
+module is still just a module, and it's still imported normally.
+
+
+Self-Contained vs. "Virtual" Packages
+-------------------------------------
+
+Of course, in today's Python, trying to ``import Foo.Bar`` will
+fail if ``Foo`` is just a ``Foo.py`` module (and thus lacks a
+``__path__`` attribute).
+
+So, this PEP proposes to *dynamically* create a ``__path__``, in the
+case where one is missing.
+
+That is, if I try to ``import Foo.Bar``, the proposed change to the
+import machinery will notice that the ``Foo`` module lacks a
+``__path__``, and will therefore try to *build* one before proceeding.
+
+And it will do this by making a list of all the existing ``Foo/``
+subdirectories of the directories listed in ``sys.path``.
+
+If the list is empty, the import will fail with ``ImportError``, just
+like today.  But if the list is *not* empty, then it is saved in
+a new ``Foo.__path__`` attribute, making the module a "virtual
+package".
+
+That is, because it now has a valid ``__path__``, we can proceed
+to import submodules or subpackages in the normal way.
+
+Now, notice that this change does not affect "classic", self-contained
+packages that have an ``__init__`` module in them.  Such packages
+already *have* a ``__path__`` attribute (initialized at import time)
+so the import machinery won't try to create another one later.
+
+This means that (for example) the standard library ``email`` package
+will not be affected in any way by you having a bunch of unrelated
+directories named ``email`` on ``sys.path``.  (Even if they contain
+``*.py`` files.)
+
+But it *does* mean that if you want to turn your ``Foo`` module into
+a ``Foo`` package, all you have to do is add a ``Foo/`` directory
+somewhere on ``sys.path``, and start adding modules to it.
+
+But what if you only want a "namespace package"?  That is, a package
+that is *only* a namespace for various separately-distributed
+submodules and subpackages?
+
+For example, if you're Zope Corporation, distributing dozens of
+separate tools like ``zc.buildout``, each in packages under the ``zc``
+namespace, you don't want to have to make and include an empty
+``zc.py`` in every tool you ship.  (And, if you're a Linux or other
+OS vendor, you don't want to deal with the package installation
+conflicts created by trying to install ten copies of ``zc.py`` to the
+same location!)
+
+No problem.  All we have to do is make one more minor tweak to the
+import process: if the "classic" import process fails to find a
+self-contained module or package (e.g., if ``import zc`` fails to find
+a ``zc.py`` or ``zc/__init__.py``), then we once more try to build a
+``__path__`` by searching for all the ``zc/`` directories on
+``sys.path``, and putting them in a list.
+
+If this list is empty, we raise ``ImportError``.  But if it's
+non-empty, we create an empty ``zc`` module, and put the list in
+``zc.__path__``.  Congratulations: ``zc`` is now a namespace-only,
+"pure virtual" package!  It has no module contents, but you can still
+import submodules and subpackages from it, regardless of where they're
+located on ``sys.path``.
+
+(By the way, both of these additions to the import protocol (i.e. the
+dynamically-added ``__path__``, and dynamically-created modules)
+apply recursively to child packages, using the parent package's
+``__path__`` in place of ``sys.path`` as a basis for generating a
+child ``__path__``.  This means that self-contained and virtual
+packages can contain each other without limitation, with the caveat
+that if you put a virtual package inside a self-contained one, it's
+gonna have a really short ``__path__``!)
+
+
+Backwards Compatibility and Performance
+---------------------------------------
+
+Notice that these two changes *only* affect import operations that
+today would result in ``ImportError``.  As a result, the performance
+of imports that do not involve virtual packages is unaffected, and
+potential backward compatibility issues are very restricted.
+
+Today, if you try to import submodules or subpackages from a module
+with no ``__path__``, it's an immediate error.  And of course, if you
+don't have a ``zc.py`` or ``zc/__init__.py`` somewhere on ``sys.path``
+today, ``import zc`` would likewise fail.
+
+Thus, the only potential backwards-compatibility issues are:
+
+1. Tools that expect package directories to have an ``__init__``
+   module, that expect directories without an ``__init__`` module
+   to be unimportable, or that expect ``__path__`` attributes to be
+   static, will not recognize virtual packages as packages.
+
+   (In practice, this just means that tools will need updating to
+   support virtual packages, e.g. by using ``pkgutil.walk_packages()``
+   instead of using hardcoded filesystem searches.)
+
+2. Code that *expects* certain imports to fail may now do something
+   unexpected.  This should be fairly rare in practice, as most sane,
+   non-test code does not import things that are expected not to
+   exist!
+
+The biggest likely exception to the above would be when a piece of
+code tries to check whether some package is installed by importing
+it.  If this is done *only* by importing a top-level module (i.e., not
+checking for a ``__version__`` or some other attribute), *and* there
+is a directory of the same name as the sought-for package on
+``sys.path`` somewhere, *and* the package is not actually installed,
+then such code could *perhaps* be fooled into thinking a package is
+installed that really isn't.
+
+However, even in the rare case where all these conditions line up to
+happen at once, the failure is more likely to be annoying than
+damaging.  In most cases, after all, the code will simply fail a
+little later on, when it actually tries to DO something with the
+imported (but empty) module.  (And code that checks ``__version__``
+attributes or for the presence of some desired function, class, or
+module in the package will not see a false positive result in the
+first place.)
+
+Meanwhile, tools that expect to locate packages and modules by
+walking a directory tree can be updated to use the existing
+``pkgutil.walk_packages()`` API, and tools that need to inspect
+packages in memory should use the other APIs described in the
+`Standard Library Changes/Additions`_ section below.
+
+
+Specification
+=============
+
+Two changes are made to the existing import process.
+
+First, the built-in ``__import__`` function must not raise an
+``ImportError`` when importing a submodule of a module with no
+``__path__``.  Instead, it must attempt to *create* a ``__path__``
+attribute for the parent module first, as described in `__path__
+creation`_, below.
+
+Second, if searching ``sys.meta_path`` and ``sys.path`` (or a parent
+package ``__path__``) fails to find a module being imported, the
+import process must attempt to create a ``__path__`` attribute for
+the missing module.  If the attempt succeeds, an empty module is
+created and its ``__path__`` is set.  Otherwise, importing fails.
+
+In both of the above cases, if a non-empty ``__path__`` is created,
+the name of the module whose ``__path__`` was created is added to
+``sys.virtual_packages`` -- an initially-empty ``set()`` of package
+names.
+
+(This way, code that extends ``sys.path`` at runtime can find out
+what virtual packages are currently imported, and thereby add any
+new subdirectories to those packages' ``__path__`` attributes.  See
+`Standard Library Changes/Additions`_ below for more details.)
+
+Conversely, if an empty ``__path__`` results, an ``ImportError``
+is immediately raised, and the module is not created or changed, nor
+is its name added to ``sys.virtual_packages``.
+
+
+``__path__`` Creation
+---------------------
+
+A virtual ``__path__`` is created by obtaining a PEP 302 "importer"
+object for each of the path entries found in ``sys.path`` (for a
+top-level module) or the parent ``__path__`` (for a submodule).
+
+(Note: because ``sys.meta_path`` importers are not associated with
+``sys.path`` or ``__path__`` entry strings, such importers do *not*
+participate in this process.)
+
+Each importer is checked for a ``get_subpath()`` method, and if
+present, the method is called with the full name of the module/package
+the ``__path__`` is being constructed for.  The return value is either
+a string representing a subdirectory for the requested package, or
+``None`` if no such subdirectory exists.
+
+The strings returned by the importers are added to the ``__path__``
+being built, in the same order as they are found.  (``None`` values
+and missing ``get_subpath()`` methods are simply skipped.)
+
+In Python code, the algorithm would look something like this::
+
+   import pkgutil
+   import sys
+
+   def get_virtual_path(modulename, parent_path=None):
+       # Top-level modules are searched for along sys.path
+       if parent_path is None:
+           parent_path = sys.path
+
+       path = []
+       for entry in parent_path:
+           # Obtain a PEP 302 importer object - see pkgutil module
+           importer = pkgutil.get_importer(entry)
+
+           # Skip importers lacking the proposed get_subpath() method
+           if hasattr(importer, 'get_subpath'):
+               subpath = importer.get_subpath(modulename)
+               if subpath is not None:
+                   path.append(subpath)
+
+       return path
+
+And a function like this one should be exposed in the standard
+library as e.g. ``imp.get_virtual_path()``, so that people creating
+``__import__`` replacements or ``sys.meta_path`` hooks can reuse it.
+
+
+Standard Library Changes/Additions
+----------------------------------
+
+The ``pkgutil`` module should be updated to handle this
+specification appropriately, including any necessary changes to
+``extend_path()``, ``iter_modules()``, etc.
+
+Specifically, the proposed changes and additions to ``pkgutil`` are:
+
+* A new ``extend_virtual_paths(path_entry)`` function, to extend
+  existing, already-imported virtual packages' ``__path__`` attributes
+  to include any portions found in a new ``sys.path`` entry.  This
+  function should be called by applications extending ``sys.path``
+  at runtime, e.g. when adding a plugin directory or an egg to the
+  path.
+
+  The implementation of this function does a simple top-down traversal
+  of ``sys.virtual_packages``, and performs any necessary
+  ``get_subpath()`` calls to identify what path entries need to
+  be added to each package's ``__path__``, given that ``path_entry``
+  has been added to ``sys.path``.  (Or, in the case of sub-packages,
+  adding a derived subpath entry, based on their parent namespace's
+  ``__path__``.)
+
+* A new ``iter_virtual_packages(parent='')`` function to allow
+  top-down traversal of virtual packages in ``sys.virtual_packages``,
+  by yielding the child virtual packages of ``parent``.  For example,
+  calling ``iter_virtual_packages("zope")`` might yield ``zope.app``
+  and ``zope.products`` (if they are imported virtual packages listed
+  in ``sys.virtual_packages``), but **not** ``zope.foo.bar``.
+  (This function is needed to implement ``extend_virtual_paths()``,
+  but is also potentially useful for other code that needs to inspect
+  imported virtual packages.)
+
+* ``ImpImporter.iter_modules()`` should be changed to also detect and
+  yield the names of modules found in virtual packages.
+
+In addition to the above changes, the ``zipimport`` importer should
+have its ``iter_modules()`` implementation similarly changed.  (Note:
+current versions of Python implement this via a shim in ``pkgutil``,
+so technically this is also a change to ``pkgutil``.)
+
+Last, but not least, the ``imp`` module (or ``importlib``, if
+appropriate) should expose the algorithm described in the `__path__
+creation`_ section above, as a
+``get_virtual_path(modulename, parent_path=None)`` function, so that
+creators of ``__import__`` replacements can use it.
+
+
+Implementation Notes
+--------------------
+
+For users, developers, and distributors of virtual packages:
+
+* While virtual packages are easy to set up and use, there is still
+  a time and place for using self-contained packages.  While it's not
+  strictly necessary, adding an ``__init__`` module to your
+  self-contained packages lets users of the package (and Python
+  itself) know that *all* of the package's code will be found in
+  that single subdirectory.  In addition, it lets you define
+  ``__all__``, expose a public API, provide a package-level docstring,
+  and do other things that make more sense for a self-contained
+  project than for a mere "namespace" package.
+
+* ``sys.virtual_packages`` is allowed to contain non-existent or
+  not-yet-imported package names; code that uses its contents should
+  not assume that every name in this set is also present in
+  ``sys.modules`` or that importing the name will necessarily succeed.
+
+* If you are changing a currently self-contained package into a
+  virtual one, it's important to note that you can no longer use its
+  ``__file__`` attribute to locate data files stored in a package
+  directory.  Instead, you must search ``__path__`` or use the
+  ``__file__`` of a submodule adjacent to the desired files, or
+  of a self-contained subpackage that contains the desired files.
+
+  (Note: this caveat is already true for existing users of "namespace
+  packages" today.  That is, it is an inherent result of being able
+  to partition a package, that you must know *which* partition the
+  desired data file lives in.  We mention it here simply so that
+  *new* users converting from self-contained to virtual packages will
+  also be aware of it.)
+
+* XXX what is the __file__ of a "pure virtual" package?  ``None``?
+  Some arbitrary string?  The path of the first directory with a
+  trailing separator?  No matter what we put, *some* code is
+  going to break, but the last choice might allow some code to
+  accidentally work.  Is that good or bad?
+
+
+For those implementing PEP \302 importer objects:
+
+* Importers that support the ``iter_modules()`` method (used by
+  ``pkgutil`` to locate importable modules and packages) and want to
+  add virtual package support should modify their ``iter_modules()``
+  method so that it discovers and lists virtual packages as well as
+  standard modules and packages.  To do this, the importer should
+  simply list all immediate subdirectory names in its jurisdiction
+  that are valid Python identifiers.
+
+  XXX This might list a lot of not-really-packages.  Should we
+  require importable contents to exist?  If so, how deep do we
+  search, and how do we prevent e.g. link loops, or traversing onto
+  different filesystems, etc.?  Ick.
+
+* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
+  not need to implement ``get_subpath()``, because the method
+  is only called on importers corresponding to ``sys.path`` entries
+  and ``__path__`` entries.  If a meta importer wishes to support
+  virtual packages, it must do so entirely within its own
+  ``find_module()`` implementation.
+
+  Unfortunately, it is unlikely that any such implementation will be
+  able to merge its package subpaths with those of other meta
+  importers or ``sys.path`` importers, so the meaning of "supporting
+  virtual packages" for a meta importer is currently undefined!
+
+  (However, since the intended use case for meta importers is to
+  replace Python's normal import process entirely for some subset of
+  modules, and the number of such importers currently implemented is
+  quite small, this seems unlikely to be a big issue in practice.)
+
+
+References
+==========
+
+.. [1] "namespace" vs "module" packages (mailing list thread)
+  (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
+
+.. [2] "Dropping __init__.py requirement for subpackages"
+  (http://mail.python.org/pipermail/python-dev/2006-April/064400.html)
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+  Local Variables:
+  mode: indented-text
+  indent-tabs-mode: nil
+  sentence-end-double-space: t
+  fill-column: 70
+  coding: utf-8
+  End:
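
The "__path__ creation" algorithm in the PEP text above can be tried out
with a small, self-contained sketch. Note that get_subpath() is only
proposed by the PEP; as far as I know, no importer returned by
pkgutil.get_importer() implements it, so the PEP's own function would
return an empty list on a stock interpreter. The sketch therefore swaps
in a hypothetical DirImporter class, and the get_importer parameter and
the demo() helper are likewise assumptions made for illustration. It
also assumes that a directory importer only needs the last component of
the dotted name, since parent packages are handled through parent_path.

```python
import os
import tempfile


class DirImporter:
    """Hypothetical path-entry importer with the proposed get_subpath()."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def get_subpath(self, fullname):
        # Assumption: only the last name component is looked up here,
        # because parent packages are covered by the parent __path__.
        subdir = os.path.join(self.path_entry, fullname.rpartition(".")[2])
        return subdir if os.path.isdir(subdir) else None


def get_virtual_path(modulename, parent_path, get_importer=DirImporter):
    """Collect, in path order, every directory that may hold `modulename`."""
    path = []
    for entry in parent_path:
        importer = get_importer(entry)
        if hasattr(importer, "get_subpath"):
            subpath = importer.get_subpath(modulename)
            if subpath is not None:
                path.append(subpath)
    return path


def demo():
    """Build three fake path entries; only two contain a zc/ directory."""
    with tempfile.TemporaryDirectory() as root:
        entries = [os.path.join(root, name) for name in ("a", "b", "c")]
        for entry in entries:
            os.mkdir(entry)
        os.mkdir(os.path.join(entries[0], "zc"))
        os.mkdir(os.path.join(entries[2], "zc"))

        found = get_virtual_path("zc", entries)
        missing = get_virtual_path("nosuch", entries)
        # Report paths relative to the temporary root for readability.
        return [os.path.relpath(p, root) for p in found], missing


if __name__ == "__main__":
    print(demo())
```

A non-empty result corresponds to the case where the PEP would create a
virtual package and set its __path__; the empty list corresponds to the
case where the import would fail with ImportError.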

-- 
Repository URL: http://hg.python.org/peps

