From ericsnowcurrently at gmail.com Thu Jul 7 06:35:06 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 6 Jul 2011 22:35:06 -0600
Subject: [Import-SIG] PEP 382 update? and implementation feedback
Message-ID:

Any feedback on PJE's proposal [1] regarding PEP 382? I have some
free time to work on a reference implementation and want to make sure
I am targeting an up-to-date spec.

My first goal is to help get a proof-of-concept implementation out
there for the PEP, for 3.3, regardless of the ultimate implementation.
However, my end goal is to leverage that effort into a backported
implementation for 2.x. How far back should I go with that? I was
thinking 2.4 [2].

The two approaches I've considered to meet these goals are a heavy
import hook and changes to importlib. For what I have in mind, both
would require backporting the full importlib (for my end goal);
currently only a simple port of import_module is backported and
released on PyPI. The import hook approach would not be helpful for
3.3 except as a proof-of-concept. However, the importlib approach
could also work as the 3.3 implementation if Brett realizes his
intentions for importlib.__import__ [3].

Thoughts?

-eric

[1] http://mail.python.org/pipermail/import-sig/2011-June/000208.html
[2] the version depends partly on use cases, like google app engine
(2.5) and the various distros (no idea). I'm personally stuck on 2.4
at work for the next while, hence my choice. :)
[3] http://bugs.python.org/issue2377

From eric at trueblade.com Thu Jul 7 11:39:38 2011
From: eric at trueblade.com (Eric Smith)
Date: Thu, 07 Jul 2011 05:39:38 -0400
Subject: [Import-SIG] PEP 382 update? and implementation feedback
In-Reply-To:
References:
Message-ID: <4E157EDA.6000606@trueblade.com>

On 7/7/2011 12:35 AM, Eric Snow wrote:
> Any feedback on PJE's proposal [1] regarding PEP 382? I have some
> free time to work on a reference implementation and want to make sure
> I am targeting an up-to-date spec.

I've been working on a response, but haven't had time to post it yet.
Maybe in the next few days. I agree with most of it (and maybe all of
it, I'm still reading through it).

> My first goal is to help get a proof-of-concept implementation out
> there for the PEP, for 3.3, regardless of the ultimate implementation.
> However, my end goal is to leverage that effort into a backported
> implementation for 2.x. How far back should I go with that? I was
> thinking 2.4 [2].

We (python-dev) can't release a new version of 2.x. That said, I'd love
it if I could compile a version of 2.5 for my own uses that had this
feature, or if it could be done as an import hook.

> The import hook approach would not be helpful for 3.3 except as a
> proof-of-concept. However, the importlib approach could also work as
> the 3.3 implementation if Brett realizes his intentions for
> importlib.__import__ [3].

Are you thinking of doing the import hook version in C?

From ericsnowcurrently at gmail.com Thu Jul 7 16:36:30 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 7 Jul 2011 08:36:30 -0600
Subject: [Import-SIG] PEP 382 update? and implementation feedback
In-Reply-To: <4E157EDA.6000606@trueblade.com>
References: <4E157EDA.6000606@trueblade.com>
Message-ID:

On Thu, Jul 7, 2011 at 3:39 AM, Eric Smith wrote:
> On 7/7/2011 12:35 AM, Eric Snow wrote:
>> My first goal is to help get a proof-of-concept implementation out
>> there for the PEP, for 3.3, regardless of the ultimate implementation.
>> However, my end goal is to leverage that effort into a backported
>> implementation for 2.x. How far back should I go with that? I was
>> thinking 2.4 [2].
>
> We (python-dev) can't release a new version of 2.x. That said, I'd love
> it if I could compile a version of 2.5 for my own uses that had this
> feature, or if it could be done as an import hook.
>

Yeah, any backport that I do would be released on PyPI, as has been
done with things like importlib and distutils2.

>> The import hook approach would not be helpful for 3.3 except as a
>> proof-of-concept. However, the importlib approach could also work as
>> the 3.3 implementation if Brett realizes his intentions for
>> importlib.__import__ [3].
>
> Are you thinking of doing the import hook version in C?

Nope. I'm just planning on extending (and backporting) importlib
either indirectly (for the import hook) or directly. With the import
hook it should be easy enough to add it onto sys.meta_path early on.
Same with explicitly changing __import__ to be importlib.__import__.

If I keep the implementation pure Python it would be usable with
Jython, PyPy, and the rest. Also, my Python-fu is much stronger than
my C. Finally, my understanding is that performance is the only gain
for a C version, which does not seem to matter much for imports (hence
importlib).

Keep in mind that I don't have a vested interest in PEP 382, just in
import features. The PEP and ensuing discussion seem clear enough
that that should not get in the way. However, if the actual use cases
dictate a different approach I'd be glad to reassess.

-eric

> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>

From pje at telecommunity.com Thu Jul 7 20:43:18 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 07 Jul 2011 14:43:18 -0400
Subject: [Import-SIG] PEP 382 update? and implementation feedback
In-Reply-To: <4E157EDA.6000606@trueblade.com>
References: <4E157EDA.6000606@trueblade.com>
Message-ID: <20110707184337.BEEF13A4108@sparrow.telecommunity.com>

At 05:39 AM 7/7/2011 -0400, Eric Smith wrote:
>We (python-dev) can't release a new version of 2.x. That said, I'd love
>it if I could compile a version of 2.5 for my own uses that had this
>feature, or if it could be done as an import hook.

FYI, I have a draft import hook for 2.x that complies with the spec I
proposed:

http://pastebin.com/uFQ9iwXQ

In fact, the spec proposal is a retrofit based on my import hook
being (AFAICT) the Simplest Thing That Could Possibly Work for a 2.x
implementation.

That code hasn't actually been tested yet; I was starting to port the
PEP 382 branch's test suite when I noticed the discrepancy between
what I was doing and what the tests were looking for. That's why I
ended up proposing a change, as the tests check for something that
seems like another unneeded feature (i.e., the ability to sandwich
undeclared namespace directories between declared ones).

It probably would be a good idea to revise the PEP itself, assuming
Martin is amenable. One thing I'd also like to clean up, for example,
is the idea that there's a '*' in __path__ lists. If we are no longer
using '*' in .pth files to denote namespaces, then the '*' in
__path__ is kind of pointless. So, sys.namespace_packages should be
the sole arbiter of what constitutes a namespace package. (It should
also be clarified that sys.namespace_packages may name packages which
are not as yet imported, although the implied semantics are
undefined.)

Anyway... still looking for some feedback here. I'd like to know if
there's general support before taking the time to revise the tests,
draft an updated spec, etc.

From pje at telecommunity.com Fri Jul 8 21:51:39 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 08 Jul 2011 15:51:39 -0400
Subject: [Import-SIG] New draft revision for PEP 382
Message-ID: <20110708195157.335043A404D@sparrow.telecommunity.com>

The following is my attempt at an updated draft of PEP 382, based on
the recently-discussed changes.

To address the questions and criticisms raised on Python-Dev when the
PEP was introduced, I added an extended "Motivation" section that
explains issues with the current approaches, and states the case for
the PEP in more detail, including info about why anyone should care
about namespace packages in the first place. ;-)

I've also added a "Rejected Alternatives" section to document the
other proposed approaches and the rationale for rejecting them in
favor of the current proposal.

In addition, I've specified in a bit more detail the necessary
changes to e.g. the pkgutil module. (At least one open issue
remains, however, and that is the question of what, if anything,
should happen to the existing extend_path() function. A second
possible open question regards the API of the path fixup functions I
propose in pkgutil.)

Anyway, your questions and comments, please! The draft follows below:


PEP: 382
Title: Namespace Package Declarations
Version: $Revision$
Last-Modified: $Date$
Author: Martin v. Löwis, PJ Eby
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Apr-2009
Python-Version: 3.2
Post-History:

Abstract
========

This PEP proposes an enhancement to Python's import machinery to
replace existing uses of the standard library's
``pkgutil.extend_path()`` API, and similar third-party APIs such as
``pkg_resources.declare_namespace()``.

The proposed enhancement will improve the reliability of existing
namespace package implementations, while providing "One Obvious Way"
to produce and consume namespace packages.


Terminology
===========

Within this PEP, the following terms are used as follows:

Package
   Python packages as defined by Python's import statement.

Distribution
   A separately installable set of Python modules, as registered in
   the Python package index, and installed by distutils, setuptools,
   etc.

Vendor Package
   A group of files installed by an operating system's packaging
   mechanism (e.g. Debian or Redhat packages installed on Linux
   systems).

Portion
   A set of files in a single directory (possibly inside a zip file
   or other storage mechanism) that contribute modules or subpackages
   to a namespace package. The contents of each portion ``sys.path``

Namespace Package
   A package whose subpackages and modules can be split into portions
   that can be distributed or installed separately (via separate
   distributions and/or vendor packages), in shared or separate
   installation locations.

   Unlike a regular package, however, which only allows submodule
   and subpackage imports from a single location, a namespace
   package's ``__path__`` is configured so that submodules and
   subpackages can be imported from each of its installed portions,
   regardless of their relative positions in ``sys.path``.


Motivation
==========

.. epigraph::

   "Most packages are like modules. Their contents are highly
   interdependent and can't be pulled apart. [However,] some
   packages exist to provide a separate namespace. ... It should
   be possible to distribute sub-packages or submodules of these
   [namespace packages] independently."

   -- Jim Fulton, shortly before the release of Python 2.3 [1]_


The Current Approach
--------------------

First introduced in Python 2.3, namespace packages are a mechanism
for splitting a single Python package across multiple directories
on disk. This splitting has two main benefits:

1. It allows different parts of a large package or framework to be
   distributed and installed independently. For example, one can
   install the ``zope.interface`` package without having to install
   every package in the ``zope.*`` namespace.

   (This is somewhat similar to the way Perl's package system allows
   authors to separately distribute subpackages of ``File::`` or
   ``Email::``.)

2. As a side-effect of benefit 1, it reduces package naming collisions
   across multiple authors or organizations, by encouraging them to
   use distinguishing prefixes.
   Instead of, say, Zope and Twisted both offering a top-level
   ``interface`` package (in which case, both could not be installed
   to the same directory), they can use ``zope.interface`` and
   ``twisted.interface``, while still being able to distribute these
   subpackages separately from other ``zope`` or ``twisted``
   subpackages.

   (This is somewhat similar to the way Java uses names like
   ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions,
   only flatter.)

In current Python versions, however, a registration function (such as
``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
must be explicitly invoked in order to set up the package's
``__path__``.

There are two problems with this approach.


Problems With The Current Approach
----------------------------------

The first (and lesser) problem is that there is no One Obvious Way to
either declare that a package is a "namespace" or "module" package,
or to tell which kind of package a given directory on disk is.

Instead, you must choose one of the various APIs to use, each of
which is slightly incompatible with the others. (For example,
``pkgutil`` supports ``*.pkg`` files; setuptools doesn't. Likewise,
setuptools supports package portions living in zip files, and adding
new path components to already-imported namespaces, whereas
``pkgutil`` doesn't.)

Similarly, to tell whether a given directory is a "namespace" or
"module" package, you must read its documentation or inspect its code
in detail, and be able to recognize the various API calls mentioned
above.

The second -- and much larger -- issue is that whichever API is used
to declare the namespace, the declaration has to be invoked from a
namespace package's ``__init__`` module in order to work. (Otherwise,
only the first part of the package found on ``sys.path`` would be
importable.)

This clashes with the goal of separately installing portions of a
namespace, because then each distributed piece must include a copy
of the same ``__init__.py``.
(Otherwise, each piece would not be
importable on its own, as Python currently requires the existence
of an ``__init__`` module in order to import the package at all, let
alone set up the namespace!)

In addition to the developer inconvenience of creating, synchronizing,
and distributing these duplicated ``__init__`` modules, there is a
further problem created for operating system vendors.

Vendor packages typically must not provide overlapping files, and an
attempt to install a vendor package that has a file already on disk
will fail or cause unpredictable behavior. As vendors might choose to
package distributions such that they will end up all in a single
directory for the namespace package, all portions would contribute
conflicting ``__init__.py`` files.

This issue has led to various fragile and complex workarounds in
practice, such as ``.pth`` file abuse by setuptools, and the shipping
of broken partial packages with distutils.

With the enhancement proposed here, however, all of the above problems
can be readily resolved.


Specification
=============

Instead of an API call buried inside a series of duplicated and
potentially-clashing ``__init__`` modules (which mostly exist only
to make the package importable and declare its namespace-ness), this
PEP proposes that Python's import machinery be modified to include
direct support for namespace packages.

This support would work by adding a new way to designate a directory
as containing a namespace package portion: by including one or more
``*.ns`` files in it.

This approach removes the need for an ``__init__`` module to be
duplicated across namespace package portions. Instead, each portion
can simply include a uniquely-named ``*.ns`` file, thereby avoiding
filename clashes in vendor packages.

And, since the import machinery knows that these directories are
portions of a namespace package, it can automatically initialize the
package's ``__path__`` to include portions located on different parts
of ``sys.path``.
(Thus avoiding the need for special code to be
called in the ``__init__`` module.)

In addition to doing this path setup, the import machinery will also
add any imported namespace packages to ``sys.namespace_packages``
(initially an empty set), so that namespace packages can be identified
or iterated over.


PEP \302 Extension
------------------

The existing PEP 302 protocol is to be extended to handle namespace
package portion directories, by adding a new importer method,
``namespace_subpath(fullname)``. An implementation of this method
will be added to all applicable importer classes distributed with
Python (including those in ``pkgutil`` and ``zipimport``).

(Note: any other importer wishing to support namespace packages must
provide its own implementation of this method as well. If an
importer does not have a ``namespace_subpath()`` method, it will be
treated as if it *did* have the method, and it returned ``None`` when
called.)

This new method is called just before the importer's ``find_module()``
is normally invoked. If the importer determines that `fullname` is
a namespace package portion under its jurisdiction, then the importer
returns an importer-specific path to that namespace portion.

For example, if a standard filesystem path importer for the path
``/usr/lib/site-packages`` is about to be asked to import ``zope``,
and there is a ``/usr/lib/site-packages/zope`` directory containing
any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
on that importer should return ``"/usr/lib/site-packages/zope"``.

However, if there is no such subdirectory, or it does *not* contain
any files whose names end with ``.ns``, that importer would return
``None`` instead.

The Python import machinery will call this method on each importer
corresponding to a path entry in ``sys.path`` (for top-level imports)
or in a parent package ``__path__`` (for subpackage imports).
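
The filesystem example above can be sketched roughly as follows. This
is only an illustration, not part of the draft: the
``FileSystemImporter`` class, its ``path_entry`` attribute, and the
``call_namespace_subpath()`` helper are hypothetical names made up
here. Only the method name, its ``fullname`` argument, the ``.ns``
suffix check, and the ``None`` result come from the text above.

```python
import os


class FileSystemImporter:
    """Hypothetical sys.path importer; only namespace_subpath() is sketched."""

    def __init__(self, path_entry):
        # One directory taken from sys.path or from a parent __path__.
        self.path_entry = path_entry

    def namespace_subpath(self, fullname):
        """Return the portion directory for fullname, or None."""
        # Only the last dotted component names the subdirectory to look for.
        subdir = os.path.join(self.path_entry, fullname.rpartition(".")[2])
        if not os.path.isdir(subdir):
            return None
        # Any file whose name ends in ".ns" marks a namespace portion.
        for name in os.listdir(subdir):
            if name.endswith(".ns") and os.path.isfile(os.path.join(subdir, name)):
                return subdir
        return None


def call_namespace_subpath(importer, fullname):
    """Treat an importer without the method as if it returned None."""
    method = getattr(importer, "namespace_subpath", None)
    return method(fullname) if method is not None else None
```

With a ``zope`` directory holding, say, a ``zope.interface.ns`` file
under the importer's path entry, ``namespace_subpath("zope")`` would
return that directory; without any ``.ns`` file it would return
``None``.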

If a normal package or module is found before a namespace package,
importing proceeds according to the normal PEP 302 protocol. (That
is, a loader object is simply asked to load the located module or
package.)

However, if a namespace package portion is found (i.e., an importer's
``namespace_subpath()`` returns a string), then the normal import
search stops, and a namespace package is created instead.

The import machinery continues iterating over importers and calling
``namespace_subpath()`` on them, but it does **not** continue calling
``find_module()`` on them. Instead, it accumulates any strings
returned by the subpath calls, in order to assemble a ``__path__``
for the package being imported.

(Note that this implies that any non-namespace packages with the same
name are skipped, and not included in the resulting package's
``__path__``. In other words, a namespace package's initial
``__path__`` only includes namespace portions, never non-namespace
package directories.)

Once this ``__path__`` has been assembled, a module is created, and
its ``__path__`` attribute is set. The package's name is then added
to ``sys.namespace_packages`` -- a set of package names.

Finally, the ``__init__`` module code for the package (if it exists)
is located and executed in the new module's namespace. Each importer
that returns a ``namespace_subpath()`` for the package is asked to
perform a standard ``find_module()`` for the package. Since by the
normal import rules, a directory containing an ``__init__`` module is
a package, this call should succeed if the namespace package portion
contains an ``__init__`` module, and the importing can proceed
normally from that point.

There is one caveat, however.
The importers currently distributed
with Python expect that *they* will be the ones to initialize the
``__path__`` attribute, which means that they must be changed to
either recognize that ``__path__`` has already been set and not
change it, or to handle namespace packages specially (e.g., via an
internal flag or checking ``sys.namespace_packages``).

Similarly, any third-party importers wishing to support namespace
packages must make similar changes.

(NOTE: in general, it goes against the design of PEP 302 for a loader
object to assume that it is always creating the module object or that
the module it is operating on is empty. Making this assumption can
result in code that breaks the normal operation of the ``reload()``
builtin and any specialized tools that rely on it, such as lazy
importers, automatic reloaders, and so on.)


Standard Library Changes/Additions
----------------------------------

The ``pkgutil`` module should be updated to handle this
specification appropriately, including any necessary changes to
``extend_path()``, ``iter_modules()``, etc. A new generic API for
calling ``namespace_subpath()`` on importers should be added as well.

Specifically, the proposed changes and additions are:

* A new ``namespace_subpath(importer, fullname)`` generic, allowing
  implementations to be registered for existing importers.

* A new ``extend_namespaces(path_entry)`` function, to extend existing
  and already-imported namespace packages' ``__path__`` attributes to
  include any portions found in a new ``sys.path`` entry. This
  function should be called by applications extending ``sys.path``
  at runtime, e.g. to include a plugin directory or add an egg to the
  path.

  The implementation of this function does a simple breadth-first walk
  of ``sys.namespace_packages``, and performs any necessary
  ``namespace_subpath()`` calls to identify what path entries need to
  be added to each package's ``__path__``, given that `path_entry`
  has been added to ``sys.path``.

* A new ``iter_namespaces(parent='')`` function to allow breadth-first
  traversal of namespaces in ``sys.namespace_packages``, by yielding
  the child namespace packages of `parent`. For example, calling
  ``iter_namespaces("zope")`` might yield ``zope.app`` and
  ``zope.products`` (if they are namespace packages registered in
  ``sys.namespace_packages``), but **not** ``zope.foo.bar``.

  This function is needed to implement ``extend_namespaces()``, but
  is potentially useful to others.

* ``ImpImporter.iter_modules()`` should be changed to also detect and
  yield the names of namespace package portions.

In addition to the above changes, the ``zipimport`` importer should
have its ``iter_modules()`` implementation similarly changed. (Note:
current versions of Python implement this via a shim in ``pkgutil``,
so technically this is also a change to ``pkgutil``.)


Implementation Notes
--------------------

For users, developers, and distributors of namespace packages:

* ``sys.namespace_packages`` is allowed to contain non-existent or
  not-yet-imported package names; code that uses its contents should
  not assume that every name in this set is also present in
  ``sys.modules`` or that importing the name will necessarily succeed.

* ``*.ns`` files must be empty or contain only ASCII whitespace
  characters. This leaves open the possibility for future extension
  to the format.

* Files contained within a namespace package portion directory must
  be *unique* to that portion, so that the portion can be distributed
  as a vendor package without any filename overlap. This applies to
  modules and data files as well as ``*.ns`` files.

  (For ``*.ns`` files themselves, uniqueness can be achieved simply
  by giving them a name based on the distribution that contains the
  file, and it is recommended that packaging tools support doing this
  automatically.)

* Although this PEP supports the use of non-empty ``__init__`` modules
  in namespace packages, their usage is controversial.
  If more than one package portion contains an ``__init__`` module, at
  most one of them will be executed, possibly leading to silent
  errors. Therefore, if you must include an ``__init__`` module in
  your namespace package, make sure that it is provided by exactly
  **one** distribution, and that all other distributions using that
  module's contents are defined so as to have an installation
  dependency on the distribution containing the ``__init__`` module.
  Otherwise, it may not be present in some installations.

  (Note: for historical reasons, existing namespace packages nearly
  always include ``__init__`` modules, but they are usually empty
  except for code to declare the package a namespace. Under this
  proposal, these nearly-empty modules could and should be replaced
  by an empty ``*.ns`` file in the package directory.)

For those implementing PEP 302 importer objects:

* Importers that support the ``iter_modules()`` method and want to
  add namespace support should modify their ``iter_modules()`` method
  so that it discovers and lists namespace packages as well as
  standard modules and packages.

* For implementation efficiency, an importer is allowed to cache
  information (such as whether a directory exists and whether an
  ``__init__`` module is present in it) between the invocation of a
  ``namespace_subpath()`` call and a subsequent ``find_module()`` call
  for the same name.

  It should, however, avoid retaining such cached information for any
  longer than the next method call, and it should also verify that
  the request is in fact for the same module/package name, as it is
  not guaranteed that a ``namespace_subpath()`` call will always be
  followed by a matching ``find_module()`` call. (After all, an
  ``__init__`` module may already have been supplied by an earlier
  importer on the path.)

* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
  not need to implement ``namespace_subpath()``, because the method
  is only called on importers corresponding to ``sys.path`` entries.
  If a meta importer wishes to support namespace packages, it must do
  so entirely within its ``find_module()`` implementation.

  Unfortunately, it is unlikely that any such implementation will be
  able to merge its namespace portions with those of other meta
  importers or ``sys.path`` importers, so the meaning of "supporting
  namespace packages" for a meta importer is currently undefined.

  However, since the intended use case for meta importers is to
  replace Python's normal import process entirely for some subset of
  modules, and the number of such importers currently implemented is
  quite small, this seems unlikely to be a big issue in practice.


Rejected Alternatives
=====================

* The original version of this PEP used ``.pkg`` or ``.pth`` files
  that contained either explicit directories to be added to a
  package's ``__path__``, or ``*`` to indicate that a package was a
  namespace.

  But this approach required a more complex change to the importer
  protocol, the files had to actually be opened and read, and there
  were no concrete use cases proposed for the additional flexibility
  specifying explicit paths.

* On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using
  extra files, namespace packages use a ``__pkg__.py`` file to
  indicate their namespace-ness, in addition to a (required)
  ``__init__.py``.

  Unfortunately, this approach solves only one of the
  `problems with the current approach`_: i.e., having a standard way
  of declaring and identifying namespace packages. It does not
  address the necessity of distributing duplicated files, or filename
  overlap between distributions.

  Further, it does not allow truly-independent namespace portions to
  exist, since it requires a "defining" portion (the portion
  containing the single ``__init__`` module) to exist.

* Another approach considered during revisions to this PEP was to
  simply rename package directories to add a suffix like ``.ns``
  or ``-ns``, to indicate their namespaced nature.
  This would effect a small performance improvement for the initial
  import of a namespace package, avoid the need to create empty
  ``*.ns`` files, and even make it clearer that the directory
  involved is a namespace portion.

  The downsides, however, are also plentiful. If a package starts
  its life as a normal package, it must be renamed when it becomes a
  namespace, with the implied consequences for revision control
  tools.

  Further, there is an immense body of existing code (including the
  distutils and many other packaging tools) that expects a package
  directory's name to be the same as the package name. And porting
  existing Python 2.x namespace packages to Python 3 would require
  widespread directory renaming as well.

  In short, this approach would require a vastly larger number of
  changes to both the standard library and third-party code, for a
  tiny potential performance improvement and a small increase in
  clarity. It was therefore rejected on "practicality vs. purity"
  grounds.


References
==========

.. [1] "namespace" vs "module" packages (mailing list thread)
   (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)

.. [2] "PEP \382: Namespace Packages" (mailing list thread)
   (http://mail.python.org/pipermail/python-dev/2009-April/088087.html)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From ericsnowcurrently at gmail.com Fri Jul 8 23:52:49 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 8 Jul 2011 15:52:49 -0600
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110708195157.335043A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
Message-ID:

On Fri, Jul 8, 2011 at 1:51 PM, P.J. Eby wrote:
> The following is my attempt at an updated draft of PEP 382, based on the
> recently-discussed changes.
> > To address the questions and criticisms raisd on Python-Dev when the PEP was > introduced, I added an extended "Motivation" section that explains issues > with the current approaches, and states the case for the PEP in more detail, > including info about why anyone should care about namespace packages in the > first place. ?;-) > > I've also added a "Rejected Alternatives" section to document the other > proposed approaches and the rationale for rejecting them in favor of the > current proposal. > > In addition, I've specified in a bit more detail the necessary changes to > e.g. the pkgutil module. ?(At least one open issue remains, however, and > that is the question of what, if anything, should happen to the existing > extend_path() function. ?A second possible open question regards the API of > the path fixup functions I propose in pkgutil.) > > Anyway, your questions and comments, please! ?The draft follows below: > > > PEP: 382 > Title: Namespace Package Declarations > Version: $Revision$ > Last-Modified: $Date$ > Author: Martin v. L??wis , PJ Eby > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 02-Apr-2009 > Python-Version: 3.2 > Post-History: > > Abstract > ======== > > This PEP proposes an enhancement to Python's import machinery to > replace existing uses of the standard library's > ``pkgutil.extend_path()`` API, and similar third-party APIs such as > ``pkg_resources.declare_namespace()``. > > The proposed enhancement will improve the reliability of existing > namespace package implementations, while providing "One Obvious Way" > to produce and consume namespace packages. > > > Terminology > =========== > > Within this PEP, the following terms are used as follows: > > Package > ? ?Python packages as defined by Python's import statement. > > Distribution > ? ?A separately installable set of Python modules, as registered in > ? ?the Python package index, and installed by distutils, setuptools, > ? ?etc. > > Vendor Package > ? 
?A group of files installed by an operating system's packaging > ? ?mechanism (e.g. Debian or Redhat packages installed on Linux > ? ?systems). > > Portion > ? ?A set of files in a single directory (possibly inside a zip file > ? ?or other storage mechanism) that contribute modules or subpackages > ? ?to a namespace package. ?The contents of each portion ``sys.path`` > > Namespace Package > ? ?A package whose subpackages and modules can be split into portions > ? ?that can be distributed or installed separately (via separate > ? ?distributions and/or vendor packages), in shared or separate > ? ?installation locations. > > ? ?Unlike a regular package, however, which only allows submodule > ? ?and subpackage imports from a single location, a namespace > ? ?package's ``__path__`` is configured so that submodules and > ? ?subpackages can be imported from each of its installed portions, > ? ?regardless of their relative positions in ``sys.path``. > > > Motivation > ========== > > .. epigraph:: > > ? ?"Most packages are like modules. ?Their contents are highly > ? ?interdependent and can't be pulled apart. ?[However,] some > ? ?packages exist to provide a separate namespace. ... ?It should > ? ?be possible to distribute sub-packages or submodules of these > ? ?[namespace packages] independently." > > ? ?-- Jim Fulton, shortly before the release of Python 2.3 [1]_ > This is a really helpful addition. > > The Current Approach > -------------------- > > First introduced in Python 2.3, namespace packages are a mechanism > for splitting a single Python package across multiple directories > on disk. ?This splitting has two main benefits: > > 1. It allows different parts of a large package or framework to be > ? distributed and installed independently. ?For example, installing > ? the ``zope.interface`` package without having to install every > ? package in the ``zope.*`` namespace. > > ? (This is somewhat similar to the way Perl's package system allows > ? 
authors to separately distribute subpackages of ``File::`` or > ``Email::``.) > > 2. As a side-effect of benefit 1, it reduces package naming collisions > across multiple authors or organizations, by encouraging them to > use distinguishing prefixes. Instead of say, Zope and Twisted both > offering a top-level ``interface`` package (in which case, both > could not be installed to the same directory), they can use > ``zope.interface`` and ``twisted.interface``, while still being > able to distribute these subpackages separately from other ``zope`` > or ``twisted`` subpackages. > > (This is somewhat similar to the way Java uses names like > ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions, > only flatter.) > > In current Python versions, however, a registration function (such as > ``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``) > must be explicitly invoked in order to set up the package's > ``__path__``. > > There are two problems with this approach, however. > > > Problems With The Current Approach > ---------------------------------- > > The first (and lesser) problem is that there is no One Obvious Way to > either declare that a package is a "namespace" or "module" package, > or to tell which kind of package a given directory on disk is. > > Instead, you must choose one of the various APIs to use, each of > which is slightly-incompatible with the others. (For example, > ``pkgutil`` supports ``*.pkg`` files; setuptools doesn't. Likewise, > setuptools supports package portions living in zip files, and adding > new path components to already-imported namespaces, whereas > ``pkgutil`` doesn't.) > > Similarly, to tell whether a given directory is a "namespace" or > "module" package, you must read its documentation or inspect its code > in detail, and be able to recognize the various API calls mentioned > above.
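For anyone following along who has not used the registration call mentioned above, here is a minimal sketch of the current ``pkgutil``-style approach. Only ``pkgutil.extend_path()`` is a real standard-library API here; the package name ``demo_ns``, the ``site_a``/``site_b`` directories and the helper functions are made up for illustration, and the sketch assumes a current Python 3.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The boilerplate every portion has to ship in its own __init__.py under
# the current approach (this duplication is what the draft wants to remove).
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, site, module, value):
    """Create <root>/<site>/demo_ns/ with the shared __init__.py and one module."""
    pkg_dir = Path(root) / site / "demo_ns"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
    (pkg_dir / (module + ".py")).write_text("VALUE = %r\n" % value)
    return str(pkg_dir.parent)


def load_both_portions():
    """Import one module from each portion of the hypothetical demo_ns package."""
    root = tempfile.mkdtemp()
    entries = [
        make_portion(root, "site_a", "alpha", "from site_a"),
        make_portion(root, "site_b", "beta", "from site_b"),
    ]
    sys.path[:0] = entries
    importlib.invalidate_caches()
    try:
        alpha = importlib.import_module("demo_ns.alpha")
        beta = importlib.import_module("demo_ns.beta")
        package = sys.modules["demo_ns"]
        # extend_path() has appended the second portion to __path__.
        return alpha.VALUE, beta.VALUE, len(package.__path__)
    finally:
        for entry in entries:
            sys.path.remove(entry)
```

Both portions carry an identical ``demo_ns/__init__.py``, which is exactly the file that would collide if the two were installed into one directory by separate vendor packages.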
> > The second -- and much larger -- issue is that whichever API is used > to declare the namespace, the declaration has to be invoked from a > namespace package's ``__init__`` module in order to work. (Otherwise, > only the first part of the package found on ``sys.path`` would be > importable.) > > This clashes with the goal of separately installing portions of a > namespace, because then each distributed piece must include a copy > of the same ``__init__.py``. (Otherwise, each piece would not be > importable on its own, as Python currently requires the existence > of an ``__init__`` module in order to import the package at all, let > alone set up the namespace!) > > In addition to the developer inconvenience of creating, synchronizing, > and distributing these duplicated ``__init__`` modules, there is a > further problem created for operating system vendors. > > Vendor packages typically must not provide overlapping files, and an > attempt to install a vendor package that has a file already on disk > will fail or cause unpredictable behavior. As vendors might choose to > package distributions such that they will end up all in a single > directory for the namespace package, all portions would contribute > conflicting ``__init__.py`` files. > > This issue has led to various fragile and complex workarounds in > practice, such as ``.pth`` file abuse by setuptools, and the shipping > of broken partial packages with distutils. > > With the enhancement proposed here, however, all of the above problems > can be readily resolved. > > > Specification > ============= > > Instead of an API call buried inside a series of duplicated and > potentially-clashing ``__init__`` modules (which mostly exist only > to make the package importable and declare its namespace-ness), this > PEP proposes that Python's import machinery be modified to include > direct support for namespace packages.
> > This support would work by adding a new way to designate a directory > as containing a namespace package portion: by including one or more > ``*.ns`` files in it. > > This approach removes the need for an ``__init__`` module to be > duplicated across namespace package portions. Instead, each portion > can simply include a uniquely-named ``*.ns`` file, thereby avoiding > filename clashes in vendor packages. > > And, since the import machinery knows that these directories are > portions of a namespace package, it can automatically initialize > the package's ``__path__`` to include portions located on different > parts of ``sys.path``. (Thus avoiding the need for special code > to be called in the ``__init__`` module.) > > In addition to doing this path setup, the import machinery will also > add any imported namespace packages to ``sys.namespace_packages`` > (initially an empty set), so that namespace packages can be identified > or iterated over. > > > PEP \302 Extension > ------------------ > > The existing PEP 302 protocol is to be extended to handle namespace > package portion directories, by adding a new importer method, > ``namespace_subpath(fullname)``. An implementation of this method > will be added to all applicable importer classes distributed with > Python, including those in ``pkgutil`` and ``zipimport``. > > (Note: any other importer wishing to support namespace packages must > provide its own implementation of this method as well. If an importer > does not have a ``namespace_subpath()`` method, it will be treated as > if it *did* have the method, but it returned ``None`` when called.) > > This new method is called just before the importer's ``find_module()`` > is normally invoked. If the importer determines that `fullname` is > a namespace package portion under its jurisdiction, then the importer > returns an importer-specific path to that namespace portion.
> > For example, if a standard filesystem path importer for the path > ``/usr/lib/site-packages`` is about to be asked to import ``zope``, > and there is a ``/usr/lib/site-packages/zope`` directory containing > any files ending with ``.ns``, a call to ``namespace_subpath("zope")`` > on that importer should return ``"/usr/lib/site-packages/zope"``. > > However, if there is no such subdirectory, or it does *not* contain > any files whose names end with ``.ns``, that importer would return > ``None`` instead. > > The Python import machinery will call this method on each importer > corresponding to a path entry in ``sys.path`` (for top-level imports) > or in a parent package ``__path__`` (for subpackage imports). > > If a normal package or module is found before a namespace package, > importing proceeds according to the normal PEP 302 protocol. (That > is, a loader object is simply asked to load the located module or > package.) > > However, if a namespace package portion is found (i.e., an importer's > ``namespace_subpath()`` returns a string), then the normal import > search stops, and a namespace package is created instead. > > The import machinery continues iterating over importers and calling > ``namespace_subpath()`` on them, but it does **not** continue calling > ``find_module()`` on them. Instead, it accumulates any strings > returned by the subpath calls, in order to assemble a ``__path__`` > for the package being imported. > > (Note that this implies that any non-namespace packages with the same > name are skipped, and not included in the resulting package's > ``__path__``. In other words, a namespace package's initial > ``__path__`` only includes namespace portions, never non-namespace > package directories.) > > Once this ``__path__`` has been assembled, a module is created, and > its ``__path__`` attribute is set. The package's name is then added > to ``sys.namespace_packages`` -- a set of package names.
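To make the lookup described above concrete, here is a rough sketch for a plain filesystem importer. It is not the reference implementation: ``DirectoryImporter`` and ``assemble_namespace_path`` are hypothetical names invented for this example, and only the ``namespace_subpath()`` method name and its string-or-``None`` return convention come from the draft. The interplay with ``find_module()`` (a normal package found first wins) is deliberately left out.

```python
import os


class DirectoryImporter:
    """Hypothetical stand-in for a filesystem path importer (one per path entry)."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def namespace_subpath(self, fullname):
        """Return the portion directory for `fullname`, or None if there is none."""
        # Only the last name component is looked up inside this path entry.
        candidate = os.path.join(self.path_entry, fullname.rpartition(".")[2])
        if not os.path.isdir(candidate):
            return None
        # Any file ending in ".ns" marks the directory as a namespace portion.
        if any(name.endswith(".ns") for name in os.listdir(candidate)):
            return candidate
        return None


def assemble_namespace_path(importers, fullname):
    """Collect portion directories from each importer, in search order."""
    path = []
    for importer in importers:
        # Importers lacking the method are treated as if it returned None.
        method = getattr(importer, "namespace_subpath", None)
        subpath = method(fullname) if method is not None else None
        if subpath is not None:
            path.append(subpath)
    return path
```

With one importer per ``sys.path`` entry, the returned list is what would become the package's initial ``__path__``; directories that hold a same-named regular package but no ``.ns`` file simply drop out, as the note in the draft says.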
> > Finally, the ``__init__`` module code for the package (if it exists) > is located and executed in the new module's namespace. > > Each importer that returns a ``namespace_subpath()`` for the package > is asked to perform a standard ``find_module()`` for the package. > Since by the normal import rules, a directory containing an > ``__init__`` module is a package, this call should succeed if the > namespace package portion contains an ``__init__`` module, and the > importing can proceed normally from that point. > > There is one caveat, however. The importers currently distributed > with Python expect that *they* will be the ones to initialize the > ``__path__`` attribute, which means that they must be changed to > either recognize that ``__path__`` has already been set and not > change it, or to handle namespace packages specially (e.g., via an > internal flag or checking ``sys.namespace_packages``). > > Similarly, any third-party importers wishing to support namespace > packages must make similar changes. > > (NOTE: in general, it goes against the design of PEP 302 for a loader > object to assume that it is always creating the module object or that > the module it is operating on is empty. Making this assumption can > result in code that breaks the normal operation of the ``reload()`` > builtin and any specialized tools that rely on it, such as lazy > importers, automatic reloaders, and so on.) > > > Standard Library Changes/Additions > ---------------------------------- > > The ``pkgutil`` module should be updated to handle this > specification appropriately, including any necessary changes to > ``extend_path()``, ``iter_modules()``, etc. A new generic API for > calling ``namespace_subpath()`` on importers should be added as well. > > Specifically the proposed changes and additions are: > > * A new ``namespace_subpath(importer, fullname)`` generic, allowing > implementations to be registered for existing importers.
> > * A new ``extend_namespaces(path_entry)`` function, to extend existing > and already-imported namespace packages' ``__path__`` attributes to > include any portions found in a new ``sys.path`` entry. This > function should be called by applications extending ``sys.path`` > at runtime, e.g. to include a plugin directory or add an egg to the > path. > > The implementation of this function does a simple breadth-first walk > of ``sys.namespace_packages``, and performs any necessary > ``namespace_subpath()`` calls to identify what path entries need to > be added to each package's ``__path__``, given that `path_entry` > has been added to ``sys.path``. > > * A new ``iter_namespaces(parent='')`` function to allow breadth-first > traversal of namespaces in ``sys.namespace_packages``, by yielding > the child namespace packages of `parent`. For example, calling > ``iter_namespaces("zope")`` might yield ``zope.app`` and > ``zope.products`` (if they are namespace packages registered in > ``sys.namespace_packages``), but **not** ``zope.foo.bar``. > This function is needed to implement ``extend_namespaces()``, but > is potentially useful to others. > > * ``ImpImporter.iter_modules()`` should be changed to also detect and > yield the names of namespace package portions. > > In addition to the above changes, the ``zipimport`` importer should > have its ``iter_modules()`` implementation similarly changed. (Note: > current versions of Python implement this via a shim in ``pkgutil``, > so technically this is also a change to ``pkgutil``.) > > > Implementation Notes > -------------------- > > For users, developers, and distributors of namespace packages: > > * ``sys.namespace_packages`` is allowed to contain non-existent or > not-yet-imported package names; code that uses its contents should > not assume that every name in this set is also present in > sys.packages or that importing the name will necessarily succeed.
> > * ``*.ns`` files must be empty or contain only ASCII whitespace > characters. This leaves open the possibility for future extension > to the format. > > * Files contained within a namespace package portion directory must > be *unique* to that portion, so that the portion can be distributed > as a vendor package without any filename overlap. This applies to > modules and data files as well as ``*.ns`` files. > > (For ``*.ns`` files themselves, uniqueness can be achieved simply by > giving them a name based on the distribution that contains the file, > and it is recommended that packaging tools support doing this > automatically.) > > * Although this PEP supports the use of non-empty ``__init__`` modules > in namespace packages, their usage is controversial. If more than > one package portion contains an ``__init__`` module, at most one of > them will be executed, possibly leading to silent errors. > > Therefore, if you must include an ``__init__`` module in your > namespace package, make sure that it is provided by exactly **one** > distribution, and that all other distributions using that module's > contents are defined so as to have an installation dependency on > the distribution containing the ``__init__`` module. Otherwise, > it may not be present in some installations. > > (Note: for historical reasons, existing namespace packages nearly > always include ``__init__`` modules, but they are usually empty > except for code to declare the package a namespace. Under this > proposal, these nearly-empty modules could and should be replaced > by an empty ``*.ns`` file in the package directory.) > > For those implementing PEP 302 importer objects: > > * Importers that support the ``iter_modules()`` method and want to add > namespace support should modify their ``iter_modules()`` > method so that it discovers and lists namespace packages as well as > standard modules and packages.
> > * For implementation efficiency, an importer is allowed to cache > information (such as whether a directory exists and whether an > ``__init__`` module is present in it) between the invocation of a > ``namespace_subpath()`` call and a subsequent ``find_module()`` call > for the same name. > > It should, however, avoid retaining such cached information for any > longer than the next method call, and it should also verify that the > request is in fact for the same module/package name, as it is not > guaranteed that a ``namespace_subpath()`` call will always be > followed by a matching ``find_module()`` call. (After all, an > ``__init__`` module may already have been supplied by an earlier > importer on the path.) > > * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do > not need to implement ``namespace_subpath()``, because the method > is only called on importers corresponding to ``sys.path`` entries. > If a meta importer wishes to support namespace packages, it must > do so entirely within its ``find_module()`` implementation. > > Unfortunately, it is unlikely that any such implementation will be > able to merge its namespace portions with those of other meta > importers or ``sys.path`` importers, so the meaning of "supporting > namespace packages" for a meta importer is currently undefined. > > However, since the intended use case for meta importers is to > replace Python's normal import process entirely for some subset of > modules, and the number of such importers currently implemented is > quite small, this seems unlikely to be a big issue in practice. > > > Rejected Alternatives > ===================== > > * The original version of this PEP used ``.pkg`` or ``.pth`` files > that contained either explicit directories to be added to a > package's ``__path__``, or ``*`` to indicate that a package was > a namespace.
> > But this approach required a more complex change to the importer > protocol, the files had to actually be opened and read, and there > were no concrete use cases proposed for the additional flexibility > of specifying explicit paths. > > * On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using > extra files, namespace packages use a ``__pkg__.py`` file to > indicate their namespace-ness, in addition to a (required) > ``__init__.py``. > > Unfortunately, this approach solves only one of the `problems with > the current approach`_: i.e., having a standard way of declaring and > identifying namespace packages. It does not address the necessity > of distributing duplicated files, or filename overlap between > distributions. Further, it does not allow truly-independent > namespace portions to exist, since it requires a "defining" portion > (the portion containing the single ``__init__`` module) to exist. > > * Another approach considered during revisions to this PEP was to > simply rename package directories to add a suffix like ``.ns`` > or ``-ns``, to indicate their namespaced nature. This would effect > a small performance improvement for the initial import of a > namespace package, avoid the need to create empty ``*.ns`` files, > and even make it clearer that the directory involved is a namespace > portion. > > The downsides, however, are also plentiful. If a package starts > its life as a normal package, it must be renamed when it becomes > a namespace, with the implied consequences for revision control > tools. > > Further, there is an immense body of existing code (including the > distutils and many other packaging tools) that expects a package > directory's name to be the same as the package name. And porting > existing Python 2.x namespace packages to Python 3 would require > widespread directory renaming as well.
> > In short, this approach would require a vastly larger number of > changes to both the standard library and third-party code, for > a tiny potential performance improvement and a small increase in > clarity. It was therefore rejected on "practicality vs. purity" > grounds. > > > > References > ========== > > .. [1] "namespace" vs "module" packages (mailing list thread) > (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html) > > .. [2] "PEP \382: Namespace Packages" (mailing list thread) > (http://mail.python.org/pipermail/python-dev/2009-April/088087.html) > > Copyright > ========= > > This document has been placed in the public domain. > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > I have some separate comments on this draft that I'll have to postpone. In the meantime I have a couple of questions: 1. Should this PEP wait until importlib.__import__ replaces the builtin __import__? That will have bearing on where the implementation takes place. I'm not sure of the status of that effort, other than what Brett has reported in the tracker issue (http://bugs.python.org/issue2377), nor of the timeframe. 2. Should it wait for the work on the import engine (a GSOC project)? It sounds like a PEP is in the works right now. It may also impact the implementation of this PEP. -eric From barry at python.org Sat Jul 9 00:31:35 2011 From: barry at python.org (Barry Warsaw) Date: Fri, 8 Jul 2011 18:31:35 -0400 Subject: [Import-SIG] New draft revision for PEP 382 In-Reply-To: <20110708195157.335043A404D@sparrow.telecommunity.com> References: <20110708195157.335043A404D@sparrow.telecommunity.com> Message-ID: <20110708183135.7c9fa5d5@limelight.wooz.org> On Jul 08, 2011, at 03:51 PM, P.J.
Eby wrote: >The following is my attempt at an updated draft of PEP 382, based on the >recently-discussed changes. Thanks! I've been trying to catch up on the mailing list traffic today, and grabbed your prototype code. I plan on committing it to MvL's pep382 hg branch so we have a place to play with it. Comments inlined. >PEP: 382 >Title: Namespace Package Declarations >Version: $Revision$ >Last-Modified: $Date$ >Author: Martin v. Löwis, PJ Eby >Status: Draft >Type: Standards Track >Content-Type: text/x-rst >Created: 02-Apr-2009 >Python-Version: 3.2 >Post-History: > >Abstract >======== > >This PEP proposes an enhancement to Python's import machinery to >replace existing uses of the standard library's >``pkgutil.extend_path()`` API, and similar third-party APIs such as >``pkg_resources.declare_namespace()``. > >The proposed enhancement will improve the reliability of existing >namespace package implementations, while providing "One Obvious Way" >to produce and consume namespace packages. > > >Terminology >=========== > >Within this PEP, the following terms are used as follows: > >Package > Python packages as defined by Python's import statement. > >Distribution > A separately installable set of Python modules, as registered in > the Python package index, and installed by distutils, setuptools, > etc. > >Vendor Package > A group of files installed by an operating system's packaging > mechanism (e.g. Debian or Redhat packages installed on Linux > systems). > >Portion > A set of files in a single directory (possibly inside a zip file > or other storage mechanism) that contribute modules or subpackages > to a namespace package. The contents of each portion ``sys.path`` This one got cut off. >Namespace Package > A package whose subpackages and modules can be split into portions > that can be distributed or installed separately (via separate > distributions and/or vendor packages), in shared or separate > installation locations.
> > Unlike a regular package, however, which only allows submodule > and subpackage imports from a single location, a namespace > package's ``__path__`` is configured so that submodules and > subpackages can be imported from each of its installed portions, > regardless of their relative positions in ``sys.path``. > > >Motivation >========== > >.. epigraph:: > > "Most packages are like modules. Their contents are highly > interdependent and can't be pulled apart. [However,] some > packages exist to provide a separate namespace. ... It should > be possible to distribute sub-packages or submodules of these > [namespace packages] independently." > > -- Jim Fulton, shortly before the release of Python 2.3 [1]_ Nice find! >The Current Approach >-------------------- > >First introduced in Python 2.3, namespace packages are a mechanism >for splitting a single Python package across multiple directories >on disk. This splitting has two main benefits: > >1. It allows different parts of a large package or framework to be > distributed and installed independently. For example, installing > the ``zope.interface`` package without having to install every > package in the ``zope.*`` namespace. > > (This is somewhat similar to the way Perl's package system allows > authors to separately distribute subpackages of ``File::`` or > ``Email::``.) > >2. As a side-effect of benefit 1, it reduces package naming collisions > across multiple authors or organizations, by encouraging them to > use distinguishing prefixes. Instead of say, Zope and Twisted both > offering a top-level ``interface`` package (in which case, both > could not be installed to the same directory), they can use > ``zope.interface`` and ``twisted.interface``, while still being > able to distribute these subpackages separately from other ``zope`` > or ``twisted`` subpackages. > > (This is somewhat similar to the way Java uses names like > ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions, > only flatter.) 
> >In current Python versions, however, a registration function (such as >``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``) >must be explicitly invoked in order to set up the package's >``__path__``. Do you need to explain a little more why __path__ is significant, and why the registration function is required? >There are two problems with this approach, however. > > >Problems With The Current Approach >---------------------------------- > >The first (and lesser) problem is that there is no One Obvious Way to >either declare that a package is a "namespace" or "module" package, >or to tell which kind of package a given directory on disk is. > >Instead, you must choose one of the various APIs to use, each of >which is slightly-incompatible with the others. (For example, >``pkgutil`` supports ``*.pkg`` files; setuptools doesn't. Likewise, >setuptools supports package portions living in zip files, and adding >new path components to already-imported namespaces, whereas >``pkgutil`` doesn't.) > >Similarly, to tell whether a given directory is a "namespace" or >"module" package, you must read its documentation or inspect its code >in detail, and be able to recognize the various API calls mentioned >above. > >The second -- and much larger -- issue is that whichever API is used >to declare the namespace, the declaration has to be invoked from a >namespace package's ``__init__`` module in order to work. (Otherwise, >only the first part of the package found on ``sys.path`` would be >importable.) > >This clashes with the goal of separately installing portions of a >namespace, because then each distributed piece must include a copy >of the same ``__init__.py``. (Otherwise, each piece would not be >importable on its own, as Python currently requires the existence >of an ``__init__`` module in order to import the package at all, let >alone set up the namespace!) 
> >In addition to the developer inconvenience of creating, synchronizing, >and distributing these duplicated ``__init__`` modules, there is a >further problem created for operating system vendors. > >Vendor packages typically must not provide overlapping files, and an >attempt to install a vendor package that has a file already on disk >will fail or cause unpredictable behavior. As vendors might choose to >package distributions such that they will end up all in a single >directory for the namespace package, all portions would contribute >conflicting ``__init__.py`` files. I might word this a little differently. Perhaps: Vendor packaging standards require every file on disk to be owned by exactly one vendor package. But because each portion of a namespace package may be contained in a separate vendor package, multiple vendor packages would have to own the namespace package's __init__.py file. For example, would the ``zope.interface`` vendor package own ``zope/__init__.py`` or would the ``zope.component`` vendor package own it? Different vendors handle this conflict differently, and in fact, different packaging tools from the same vendor can handle this differently, which can cause consistency problems. >This issue has lead to various fragile and complex workarounds in >practice, such as ``.pth`` file abuse by setuptools, and the shipping >of broken partial packages with distutils. > >With the enhancement proposed here, however, all of the above problems >can be readily resolved. > > >Specification >============= > >Instead of an API call buried inside a series of duplicated and >potentially-clashing ``__init__`` modules (which mostly exist only >to make the package importable and declare its namespace-ness), this >PEP proposes that Python's import machinery be modified to include >direct support for namespace packages. 
> >This support would work by adding a new way to desginate a directory s/desginate/designate/ >as containing a namespace package portion: by including one or more >``*.ns`` files in it. > >This approach removes the need for an ``__init__`` module to be >duplicated across namespace package portions. Instead, each portion >can simply include a uniquely-named ``*.ns`` file, thereby avoiding >filename clashes in vendor packages. I think a concrete example would really help here. E.g.: For example, the ``zope.interface`` portion would include a ``zope/zope.interface.ns`` file, while the ``zope.component`` portion would include a ``zope/zope.component.ns`` file. The very presence of any ``.ns`` files inside the ``zope`` directory is enough to designate ``zope`` as a namespace package. No conflicting ``zope/__init__.py`` file is necessary. >And, since the import machinery knows that these directories are >portions of a namespace package, it can automatically initialize >the package's ``__path__`` to include portions located on different >parts of ``sys.path``. (Thus avoiding the need for special code >to be called in the ``__init__`` module.) > >In addition to doing this path setup, the import machinery will also >add any imported namespace packages to ``sys.namespace_packages`` >(initially an empty set), so that namespace packages can be identified >or iterated over. > > >PEP \302 Extension >------------------ > >The existing PEP 302 protocol is to be extended to handle namespace >package portion directories, by adding a new importer method, >``namespace_subpath(fullname)``. An implementation of this method >will be added to all applicable importer classes distributed with >Python, including those in ``pkgutil`` and ``zipimport``). > >(Note: any other importer wishing to support namespace packages must >provide its own implementation of this method as well. 
If an importer >does not have a ``namespace_subpath()`` method, it will be treated as >if it *did* have the method, but it returned ``None`` when called.) > >This new method is called just before the importer's ``find_module()`` >is normally invoked. If the importer determines that `fullname` is >a namespace package portion under its jurisdiction, then the importer >returns an importer-specific path to that namespace portion. Please define exactly what ``fullname`` is. >For example, if a standard filesystem path importer for the path >``/usr/lib/site-packages`` is about to be asked to import ``zope``, >and there is a ``/usr/lib/site-packages/zope`` directory containing >any files ending with ``.ns``, a call to ``namespace_subpath("zope")`` >on that importer should return ``"/usr/lib/site-packages/zope"``. > >However, if there is no such subdirectory, or it does *not* contain >any files whose names end with ``.ns``, that importer would return >``None`` instead. > >The Python import machinery will call this method on each importer >corresponding to a path entry in ``sys.path`` (for top-level imports) >or in a parent package ``__path__`` (for subpackage imports). > >If a normal package or module is found before a namespace package, >importing proceeds according to the normal PEP 302 protocol. (That >is, a loader object is simply asked to load the located module or >package.) > >However, if a namespace package portion is found (i.e., an importer's >``namespace_subpath()`` returns a string), then the normal import >search stops, and a namespace package is created instead. > >The import machinery continues iterating over importers and calling >``namespace_subpath()`` on them, but it does **not** continue calling >``find_module()`` on them. Instead, it accumulates any strings >returned by the subpath calls, in order to assemble a ``__path__`` >for the package being imported. 
> >(Note that this implies that any non-namespace packages with the same >name are skipped, and not included in the resulting package's >``__path__``. In other words, a namespace package's initial >``__path__`` only includes namespace portions, never non-namespace >package directories.) Would you expect this to be common? Did you have any examples in mind, or was it just covering-the-bases? >Once this ``__path__`` has been assembled, a module is created, and >its ``__path__`` attribute is set. The package's name is then added >to ``sys.namespace_packages`` -- a set of package names. > >Finally, the ``__init__`` module code for the package (if it exists) >is located and executed in the new module's namespace. > >Each importer that returns a ``namespace_subpath()`` for the package >is asked to perform a standard ``find_module()`` for the package. >Since by the normal import rules, a directory containing an >``__init__`` module is a package, this call should succeed if the >namespace package portion contains an ``__init__`` module, and the >importing can proceed normally from that point. > >There is one caveat, however. The importers currently distributed >with Python expect that *they* will be the ones to initialize the >``__path__`` attribute, which means that they must be changed to >either recognize that ``__path__`` has already been set and not >change it, or to handle namespace packages specially (e.g., via an >internal flag or checking ``sys.namespace_packages``). > >Similarly, any third-party importers wishing to support namespace >packages must make similar changes. > >(NOTE: in general, it goes against the design of PEP 302 for a loader >object to assume that it is always creating the module object or that >the module it is operating on is empty. Making this assumption can >result in code that breaks the normal operation of the ``reload()`` >builtin and any specialized tools that rely on it, such as lazy >importers, automatic reloaders, and so on.) 
> > >Standard Library Changes/Additions >---------------------------------- > >The ``pkgutil`` module should be updated to handle this >specification appropriately, including any necessary changes to >``extend_path()``, ``iter_modules()``, etc. A new generic API for >calling ``namespace_subpath()`` on importers should be added as well. Is there any reason not to put extend_path() on the road to deprecation? >Specifically the proposed changes and additions are: > >* A new ``namespace_subpath(importer, fullname)`` generic, allowing > implementations to be registered for existing importers. Is this the registration mechanism? >* A new ``extend_namespaces(path_entry)`` function, to extend existing > and already-imported namespace packages' ``__path__`` attributes to > include any portions found in a new ``sys.path`` entry. This > function should be called by applications extending ``sys.path`` > at runtime, e.g. to include a plugin directory or add an egg to the > path. > > The implementation of this function does a simple breadth-first walk > of ``sys.namespace_packages``, and performs any necessary > ``namespace_subpath()`` calls to identify what path entries need to > be added to each package's ``__path__``, given that `path_entry` > has been added to ``sys.path``. > >* A new ``iter_namespaces(parent='')`` function to allow breadth-first > traversal of namespaces in ``sys.namespace_packages``, by yielding > the child namespace packages of `parent`. For example, calling > ``iter_namespaces("zope")`` might yield ``zope.app`` and > ``zope.products`` (if they are namespace packages registered in > ``sys.namespace_packagess``), but **not** ``zope.foo.bar``. s/packagess/packages/ > This function is needed to implement ``extend_namespaces()``, but > is potentially useful to others. > >* ``ImpImporter.iter_modules()`` should be changed to also detect and > yield the names of namespace package portions. 
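The ``iter_namespaces()`` bullet above can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype code: ``sys.namespace_packages`` does not exist in released Pythons, so the set of names is passed in as an explicit argument, and ``walk_namespaces`` is a hypothetical helper showing the breadth-first walk that ``extend_namespaces()`` is described as performing.

```python
from collections import deque


def iter_namespaces(namespace_packages, parent=""):
    """Yield the direct child namespace packages of `parent`, sorted by name."""
    prefix = parent + "." if parent else ""
    for name in sorted(namespace_packages):
        if name.startswith(prefix):
            rest = name[len(prefix):]
            # A direct child has exactly one more dotted component.
            if rest and "." not in rest:
                yield name


def walk_namespaces(namespace_packages, parent=""):
    """Breadth-first traversal built on iter_namespaces() (hypothetical helper)."""
    queue = deque([parent])
    while queue:
        current = queue.popleft()
        for child in iter_namespaces(namespace_packages, current):
            yield child
            queue.append(child)
```

With a set containing ``zope``, ``zope.app``, ``zope.products`` and ``zope.foo.bar``, calling ``iter_namespaces(names, "zope")`` yields only ``zope.app`` and ``zope.products``, as in the example given in the bullet.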
> >In addition to the above changes, the ``zipimport`` importer should >have its ``iter_modules()`` implementation similarly changed. (Note: >current versions of Python implement this via a shim in ``pkgutil``, >so technically this is also a change to ``pkgutil``.) > > >Implementation Notes >-------------------- > >For users, developers, and distributors of namespace packages: > >* ``sys.namespace_packages`` is allowed to contain non-existent or > not-yet-imported package names; code that uses its contents should > not assume that every name in this set is also present in > sys.packages or that importing the name will necessarily succeed. > >* ``*.ns`` files must be empty or contain only ASCII whitespace > characters. This leaves open the possibility for future extension > to the format. Getting back to our previous discussion on this, I might also add a comment format, e.g. lines starting with `#`. Almost any extension we can come up with will probably need to include comments, so we might as well add them here now. This will also allow folks to add copyright, or other textual information into .ns files as their coding conventions may dictate. Do you expect to ignore everything else, or throw an exception? Let's be explicit about that. >* Files contained within a namespace package portion directory must > be *unique* to that portion, so that the portion can be distributed > as a vendor package without any filename overlap. This applies to > modules and data files as well as ``*.ns`` files. > > (For ``*.ns`` files themselves, uniqueness can be achieved simply by > giving them a name based on the distribution that contains the file, > and it is recommended that packaging tools support doing this > automatically.) > >* Although this PEP supports the use of non-empty ``__init__`` modules > in namespace packages, their usage is controversial. 
If more than > one package portion contains an ``__init__`` module, at most one of > them will be executed, possibly leading to silent errors. > > Therefore, if you must include an ``__init__`` module in your > namespace package, make sure that it is provided by exactly **one** > distribution, and that all other distributions using that module's > contents are defined so as to have an installation dependency on > the distribution containing the ``__init__`` module. Otherwise, > it may not be present in some installations. > > (Note: for historical reasons, existing namespace packages nearly > always include ``__init__`` modules, but they are usually empty > except for code to declare the package a namespace. Under this > proposal, these nearly-empty modules could and should be replaced > by an empty ``*.ns`` file in the package directory.) I'd be a little more forceful; the PEP should strongly recommend against including namespace package __init__.py files. >For those implementing PEP 302 importer objects: > >* Importers that support the ``iter_modules()`` method and want to add > namespace support should modify their ``iter_modules()`` > method so that it discovers and list namespace packages as well as > standard modules and packages. > >* For implementation efficiency, an importer is allowed to cache > information (such as whether a directory exists and whether an > ``__init__`` module is present in it) between the invocation of a > ``namespace_subpath()`` call and a subsequent ``find_module()`` call > for the same name. > > It should, however, avoid retaining such cached information for any > longer than the next method call, and it should also verify that the > request is in fact for the same module/package name, as it is not > guaranteed that a ``namespace_subpath()`` call will always be > followed by a matching ``find_module()`` call. (After all, an > ``__init__`` module may already have been supplied by an earlier > importer on the path.) 
> >* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do > not need to implement ``namespace_subpath()``, because the method > is only called on importers corresponding to ``sys.path`` entries.' > If a meta importer wishes to support namespace packages, it must > do so entirely within its ``find_module()`` implementation. > > Unfortunately, it is unlikely that any such implementation will be > able to merge its namespace portions with those of other meta > importers or ``sys.path`` importers, so the meaning of "supporting > namespace packages" for a meta importer is currently undefined. > > However, since the intended use case for meta importers is to > replace Python's normal import process entirely for some subset of > modules, and the number of such importers currently implemented is > quite small, this seems unlikely to be a big issue in practice. > > >Rejected Alternatives >===================== > >* The original version of this PEP used ``.pkg`` or ``.pth`` files > that contained either explicit directories to be added to a > package's ``__path__``, or ``*`` to indicate that a package was > a namespace. > > But this approach required a more complex change to the importer > protocol, the files had to actually be opened and read, and there > were no concrete use cases proposed for the additional flexibility > specifying explicit paths. > >* On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using > extra files, namespace packages use a ``__pkg__.py`` file to > indicate their namespace-ness, in addition to a (required) > ``__init__.py``. > > Unfortunately, this approach solves only one of the `problems with > the current approach`_: i.e., having a standard way of declaring and > identifying namespace packages. It does not address the necessity > of distributing duplicated files, or filename overlap between > distributions. 
Further, it does not allow truly-independent > namespace portions to exist, since it requires a "defining" portion > (the portion containing the single ``__init__`` module) to exist. > >* Another approach considered during revisions to this PEP was to > simply rename package directories to add a suffix like ``.ns`` > or ``-ns``, to indicate their namespaced nature. This would effect > a small performance improvement for the initial import of a > namespace package, avoid the need to create empty ``*.ns`` files, > and even make it clearer that the directory involved is a namespace > portion. > > The downsides, however, are also plentiful. If a package starts > its life as a normal package, it must be renamed when it becomes > a namespace, with the implied consequences for revision control > tools. > > Further, there is an immense body of existing code (including the > distutils and many other packaging tools) that expect a package > directory's name to be the same as the package name. And porting > existing Python 2.x namespace packages to Python 3 would require > widespread directory renaming as well. > > In short, this approach would require a vastly larger number of > changes to both the standard library and third-party code, for > a tiny potential performance improvement and a small increase in > clarity. It was therefore rejected on "practicality vs. purity" > grounds. > > > >References >========== > >.. [1] "namespace" vs "module" packages (mailing list thread) > (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html) > >.. [2] "PEP \382: Namespace Packages" (mailing list thread) > (http://mail.python.org/pipermail/python-dev/2009-April/088087.html) > >Copyright >========= > >This document has been placed in the public domain. > > >.. 
> Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: You've done a really excellent job at both simplifying the specification, and providing a clear explanation of the issues and mechanisms involved. Kudos! I really like this a lot, and wholeheartedly support its adoption. I hope MvL will agree. I'm going to have a look at your prototype now and will commit it, and any updates, to the hg repo. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From pje at telecommunity.com Sat Jul 9 00:33:23 2011 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 08 Jul 2011 18:33:23 -0400 Subject: [Import-SIG] New draft revision for PEP 382 In-Reply-To: References: <20110708195157.335043A404D@sparrow.telecommunity.com> Message-ID: <20110708223346.3FD4F3A404D@sparrow.telecommunity.com> At 03:52 PM 7/8/2011 -0600, Eric Snow wrote: >1. Should this PEP wait until importlib.__import__ replaces the >builtin __import__? That will have bearing on where the >implementation takes place. I'm not sure of the status of that >effort, other than what Brett has reported in the tracker issue >(http://bugs.python.org/issue2377), nor of the timeframe. > >2. Should it wait for the work on the import engine (a GSOC project). >It sounds like a PEP is in the works right now. It may also impact >the implementation of this PEP. Honestly, since I've done very little with Python 3.x and don't expect to be involved in the implementation there, I would leave answering those questions to the folks involved. I will say, though, that this really doesn't modify the main import processing loop much; it's just an extra method call at the point where you have a finder, and a few extra local variables. 
So I don't see any insurmountable obstacles to adding it to import.c, at least given what I remember of how the 2.x version works. But again, I'm not the one doing the work, so take that with a grain of salt. ;-) From pje at telecommunity.com Sat Jul 9 02:01:28 2011 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 08 Jul 2011 20:01:28 -0400 Subject: [Import-SIG] New draft revision for PEP 382 In-Reply-To: <20110708183135.7c9fa5d5@limelight.wooz.org> References: <20110708195157.335043A404D@sparrow.telecommunity.com> <20110708183135.7c9fa5d5@limelight.wooz.org> Message-ID: <20110709000146.06C313A404D@sparrow.telecommunity.com> At 06:31 PM 7/8/2011 -0400, Barry Warsaw wrote: >On Jul 08, 2011, at 03:51 PM, P.J. Eby wrote: > > >The following is my attempt at an updated draft of PEP 382, based on the > >recently-discussed changes. > >Thanks! I've been trying to catch up on the mailing list traffic today, and >grabbed your prototype code. I plan on committing it to MvL's pep382 hg >branch so we have a place to play with it. You should probably start from this version instead: http://pastebin.com/Wv77WYyb It's got some work on other things like iter_modules, extend_namespaces, etc. > >Portion > > A set of files in a single directory (possibly inside a zip file > > or other storage mechanism) that contribute modules or subpackages > > to a namespace package. The contents of each portion ``sys.path`` > >This one got cut off. Oops. A bad edit; ignore that sentence fragment, it was replaced by language in the definition that followed it. > >Motivation > >========== > > > >.. epigraph:: > > > > "Most packages are like modules. Their contents are highly > > interdependent and can't be pulled apart. [However,] some > > packages exist to provide a separate namespace. ... It should > > be possible to distribute sub-packages or submodules of these > > [namespace packages] independently." > > > > -- Jim Fulton, shortly before the release of Python 2.3 [1]_ > >Nice find! 
That was where Jim coined the term in the first place. I went back looking because I remembered at least Jim, Guido and I hashing this out back then on a zope-related mailing list. Took a few minutes to find, but I think it was worth it. >Do you need to explain a little more why __path__ is significant, and why the >registration function is required? Revised paragraph: ==== In current Python versions, however, a registration function (such as ``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``) must be explicitly invoked in order to set up the package's ``__path__``. (By default, a package's ``__path__`` lists only one directory, so to allow imports from more than one directory, the ``__path__`` must be explicitly extended in code.) ==== > >Vendor packages typically must not provide overlapping files, and an > >attempt to install a vendor package that has a file already on disk > >will fail or cause unpredictable behavior. As vendors might choose to > >package distributions such that they will end up all in a single > >directory for the namespace package, all portions would contribute > >conflicting ``__init__.py`` files. > >I might word this a little differently. Perhaps: > >Vendor packaging standards require every file on disk to be owned by exactly >one vendor package. But because each portion of a namespace package may be >contained in a separate vendor package, multiple vendor packages would have to >own the namespace package's __init__.py file. For example, would the >``zope.interface`` vendor package own ``zope/__init__.py`` or would the >``zope.component`` vendor package own it? Different vendors handle this >conflict differently, and in fact, different packaging tools from the same >vendor can handle this differently, which can cause consistency problems. I took the original wording as directly as practical from MvL's, but I agree yours is clearer.
OTOH, I think the "fail or cause unpredictable behavior" is a much stronger motivator than "it's nonstandard and confusing". ;-) Did you have a specific rationale for your choice? I mean, what did you want to gain or avoid by the change? > >This support would work by adding a new way to desginate a directory > >s/desginate/designate/ Got it, thanks for the careful read! > >as containing a namespace package portion: by including one or more > >``*.ns`` files in it. > > > >This approach removes the need for an ``__init__`` module to be > >duplicated across namespace package portions. Instead, each portion > >can simply include a uniquely-named ``*.ns`` file, thereby avoiding > >filename clashes in vendor packages. > >I think a concrete example would really help here. E.g.: > >For example, the ``zope.interface`` portion would include a >``zope/zope.interface.ns`` file, while the ``zope.component`` portion would >include a ``zope/zope.component.ns`` file. The very presence of any ``.ns`` >files inside the ``zope`` directory is enough to designate ``zope`` as a >namespace package. No conflicting ``zope/__init__.py`` file is necessary. The problem with this example is that it gives the impression that .ns files are named for packages, instead of being named for distributions. So, I went with a more detailed and explicit example. Here's my revised version:

====
For example, if two distributions, ``Importing`` and ``ProxyTypes``, wish to contribute the modules ``peak.util.imports`` and ``peak.util.proxies`` to the ``peak.util`` namespace package, then their source distribution directory layouts would look like this::

    ProxyTypes-0.9.tgz:
        peak/
            ProxyTypes.ns    <- 'peak' is a namespace package
            util/
                ProxyTypes.ns    <- 'peak.util' is a namespace package
                proxies.py

    Importing-1.10.tgz:
        peak/
            Importing.ns    <- 'peak' is a namespace package
            util/
                Importing.ns    <- 'peak.util' is a namespace package
                imports.py

If installed separately (e.g. one via system package, another via a user's home directory), then the ``__path__`` of the ``peak`` main package will include both ``peak`` subdirectories, and the ``__path__`` of the ``peak.util`` namespace package will include both ``peak/util`` subdirectories. Thus, both ``peak.util.proxies`` and ``peak.util.imports`` will be importable, despite the physical separation of the modules.

On the other hand, if these portions are both installed to the *same* directory, the layout will look like this::

    site-packages/  (or wherever)
        peak/
            Importing.ns
            ProxyTypes.ns    <- both portions' .ns files appear
            util/
                Importing.ns     <- at both levels
                ProxyTypes.ns
                imports.py
                proxies.py

And the ``__path__`` of the ``peak`` and ``peak.util`` packages will only contain a single directory each. (Assuming these are the only contributions to ``peak`` and ``peak.util`` on ``sys.path``, of course!)

Either way, the mere presence of the ``.ns`` files tells the import machinery that the directory is a namespace package portion and is importable; there is no need for any ``__init__.py`` files that would cause installation conflicts, when both portions are installed to the same target location.

In addition to detecting namespace portions and adding them to the package's ``__path__``, the import machinery will also add any imported namespace packages to ``sys.namespace_packages`` (initially an empty set), so that namespace packages can be identified or iterated over.
====

I think this also gets more of the clarity about __path__ that you asked for, too. > >This new method is called just before the importer's ``find_module()`` > >is normally invoked. If the importer determines that `fullname` is > >a namespace package portion under its jurisdiction, then the importer > >returns an importer-specific path to that namespace portion. > >Please define exactly what ``fullname`` is. Ugh. Do I have to?
;-) Will it work if I just change that to "just before the importer's ``find_module(fullname)`` is normally invoked", so it's more clearly implied? > >(Note that this implies that any non-namespace packages with the same > >name are skipped, and not included in the resulting package's > >``__path__``. In other words, a namespace package's initial > >``__path__`` only includes namespace portions, never non-namespace > >package directories.) > >Would you expect this to be common? Did you have any examples in mind, or was >it just covering-the-bases? Just covering the bases. > >Standard Library Changes/Additions > >---------------------------------- > > > >The ``pkgutil`` module should be updated to handle this > >specification appropriately, including any necessary changes to > >``extend_path()``, ``iter_modules()``, etc. A new generic API for > >calling ``namespace_subpath()`` on importers should be added as well. > >Is there any reason not to put extend_path() on the road to deprecation? I don't know. Is there? As I said, I considered that an open question. > >Specifically the proposed changes and additions are: > > > >* A new ``namespace_subpath(importer, fullname)`` generic, allowing > > implementations to be registered for existing importers. > >Is this the registration mechanism? Registration for what? I meant that this is analogous to other pkgutil generic functions that let you call a PEP 302 extension protocol on an importer, whether or not the importer directly implements that protocol. For example, pkgutil.iter_importer_modules() is a generic function that lets you ask an importer to iterate over available modules, whether it actually implements its own "iter_modules()" method or not. The pkgutil.namespace_subpath() function would do the same for the (possibly-absent) namespace_subpath() method on existing importers, and allow third parties to register namespace support for custom importers that can't be directly modified to support namespace packages. 
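A rough sketch of the kind of generic function described above, for illustration only. It uses ``functools.singledispatch`` from later Python 3 releases as a stand-in for the generic-function mechanism in ``pkgutil``; that substitution, the ``LegacyImporter`` class and its ``known`` mapping are all assumptions made for the example, not part of the draft.

```python
from functools import singledispatch


@singledispatch
def namespace_subpath(importer, fullname):
    """Generic entry point: use the importer's own method when it has one."""
    method = getattr(importer, "namespace_subpath", None)
    if method is None:
        # Importers without the method behave as if it had returned None.
        return None
    return method(fullname)


class LegacyImporter:
    """Hypothetical third-party importer that cannot be modified directly."""

    def __init__(self, known):
        self.known = known  # maps package names to portion paths


@namespace_subpath.register(LegacyImporter)
def _legacy_namespace_subpath(importer, fullname):
    # Namespace support registered from outside the importer class.
    return importer.known.get(fullname)
```

Callers go through ``namespace_subpath(importer, fullname)`` in every case, so they do not need to know whether support comes from the importer itself or from a registered implementation.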
Any thoughts on how better to word that bit, without necessarily going into that much detail? ;-) >s/packagess/packages/ Got it. > >* ``*.ns`` files must be empty or contain only ASCII whitespace > > characters. This leaves open the possibility for future extension > > to the format. > >Getting back to our previous discussion on this, I might also add a comment >format, e.g. lines starting with `#`. Almost any extension we can come up >with will probably need to include comments, so we might as well add them here >now. This will also allow folks to add copyright, or other textual >information into .ns files as their coding conventions may dictate. > >Do you expect to ignore everything else, or throw an exception? Let's be >explicit about that. We won't be opening the files at all, so the contents will be ignored. >I'd be a little more forceful; the PEP should strongly recommend against >including namespace package __init__.py files. As I said, it's controversial. Some people really want those __init__ modules, and setuptools sort-of supports them now. I can make it a bit more forceful, though. >You've done a really excellent job at both simplifying the specification, and >providing a clear explanation of the issues and mechanisms involved. Kudos! >I really like this a lot, and wholeheartedly support its adoption. I hope MvL >will agree. Thanks. From ncoghlan at gmail.com Sat Jul 9 10:00:18 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Jul 2011 18:00:18 +1000 Subject: [Import-SIG] New draft revision for PEP 382 In-Reply-To: References: <20110708195157.335043A404D@sparrow.telecommunity.com> Message-ID: On Sat, Jul 9, 2011 at 7:52 AM, Eric Snow wrote: > I have some separate comments on this draft that I'll have to > postpone.  In the meantime I have a couple of questions: > > 1. Should this PEP wait until importlib.__import__ replaces the > builtin __import__?  That will have bearing on where the > implementation takes place.
I'm not sure of the status of that > effort, other than what Brett has reported in the tracker issue > (http://bugs.python.org/issue2377), nor of the timeframe. Up to the people implementing it. They can either do the work twice (once for import.c and once for importlib) in the knowledge that the intent is to nuke (most of) import.c before 3.3 is released or else they can just do the importlib implementation and make issue 2377 a dependency of the PEP 382 support becoming available in the default interpreter. The only approach I would actively oppose is checking in a PEP 382 implementation that *didn't* include the necessary importlib updates. > 2. Should it wait for the work on the import engine (a GSOC project). > It sounds like a PEP is in the works right now.  It may also impact > the implementation of this PEP. PEP 382 is much further along (and more significant from a practical point of view) than the import engine work, so it shouldn't be delayed for the latter. If PEP 382 goes in first the appropriate changes to the engine code to account for sys.namespace_packages and the importer protocol changes can be adopted from the importlib modifications. Cheers, Nick. -- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia From ncoghlan at gmail.com Sat Jul 9 10:13:47 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Jul 2011 18:13:47 +1000 Subject: [Import-SIG] New draft revision for PEP 382 In-Reply-To: <20110708195157.335043A404D@sparrow.telecommunity.com> References: <20110708195157.335043A404D@sparrow.telecommunity.com> Message-ID: Nice write up! Barry covered most things, just a few minor comments below. On Sat, Jul 9, 2011 at 5:51 AM, P.J. Eby wrote: > Vendor Package >    A group of files installed by an operating system's packaging >    mechanism (e.g. Debian or Redhat packages installed on Linux >    systems). s/Redhat/RPM/ (or Red Hat.
Either works, but Redhat is wrong) > * A new ``extend_namespaces(path_entry)`` function, to extend existing >  and already-imported namespace packages' ``__path__`` attributes to >  include any portions found in a new ``sys.path`` entry.  This >  function should be called by applications extending ``sys.path`` >  at runtime, e.g. to include a plugin directory or add an egg to the >  path. > >  The implementation of this function does a simple breadth-first walk >  of ``sys.namespace_packages``, and performs any necessary >  ``namespace_subpath()`` calls to identify what path entries need to >  be added to each package's ``__path__``, given that `path_entry` >  has been added to ``sys.path``. I believe this may need a "parent=''" argument so it can also be used to extend a package path. > For users, developers, and distributors of namespace packages: > > * ``sys.namespace_packages`` is allowed to contain non-existent or >  not-yet-imported package names; code that uses its contents should >  not assume that every name in this set is also present in >  sys.packages or that importing the name will necessarily succeed. s/sys.packages/sys.modules/ > > * ``*.ns`` files must be empty or contain only ASCII whitespace >  characters.  This leaves open the possibility for future extension >  to the format. +1 for Barry's suggestion to mandate # as a comment prefix and disallow any other contents (even though the interpreter itself won't enforce that). Cheers, Nick. -- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia From ericsnowcurrently at gmail.com Sat Jul 9 10:42:13 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 9 Jul 2011 02:42:13 -0600 Subject: [Import-SIG] New draft revision for PEP 382 In-Reply-To: <20110708195157.335043A404D@sparrow.telecommunity.com> References: <20110708195157.335043A404D@sparrow.telecommunity.com> Message-ID: Thanks for working on this. It's looking good. A couple of questions inline.
Apologies ahead of time if my ignorance shows to loudly. :) On Fri, Jul 8, 2011 at 1:51 PM, P.J. Eby wrote: > > PEP \302 Extension > ------------------ > > The existing PEP 302 protocol is to be extended to handle namespace > package portion directories, by adding a new importer method, > ``namespace_subpath(fullname)``. ?An implementation of this method > will be added to all applicable importer classes distributed with > Python, including those in ``pkgutil`` and ``zipimport``). > > (Note: any other importer wishing to support namespace packages must > provide its own implementation of this method as well. ?If an importer > does not have a ``namespace_subpath()`` method, it will be treated as > if it *did* have the method, but it returned ``None`` when called.) > > This new method is called just before the importer's ``find_module()`` > is normally invoked. ?If the importer determines that `fullname` is > a namespace package portion under its jurisdiction, then the importer > returns an importer-specific path to that namespace portion. > > For example, if a standard filesystem path importer for the path > ``/usr/lib/site-packages`` is about to be asked to import ``zope``, > and there is a ``/usr/lib/site-packages/zope`` directory containing > any files ending with ``.ns``, a call to ``namespace_subpath("zope")`` > on that importer should return ``"/usr/lib/site-packages/zope"``. > And if there were a "zope_part1" and a "zope_part2" directory, both with a zope.ns file in them, that namespace_subpath("zope") call would return ["/usr/lib/site-packages/zope_part1", "/usr/lib/site-packages/zope_part2"], right? And if both also had a foo.ns file in them, the same would be returned for namespace_subpath("foo"). > However, if there is no such subdirectory, or it does *not* contain > any files whose names end with ``.ns``, that importer would return > ``None`` instead. 
> > The Python import machinery will call this method on each importer > corresponding to a path entry in ``sys.path`` (for top-level imports) > or in a parent package ``__path__`` (for subpackage imports). > > If a normal package or module is found before a namespace package, > importing proceeds according to the normal PEP 302 protocol. ?(That > is, a loader object is simply asked to load the located module or > package.) > > However, if a namespace package portion is found (i.e., an importer's > ``namespace_subpath()`` returns a string), then the normal import > search stops, and a namespace package is created instead. > > The import machinery continues iterating over importers and calling > ``namespace_subpath()`` on them, but it does **not** continue calling > ``find_module()`` on them. ?Instead, it accumulates any strings > returned by the subpath calls, in order to assemble a ``__path__`` > for the package being imported. > > (Note that this implies that any non-namespace packages with the same > name are skipped, and not included in the resulting package's > ``__path__``. ?In other words, a namespace package's initial > ``__path__`` only includes namespace portions, never non-namespace > package directories.) > > Once this ``__path__`` has been assembled, a module is created, and > its ``__path__`` attribute is set. ?The package's name is then added > to ``sys.namespace_packages`` -- a set of package names. > > Finally, the ``__init__`` module code for the package (if it exists) > is located and executed in the new module's namespace. > > Each importer that returns a ``namespace_subpath()`` for the package > is asked to perform a standard ``find_module()`` for the package. > Since by the normal import rules, a directory containing an > ``__init__`` module is a package, this call should succeed if the > namespace package portion contains an ``__init__`` module, and the > importing can proceed normally from that point. 
> Is this last paragraph part of the finally? If so, what does calling find_module at this point accompish? Do you mean load_module is also called for each that is found? Will it be too easy (or conversely very likely) to have __init__.py collisions? > There is one caveat, however. ?The importers currently distributed > with Python expect that *they* will be the ones to initialize the > ``__path__`` attribute, which means that they must be changed to > either recognize that ``__path__`` has already been set and not > change it, or to handle namespace packages specially (e.g., via an > internal flag or checking ``sys.namespace_packages``). > > Similarly, any third-party importers wishing to support namespace > packages must make similar changes. > Seems like the caveat is dependent on the above algorithm. If the module's __path__ were set with the namespace_subpath() results after the namespace package's import was all over, would it still be an issue? Most of this section reads like "this is how people should expect the implementation to look". However, I'm fine with that if this is how the implementation should look. :) > (NOTE: in general, it goes against the design of PEP 302 for a loader > object to assume that it is always creating the module object or that > the module it is operating on is empty. ?Making this assumption can > result in code that breaks the normal operation of the ``reload()`` > builtin and any specialized tools that rely on it, such as lazy > importers, automatic reloaders, and so on.) > > > Standard Library Changes/Additions > ---------------------------------- > > The ``pkgutil`` module should be updated to handle this > specification appropriately, including any necessary changes to > ``extend_path()``, ``iter_modules()``, etc. ?A new generic API for > calling ``namespace_subpath()`` on importers should be added as well. 
>
> Specifically the proposed changes and additions are:

Maybe, "Specifically the proposed changes and additions to pkgutil
are:", to clarify the context?

>
> * A new ``namespace_subpath(importer, fullname)`` generic, allowing
>   implementations to be registered for existing importers.
>
> * A new ``extend_namespaces(path_entry)`` function, to extend existing
>   and already-imported namespace packages' ``__path__`` attributes to
>   include any portions found in a new ``sys.path`` entry.  This
>   function should be called by applications extending ``sys.path``
>   at runtime, e.g. to include a plugin directory or add an egg to the
>   path.
>
>   The implementation of this function does a simple breadth-first walk
>   of ``sys.namespace_packages``, and performs any necessary
>   ``namespace_subpath()`` calls to identify what path entries need to
>   be added to each package's ``__path__``, given that `path_entry`
>   has been added to ``sys.path``.
>

Does the same apply to namespace sub-packages where their parent
package has an updated __path__?  So a recursion would take place in
some cases.

> * A new ``iter_namespaces(parent='')`` function to allow breadth-first
>   traversal of namespaces in ``sys.namespace_packages``, by yielding
>   the child namespace packages of `parent`.  For example, calling
>   ``iter_namespaces("zope")`` might yield ``zope.app`` and
>   ``zope.products`` (if they are namespace packages registered in
>   ``sys.namespace_packages``), but **not** ``zope.foo.bar``.
>   This function is needed to implement ``extend_namespaces()``, but
>   is potentially useful to others.
>
> * ``ImpImporter.iter_modules()`` should be changed to also detect and
>   yield the names of namespace package portions.
>
> In addition to the above changes, the ``zipimport`` importer should
> have its ``iter_modules()`` implementation similarly changed.
> (Note: current versions of Python implement this via a shim in
> ``pkgutil``, so technically this is also a change to ``pkgutil``.)
>
>
> Implementation Notes
> --------------------
>
> For users, developers, and distributors of namespace packages:
>
> * ``sys.namespace_packages`` is allowed to contain non-existent or
>   not-yet-imported package names; code that uses its contents should
>   not assume that every name in this set is also present in
>   sys.packages or that importing the name will necessarily succeed.
>
> * ``*.ns`` files must be empty or contain only ASCII whitespace
>   characters.  This leaves open the possibility for future extension
>   to the format.
>
> * Files contained within a namespace package portion directory must
>   be *unique* to that portion, so that the portion can be distributed
>   as a vendor package without any filename overlap.  This applies to
>   modules and data files as well as ``*.ns`` files.
>
>   (For ``*.ns`` files themselves, uniqueness can be achieved simply by
>   giving them a name based on the distribution that contains the file,
>   and it is recommended that packaging tools support doing this
>   automatically.)
>
> * Although this PEP supports the use of non-empty ``__init__`` modules
>   in namespace packages, their usage is controversial.  If more than
>   one package portion contains an ``__init__`` module, at most one of
>   them will be executed, possibly leading to silent errors.
>

As noted above, the implementation outlined in the "PEP 302 Extension"
section seems ambiguous on this point.  However, I think this bullet
does a great job of clarifying how __init__.py modules are handled.

>   Therefore, if you must include an ``__init__`` module in your
>   namespace package, make sure that it is provided by exactly **one**
>   distribution, and that all other distributions using that module's
>   contents are defined so as to have an installation dependency on
>   the distribution containing the ``__init__`` module.
>   Otherwise, it may not be present in some installations.
>
>   (Note: for historical reasons, existing namespace packages nearly
>   always include ``__init__`` modules, but they are usually empty
>   except for code to declare the package a namespace.  Under this
>   proposal, these nearly-empty modules could and should be replaced
>   by an empty ``*.ns`` file in the package directory.)
>
> For those implementing PEP 302 importer objects:
>
> * Importers that support the ``iter_modules()`` method and want to add
>   namespace support should modify their ``iter_modules()``
>   method so that it discovers and lists namespace packages as well as
>   standard modules and packages.
>

The iter_modules() method isn't part of PEP 302, is it?  Where can I
find out more about it?

> * For implementation efficiency, an importer is allowed to cache
>   information (such as whether a directory exists and whether an
>   ``__init__`` module is present in it) between the invocation of a
>   ``namespace_subpath()`` call and a subsequent ``find_module()`` call
>   for the same name.
>
>   It should, however, avoid retaining such cached information for any
>   longer than the next method call, and it should also verify that the
>   request is in fact for the same module/package name, as it is not
>   guaranteed that a ``namespace_subpath()`` call will always be
>   followed by a matching ``find_module()`` call.  (After all, an
>   ``__init__`` module may already have been supplied by an earlier
>   importer on the path.)
>
> * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>   not need to implement ``namespace_subpath()``, because the method
>   is only called on importers corresponding to ``sys.path`` entries.

And parent.__path__ for namespace submodules?

>   If a meta importer wishes to support namespace packages, it must
>   do so entirely within its ``find_module()`` implementation.
>
>   Unfortunately, it is unlikely that any such implementation will be
>   able to merge its namespace portions with those of other meta
>   importers or ``sys.path`` importers, so the meaning of "supporting
>   namespace packages" for a meta importer is currently undefined.
>

While I'm not sure meta importers need to be left out, I suppose it
isn't critical since the work-around isn't that hard, nor widely
needed.  Thus the message here is that this PEP only applies to the
use of sys.path_hooks and sys.path_importer_cache.  It would be nice
for that to be clear up front.

>...
>
>   Further, there is an immense body of existing code (including the
>   distutils and many other packaging tools) that expect a package
>   directory's name to be the same as the package name.

Correct me if I'm wrong, but I have understood that for namespace
packages in the PEP, the directory name does not have to be the
package name.

Back to namespace subpackages, it's unclear how they should work.
Either a namespace package is at the top level of a sys.path entry, or
it's a module of a parent package, namespace or otherwise.  The
top-level case is pretty clear.  However, the subpackage case is not.
I don't see namespace subpackages as being too practical with
non-namespace parent packages, but I'm probably missing something.

In the case that a namespace subpackage has a namespace package
parent, how would that look?  In the email to Barry you gave an
example that covered this a little, but it's still pretty unclear.  In
any case, I think more examples with namespace subpackages would be
helpful.  Or maybe namespace subpackages are a corner case that
doesn't deserve the keystrokes I've given it. ;)

Other than that, the PEP is pretty clear (coming from a less
experienced perspective).  Thanks again for working on this!
-eric

From ncoghlan at gmail.com Sat Jul 9 11:07:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Jul 2011 19:07:14 +1000
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To:
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
Message-ID:

On Sat, Jul 9, 2011 at 6:42 PM, Eric Snow wrote:
> Correct me if I'm wrong, but I have understood that for namespace
> packages in the PEP, the directory name does not have to be the
> package name.

No, if the directory name doesn't match, the interpreter won't even
check it for __init__.py or .ns files, so there's no way for it to
satisfy an import request (remember, the .ns files don't live directly
in directories on sys.path - they live in *subdirectories* of those
directories).

> Back to namespace subpackages, it's unclear how they should work.
> Either a namespace package is at the top level of a sys.path entry, or
> it's a module of a parent package, namespace or otherwise.  The
> top-level case is pretty clear.  However, the subpackage case is not.
> I don't see namespace subpackages as being too practical with
> non-namespace parent packages, but I'm probably missing something.
>
> In the case that a namespace subpackage has a namespace package
> parent, how would that look?  In the email to Barry you gave an
> example that covered this a little, but it's still pretty unclear.  In
> any case, I think more examples with namespace subpackages would be
> helpful.  Or maybe namespace subpackages are a corner case that
> doesn't deserve the keystrokes I've given it.  ;)

The subpackage import is just a scaled down version of top-level
imports, with pkg.__path__ taking on the role of sys.path.  Now, in
normal circumstances, it's a pretty degenerate case with pkg.__path__
containing only a single directory (the directory where the
__init__.py file lives).
Namespace packages (either created via PEP 382 or one of the existing
namespace package systems) are one way to get multiple entries into
pkg.__path__, but not the only way (__init__.py can do it, as can
other application code).  Regardless of how it happens, the process of
handling it under PEP 382 is the same as it is for top-level imports -
once the import machinery sees a .ns file in a directory that matches
the current import inside that package, it then scans the rest of the
pkg.__path__ entries looking for more directories that also contain
.ns files, adding all those directories to pkg.subpkg.__path__.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pje at telecommunity.com Sat Jul 9 20:52:49 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sat, 09 Jul 2011 14:52:49 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To:
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
Message-ID: <20110709185310.280C33A404D@sparrow.telecommunity.com>

At 06:13 PM 7/9/2011 +1000, Nick Coghlan wrote:
>Nice write up!

Thanks.

>Barry covered most things, just a few minor comments below.
>
>On Sat, Jul 9, 2011 at 5:51 AM, P.J. Eby wrote:
> > Vendor Package
> >     A group of files installed by an operating system's packaging
> >     mechanism (e.g. Debian or Redhat packages installed on Linux
> >     systems).
>
>s/Redhat/RPM/
>
>(or Red Hat. Either works, but Redhat is wrong)

Done.

> > * A new ``extend_namespaces(path_entry)`` function, to extend existing
> >   and already-imported namespace packages' ``__path__`` attributes to
> >   include any portions found in a new ``sys.path`` entry.  This
> >   function should be called by applications extending ``sys.path``
> >   at runtime, e.g. to include a plugin directory or add an egg to the
> >   path.
>
>I believe this may need a "parent=''" argument so it can also be used
>to extend a package path.
Yes, it does; see the sketch here: http://pastebin.com/G7fdFG2V I just left that bit out of the spec as an extra detail that would need explaining; I see it as really being internal to the API being provided in that case. > > For users, developers, and distributors of namespace packages: > > > > * ``sys.namespace_packages`` is allowed to contain non-existent or > > not-yet-imported package names; code that uses its contents should > > not assume that every name in this set is also present in > > sys.packages or that importing the name will necessarily succeed. > >s/sys.packages/sys.modules/ Got it. > > * ``*.ns`` files must be empty or contain only ASCII whitespace > > characters. This leaves open the possibility for future extension > > to the format. > >+1 for Barry's suggestion to mandate # as a comment prefix and >disallow any other contents (even though the interpreter itself won't >enforce that). Yeah... Honestly the more we talk about all that the more inclined I am to saying that it's a zero-length file, just to avoid more spec detail. ;-) I just don't know if there are any issues with packaging or revision control systems for zero length files, not to mention whether there are OSes where it's hard to make a zero-length file. From pje at telecommunity.com Sat Jul 9 23:20:30 2011 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 09 Jul 2011 17:20:30 -0400 Subject: [Import-SIG] Is ".ns" really the right extension? Message-ID: <20110709212103.00BFC3A404D@sparrow.telecommunity.com> Looking over the example code I added to the PEP draft (based on Barry's suggestion), it occurs to me that, like his example, mine is still confusing. And, now that I look more closely at it, I see that the confusion in large part comes from the idea of naming something "ThisPart.ns" -- it implies that "ThisPart" is the namespace! And it's not a namespace at all. It's really a portion of the namespace. 
It seems to me that the actual meaning of a foo.ns file is, "The 'foo'
portion of this namespace is installed here".  And thus, foo.portion
or foo.part or foo.contribution or something like that would be more
appropriate, given the PEP terminology.

I think that a change is needed here to make the PEP's narrative come
together more cleanly.  I'm leaning towards calling them foo.contrib
files, as in "The 'foo' distribution contributed to this portion of
the enclosing package."

(Among other things, this makes the need for repeated files clearer;
i.e., you add a contribution marker to each package directory you're
putting files or subdirectories into.)

Overall, the narrative can then lose the constant references to *.ns
files and instead talk about contribution markers -- i.e. a namespace
package portion is a directory containing one or more contribution
markers.  I think this will be clearer than the current text, and in
particular it should make the example directory layout more meaningful
to read.

Notice, too, that Eric Snow's confusion about how .ns files work seems
to have been influenced by the terminology -- i.e., the expectation
that a 'zope.ns' file was talking about a 'zope' namespace package and
identifying the containing directory as part of the namespace, rather
than the other way around.  Was that the case, Eric?  And if so, do
you think that these layouts are any clearer?

    ProxyTypes-0.9.tgz:
        peak/
            ProxyTypes.contrib   <- marks this as a namespace package portion
            util/
                ProxyTypes.contrib   <- same for 'peak.util'
                proxies.py

    Importing-1.10.tgz:
        peak/
            Importing.contrib   <- marks this as a namespace package portion
            util/
                Importing.contrib   <- same for 'peak.util'
                imports.py

    site-packages/   (or wherever)
        peak/
            Importing.contrib
            ProxyTypes.contrib   <- both distributions' contributions are merged
            util/
                Importing.contrib   <- at both levels
                ProxyTypes.contrib
                imports.py
                proxies.py

Any other thoughts?
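
For what it's worth, the merged layout above could be inspected with
something like the sketch below.  It assumes the ``.contrib``
extension floated in this message (nothing here is implemented
anywhere), and ``contributors()`` and ``is_namespace_portion()`` are
hypothetical helper names, not part of ``pkgutil`` or of the PEP:

```python
import os

MARKER_EXT = ".contrib"  # extension proposed in this thread; an assumption


def contributors(package_dir):
    """Return sorted names of distributions with a marker in package_dir."""
    try:
        names = os.listdir(package_dir)
    except OSError:
        return []  # missing or unreadable directory: no markers
    return sorted(
        name[:-len(MARKER_EXT)]
        for name in names
        if name.endswith(MARKER_EXT)
        and os.path.isfile(os.path.join(package_dir, name))
    )


def is_namespace_portion(package_dir):
    """A directory counts as a portion if it holds at least one marker."""
    return bool(contributors(package_dir))
```

Run against the merged ``site-packages/peak`` directory, this would
report both ``Importing`` and ``ProxyTypes``.  Note that it only says
which distributions contributed to a directory, not which of the other
files in it came from which distribution.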
From pje at telecommunity.com Sat Jul 9 23:49:49 2011 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 09 Jul 2011 17:49:49 -0400 Subject: [Import-SIG] New draft revision for PEP 382 In-Reply-To: References: <20110708195157.335043A404D@sparrow.telecommunity.com> Message-ID: <20110709215010.43C7D3A404D@sparrow.telecommunity.com> At 02:42 AM 7/9/2011 -0600, Eric Snow wrote: >And if there were a "zope_part1" and a "zope_part2" directory, both >with a zope.ns file in them, that namespace_subpath("zope") call would >return ["/usr/lib/site-packages/zope_part1", >"/usr/lib/site-packages/zope_part2"], right? And if both also had a >foo.ns file in them, the same would be returned for >namespace_subpath("foo"). No; the directory is always named for the package, just like now. We're just saying that we replace looking for __init__.py with looking for *.ns. > > Finally, the ``__init__`` module code for the package (if it exists) > > is located and executed in the new module's namespace. > > > > Each importer that returns a ``namespace_subpath()`` for the package > > is asked to perform a standard ``find_module()`` for the package. > > Since by the normal import rules, a directory containing an > > ``__init__`` module is a package, this call should succeed if the > > namespace package portion contains an ``__init__`` module, and the > > importing can proceed normally from that point. > > > >Is this last paragraph part of the finally? Yes. > If so, what does calling >find_module at this point accompish? Do you mean load_module is also >called for each that is found? I'm adding this sentence to the end of that paragraph for clarification: """(That is, with a ``load_module()`` call to execute the first ``__init__`` module found on the package's ``__path__``.)""" Does that make it clearer? >Will it be too easy (or conversely >very likely) to have __init__.py collisions? 
__init__ collisions (and the recommendation to not use __init__ modules at all) are addressed later below in the implementation notes. > > There is one caveat, however. The importers currently distributed > > with Python expect that *they* will be the ones to initialize the > > ``__path__`` attribute, which means that they must be changed to > > either recognize that ``__path__`` has already been set and not > > change it, or to handle namespace packages specially (e.g., via an > > internal flag or checking ``sys.namespace_packages``). > > > > Similarly, any third-party importers wishing to support namespace > > packages must make similar changes. > > > >Seems like the caveat is dependent on the above algorithm. If the >module's __path__ were set with the namespace_subpath() results after >the namespace package's import was all over, would it still be an >issue? No, but then we couldn't support __init__ modules executing with the correct __path__ value; notably, this would prevent __init__ modules from manipulating their own __path__. Honestly, throwing out __init__ support entirely would make a LOT of things easier and simpler here, especially in the 2.x version. But there was a vocal contingent of support for them in the original Python-Dev discussion. > > Specifically the proposed changes and additions are: > >Maybe, "Specifically the proposed changes and additions to pkgutil >are:", to clarify the context? Ok. > > > > * A new ``namespace_subpath(importer, fullname)`` generic, allowing > > implementations to be registered for existing importers. > > > > * A new ``extend_namespaces(path_entry)`` function, to extend existing > > and already-imported namespace packages' ``__path__`` attributes to > > include any portions found in a new ``sys.path`` entry. This > > function should be called by applications extending ``sys.path`` > > at runtime, e.g. to include a plugin directory or add an egg to the > > path. 
> > > > The implementation of this function does a simple breadth-first walk > > of ``sys.namespace_packages``, and performs any necessary > > ``namespace_subpath()`` calls to identify what path entries need to > > be added to each package's ``__path__``, given that `path_entry` > > has been added to ``sys.path``. > > > >Does the same apply to namespace sub-packages where their parent >package has an updated __path__? So a recursion would take place in >some cases. Yes, that's what "breadth-first" meant here; i.e., first top-level namespaces, then second-level namespaces, and so on. In actuality, I erred by saying breadth-first, though, what I actually meant is technically "pre-order traversal", i.e., parent nodes are touched before their children. I'll tweak that to "top-down traversal" instead of "breadth-first walk", and add: """(Or, in the case of sub-packages, adding a derived subpath entry, based on their parent namespace's ``__path__``.)""" >The iter_modules() method isn't part of PEP 302, is it? Where can I >find out more about it? See pkgutil; it's something I added in Python 2.5 to help tools like pydoc better support zipfiles and other exotic importers. > > * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do > > not need to implement ``namespace_subpath()``, because the method > > is only called on importers corresponding to ``sys.path`` entries.' > >And parent.__path__ for namespace submodules? Yes. Fixed. > > If a meta importer wishes to support namespace packages, it must > > do so entirely within its ``find_module()`` implementation. > > > > Unfortunately, it is unlikely that any such implementation will be > > able to merge its namespace portions with those of other meta > > importers or ``sys.path`` importers, so the meaning of "supporting > > namespace packages" for a meta importer is currently undefined. 
> > > >While I'm not sure meta importers need to be left out, I suppose it >isn't critical since the work-around isn't that hard, nor widely >needed. Thus the message here is that this PEP only applies to the >use of sys.path_hooks and sys.path_importer_cache. It would be nice >for that to be clear up front. Ok, I added this: """(Note: the import machinery will NOT invoke this method for importers on ``sys.meta_path``, because there is no path string associated with such importers, and so the idea of a "subpath" makes no sense in that case.)""" just after this bit: """The Python import machinery will call this method on each importer corresponding to a path entry in ``sys.path`` (for top-level imports) or in a parent package ``__path__`` (for subpackage imports).""" in the PEP 302 protocol description. > > Further, there is an immense body of existing code (including the > > distutils and many other packaging tools) that expect a package > > directory's name to be the same as the package name. > >Correct me if I'm wrong, but I have understood that for namespace >packages in the PEP, the directory name does not have to be the >package name. Consider yourself corrected. ;-) >Back to namespace subpackages, it's unclear how they should work. >Either a namespace package is at the top level of a sys.path entry, or >its a module of a parent package, namespace or otherwise. The >top-level case is pretty clear. However, the subpackage case is not. >I don't see namespace subpackages as being too practical with >non-namespace parent packages, but I'm probably missing something. They aren't practical at all, no. 
;-) I'll add an implementation note explaining that even though the spec doesn't require a namespace package's parent to also be a namespace, that there isn't any practical use in doing so, as the child __path__ is a collection of subpaths derived from the parent __path__, and thus it wouldn't combine with any other contributions that weren't installed to the same location. Here's the text: * In general, a namespace subpackage (e.g. ``peak.util``, ``zope.app``, etc.) must be a child of another namespace package (e.g. ``peak``, ``zope``, etc.). This is not required by the spec or enforced by the implementation, but in practice, it is useless to put a namespace package inside a non-namespace package, as the child package's ``__path__`` will be a subset of the parent's. In other words, it will only work correctly if all the contributions to that namespace package are installed to the same physical location. So, if you intend to use a namespace subpackage, you should always make its parent package a namespace as well. From ericsnowcurrently at gmail.com Sun Jul 10 00:30:28 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 9 Jul 2011 16:30:28 -0600 Subject: [Import-SIG] New draft revision for PEP 382 In-Reply-To: <20110709215010.43C7D3A404D@sparrow.telecommunity.com> References: <20110708195157.335043A404D@sparrow.telecommunity.com> <20110709215010.43C7D3A404D@sparrow.telecommunity.com> Message-ID: On Sat, Jul 9, 2011 at 3:49 PM, P.J. Eby wrote: > At 02:42 AM 7/9/2011 -0600, Eric Snow wrote: >> >> And if there were a "zope_part1" and a "zope_part2" directory, both >> with a zope.ns file in them, that namespace_subpath("zope") call would >> return ["/usr/lib/site-packages/zope_part1", >> "/usr/lib/site-packages/zope_part2"], right? ?And if both also had a >> foo.ns file in them, the same would be returned for >> namespace_subpath("foo"). > > No; the directory is always named for the package, just like now. 
> We're just saying that we replace looking for __init__.py with looking
> for *.ns.
>
>
>> > Finally, the ``__init__`` module code for the package (if it exists)
>> > is located and executed in the new module's namespace.
>> >
>> > Each importer that returns a ``namespace_subpath()`` for the package
>> > is asked to perform a standard ``find_module()`` for the package.
>> > Since by the normal import rules, a directory containing an
>> > ``__init__`` module is a package, this call should succeed if the
>> > namespace package portion contains an ``__init__`` module, and the
>> > importing can proceed normally from that point.
>> >
>>
>> Is this last paragraph part of the finally?
>
> Yes.
>
>> If so, what does calling
>> find_module at this point accomplish?  Do you mean load_module is also
>> called for each that is found?
>
> I'm adding this sentence to the end of that paragraph for clarification:
>
> """(That is, with a ``load_module()`` call to execute the first ``__init__``
> module found on the package's ``__path__``.)"""
>
> Does that make it clearer?
>

Yeah, that's great.

>
>> Will it be too easy (or conversely
>> very likely) to have __init__.py collisions?
>
> __init__ collisions (and the recommendation to not use __init__ modules at
> all) are addressed later below in the implementation notes.
>
>
>
>> > There is one caveat, however.  The importers currently distributed
>> > with Python expect that *they* will be the ones to initialize the
>> > ``__path__`` attribute, which means that they must be changed to
>> > either recognize that ``__path__`` has already been set and not
>> > change it, or to handle namespace packages specially (e.g., via an
>> > internal flag or checking ``sys.namespace_packages``).
>> >
>> > Similarly, any third-party importers wishing to support namespace
>> > packages must make similar changes.
>> >
>>
>> Seems like the caveat is dependent on the above algorithm.
>> If the module's __path__ were set with the namespace_subpath()
>> results after the namespace package's import was all over, would it
>> still be an issue?
>
> No, but then we couldn't support __init__ modules executing with the correct
> __path__ value; notably, this would prevent __init__ modules from
> manipulating their own __path__.
>

Good point.

> Honestly, throwing out __init__ support entirely would make a LOT of things
> easier and simpler here, especially in the 2.x version.  But there was a
> vocal contingent of support for them in the original Python-Dev discussion.
>
>
>> > Specifically the proposed changes and additions are:
>>
>> Maybe, "Specifically the proposed changes and additions to pkgutil
>> are:", to clarify the context?
>
> Ok.
>
>> >
>> > * A new ``namespace_subpath(importer, fullname)`` generic, allowing
>> >   implementations to be registered for existing importers.
>> >
>> > * A new ``extend_namespaces(path_entry)`` function, to extend existing
>> >   and already-imported namespace packages' ``__path__`` attributes to
>> >   include any portions found in a new ``sys.path`` entry.  This
>> >   function should be called by applications extending ``sys.path``
>> >   at runtime, e.g. to include a plugin directory or add an egg to the
>> >   path.
>> >
>> >   The implementation of this function does a simple breadth-first walk
>> >   of ``sys.namespace_packages``, and performs any necessary
>> >   ``namespace_subpath()`` calls to identify what path entries need to
>> >   be added to each package's ``__path__``, given that `path_entry`
>> >   has been added to ``sys.path``.
>> >
>>
>> Does the same apply to namespace sub-packages where their parent
>> package has an updated __path__?  So a recursion would take place in
>> some cases.
>
> Yes, that's what "breadth-first" meant here; i.e., first top-level
> namespaces, then second-level namespaces, and so on.
> In actuality, I erred by saying breadth-first, though, what I actually
> meant is technically "pre-order traversal", i.e., parent nodes are
> touched before their children.
> I'll tweak that to "top-down traversal" instead of "breadth-first walk",
> and add:
>
> """(Or, in the case of sub-packages, adding a derived subpath entry, based
> on their parent namespace's ``__path__``.)"""
>

That helps a lot.

>
>> The iter_modules() method isn't part of PEP 302, is it?  Where can I
>> find out more about it?
>
> See pkgutil; it's something I added in Python 2.5 to help tools like pydoc
> better support zipfiles and other exotic importers.
>

Yeah, I see iter_modules() in pkgutil, but was unaware of it on
importer objects.  However, my ignorance is irrelevant to the PEP, as
I certainly agree with the bullet in the case that importer objects
have iter_modules(). :)

>
>> > * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>> >   not need to implement ``namespace_subpath()``, because the method
>> >   is only called on importers corresponding to ``sys.path`` entries.
>>
>> And parent.__path__ for namespace submodules?
>
> Yes.  Fixed.
>
>
>> >   If a meta importer wishes to support namespace packages, it must
>> >   do so entirely within its ``find_module()`` implementation.
>> >
>> >   Unfortunately, it is unlikely that any such implementation will be
>> >   able to merge its namespace portions with those of other meta
>> >   importers or ``sys.path`` importers, so the meaning of "supporting
>> >   namespace packages" for a meta importer is currently undefined.
>> >
>>
>> While I'm not sure meta importers need to be left out, I suppose it
>> isn't critical since the work-around isn't that hard, nor widely
>> needed.  Thus the message here is that this PEP only applies to the
>> use of sys.path_hooks and sys.path_importer_cache.  It would be nice
>> for that to be clear up front.
>
> Ok, I added this:
>
> """(Note: the import machinery will NOT invoke this method for importers
> on ``sys.meta_path``, because there is no path string associated with
> such importers, and so the idea of a "subpath" makes no sense in that
> case.)"""
>
> just after this bit:
>
> """The Python import machinery will call this method on each importer
> corresponding to a path entry in ``sys.path`` (for top-level imports)
> or in a parent package ``__path__`` (for subpackage imports)."""
>
> in the PEP 302 protocol description.
>

Nice!  Would it be worth pointing out that the focus is on
sys.path_hooks and sys.path_importer_cache?  Something like "Note: ...
Instead, this PEP is focused on the import machinery surrounding
sys.path_hooks.)"  I only bring this up because the specificity of
what the focus **is** helped me grasp what the implementation
involves.

>
>
>> >   Further, there is an immense body of existing code (including the
>> >   distutils and many other packaging tools) that expect a package
>> >   directory's name to be the same as the package name.
>>
>> Correct me if I'm wrong, but I have understood that for namespace
>> packages in the PEP, the directory name does not have to be the
>> package name.
>
> Consider yourself corrected.  ;-)
>
>
>> Back to namespace subpackages, it's unclear how they should work.
>> Either a namespace package is at the top level of a sys.path entry, or
>> it's a module of a parent package, namespace or otherwise.  The
>> top-level case is pretty clear.  However, the subpackage case is not.
>> I don't see namespace subpackages as being too practical with
>> non-namespace parent packages, but I'm probably missing something.
>
> They aren't practical at all, no.
> ;-)  I'll add an implementation note
> explaining that even though the spec doesn't require a namespace package's
> parent to also be a namespace, that there isn't any practical use in doing
> so, as the child __path__ is a collection of subpaths derived from the
> parent __path__, and thus it wouldn't combine with any other contributions
> that weren't installed to the same location.
>
> Here's the text:
>
> * In general, a namespace subpackage (e.g. ``peak.util``, ``zope.app``,
>   etc.) must be a child of another namespace package (e.g. ``peak``,
>   ``zope``, etc.).  This is not required by the spec or enforced by
>   the implementation, but in practice, it is useless to put a
>   namespace package inside a non-namespace package, as the child
>   package's ``__path__`` will be a subset of the parent's.
>
>   In other words, it will only work correctly if all the contributions
>   to that namespace package are installed to the same physical
>   location.  So, if you intend to use a namespace subpackage, you
>   should always make its parent package a namespace as well.
>
>
>

Sounds great.  Much appreciated.

-eric

From ericsnowcurrently at gmail.com Sun Jul 10 00:58:14 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 9 Jul 2011 16:58:14 -0600
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID:

On Sat, Jul 9, 2011 at 3:20 PM, P.J. Eby wrote:
> Looking over the example code I added to the PEP draft (based on Barry's
> suggestion), it occurs to me that, like his example, mine is still
> confusing.
>
> And, now that I look more closely at it, I see that the confusion in large
> part comes from the idea of naming something "ThisPart.ns" -- it implies
> that "ThisPart" is the namespace!
>
> And it's not a namespace at all.  It's really a portion of the namespace.
>
> It seems to me that what the actual meaning of a foo.ns file is, "The 'foo'
> portion of this namespace is installed here".  And that thus,
> foo.portion or foo.part or foo.contribution or something like that would be
> more appropriate, given the PEP terminology.
>
> I think that a change is needed here to make the PEP's narrative come
> together more cleanly.  I'm leaning towards calling them foo.contrib files,
> as in "The 'foo' distribution contributed to this portion of the enclosing
> package."
>
> (Among other things, this makes the need for repeated files clearer; i.e.,
> you add a contribution marker to each package directory you're putting files
> or subdirectories into.)
>
> Overall, the narrative can then lose the constant references to *.ns files
> and instead talk about contribution markers -- i.e. a namespace package
> portion is a directory containing one or more contribution markers.  I think
> this will be clearer than the current text, and in particular it should make
> the example directory layout more meaningful to read.
>
> Notice, too, that Eric Snow's confusion about how .ns files work seems to
> have been influenced by the terminology -- i.e., the expectation that a
> 'zope.ns' file was talking about a 'zope' namespace package and identifying
> the containing directory as part of the namespace, rather than the other way
> around.  Was that the case, Eric?  And if so, do you think that these
> layouts are any clearer?
>

Yeah, that is spot on.  I definitely had it backwards.  Those examples
make _much_ more sense now, particularly because of the different
extension, and partly from your and Nick's explanations.

>
>    ProxyTypes-0.9.tgz:
>        peak/
>            ProxyTypes.contrib   <- marks this as a namespace package portion
>            util/
>                ProxyTypes.contrib   <- same for 'peak.util'
>                proxies.py
>
>    Importing-1.10.tgz:
>        peak/
>            Importing.contrib    <- marks this as a namespace package portion
>            util/
>                Importing.contrib    <- same for 'peak.util'
>                imports.py
>
>
>    site-packages/   (or wherever)
>        peak/
>            Importing.contrib
>            ProxyTypes.contrib   <- both distributions' contributions are merged
>            util/
>                Importing.contrib    <- at both levels
>                ProxyTypes.contrib
>                imports.py
>                proxies.py
>
> Any other thoughts?
>

If two contributions are added into the same directory (a la that last
example) is there a way of telling programmatically what portions came
from which contribution?

Also, if two contributions are made to a namespace package on the same
sys.path entry, they must go into the same directory, right?  Is there
a way around that, like using zip files or something (might we find
all three above examples in site-packages)?  The idea of having them
in separate plain directories (without __init__.py) for the same
sys.path entry is part of what motivated my earlier confusion.

Finally, say a portion is "contributed" to an existing non-namespace
package [directory], turning it into a namespace package.  The package
is then impacted by PEP 382 (particularly regarding __init__.py) when
it may not have been developed for use as a namespace package.  Is
this case worth considering?

-eric

From pje at telecommunity.com Sun Jul 10 05:50:37 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sat, 09 Jul 2011 23:50:37 -0400
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To:
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID: <20110710035101.3BD893A404D@sparrow.telecommunity.com>

At 04:58 PM 7/9/2011 -0600, Eric Snow wrote:
>If two contributions are added into the same directory (a la that last
>example) is there a way of telling programmatically what portions came
>from which contribution?

See PEP 376, which addresses that issue.

>Also, if two contributions are made to a namespace package on the same
>sys.path entry, they must go into the same directory, right?

Yes.

>  Is there
>a way around that, like using zip files or something (might we find
>all three above examples in site-packages)?  The idea of having them
>in separate plain directories (without __init__.py) for the same
>sys.path entry is part of what motivated my earlier confusion.

Where did you get that idea from?  Was there a particular part of the
PEP I should change to avoid creating that idea, or did you have it
before you read the new draft?

>Finally, say a portion is "contributed" to an existing non-namespace
>package [directory], turning it into a namespace package.  The package
>is then impacted by PEP 382 (particularly regarding __init__.py) when
>it may not have been developed for use as a namespace package.  Is
>this case worth considering?

The same thing would happen now if you installed two distributions
containing files for the same package.  So no, I don't think it's
worth elaborating on.  The PEP is starting to get kind of long as it
is; I'm already a little worried about backlash when this goes back
to Python-Dev, actually, *despite* the fact that it's more precisely
specified, simpler, etc. than the previous shorter version.  :-(

From ericsnowcurrently at gmail.com Sun Jul 10 06:11:57 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 9 Jul 2011 22:11:57 -0600
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110710035101.3BD893A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <20110710035101.3BD893A404D@sparrow.telecommunity.com>
Message-ID:

On Sat, Jul 9, 2011 at 9:50 PM, P.J. Eby wrote:
> At 04:58 PM 7/9/2011 -0600, Eric Snow wrote:
>>
>> If two contributions are added into the same directory (a la that last
>> example) is there a way of telling programmatically what portions came
>> from which contribution?
>
> See PEP 376, which addresses that issue.
>
>
>> Also, if two contributions are made to a namespace package on the same
>> sys.path entry, they must go into the same directory, right?
>
> Yes.
>
>
>>  Is there
>> a way around that, like using zip files or something (might we find
>> all three above examples in site-packages)?  The idea of having them
>> in separate plain directories (without __init__.py) for the same
>> sys.path entry is part of what motivated my earlier confusion.
>
> Where did you get that idea from?  Was there a particular part of the PEP I
> should change to avoid creating that idea, or did you have it before you
> read the new draft?
>

I wish I could pin crazy things like that on someone else, but I'm
afraid it's my own.  Not having used namespace packages before I was
trying to piece together the concept from bits and pieces when Barry
brought up their sprint last month.  It took this long to get through
to me that I was a little backwards. :)

>
>> Finally, say a portion is "contributed" to an existing non-namespace
>> package [directory], turning it into a namespace package.  The package
>> is then impacted by PEP 382 (particularly regarding __init__.py) when
>> it may not have been developed for use as a namespace package.  Is
>> this case worth considering?
>
> The same thing would happen now if you installed two distributions
> containing files for the same package.  So no, I don't think it's worth
> elaborating on.  The PEP is starting to get kind of long as it is; I'm
> already a little worried about backlash when this goes back to Python-Dev,
> actually, *despite* the fact that it's more precisely specified, simpler,
> etc. than the previous shorter version.  :-(
>
>

Yeah, I agree.  For what it's worth, I think the PEP is a lot clearer now.

-eric

From eric at trueblade.com Sun Jul 10 18:33:00 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 10 Jul 2011 12:33:00 -0400
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID: <4E19D43C.2080902@trueblade.com>

On 7/9/2011 5:20 PM, P.J. Eby wrote:
...
> And it's not a namespace at all.  It's really a portion of the namespace.

Agreed.

> I think that a change is needed here to make the PEP's narrative come
> together more cleanly.  I'm leaning towards calling them foo.contrib
> files, as in "The 'foo' distribution contributed to this portion of the
> enclosing package."

I would paint this particular bikeshed "foo.nspart", since it's a part
of a namespace.  "foo.contrib" sounds like a license to me.  Although
since the PEP uses "portion" to describe what these are, I guess I could
live with "foo.portion" as well.

I do have some thoughts on the other emails in this thread, but no time
today to write them up.

Eric.

From martin at v.loewis.de Sun Jul 10 19:27:29 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 10 Jul 2011 19:27:29 +0200
Subject: [Import-SIG] PEP 382: Partial packages (was: Is ".ns" really the
 right extension?)
In-Reply-To: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID: <4E19E101.4020508@v.loewis.de>

I've been talking to people about how things should be named in PEP 382.

I think "namespace package" is the wrong name for the feature: every
package is a namespace, as is every class, object, and function.  In
setuptools, there might have been a point in calling it "namespace
package" to indicate it is a *mere* namespace (i.e. can't contain
code on its own); this won't be the case for the PEP 382 feature.

Likewise, people had objections to the .ns extension:
- as Phillip points out, people may confuse the file with actually
  being a namespace
- the .ns extension does not indicate that it belongs to Python,
  which apparently is important to people (who otherwise don't
  know what piece of software is in charge of that file); this
  is also a flaw in Phillip's proposed '.contrib' file
- the extension asks to invoke Godwin's law

So here is my proposal:

- the feature defined in PEP 382 is called "partial package",
  indicating that the entire package may be more than that.
  "package portion" could work as well, as could "component
  package" or "package component"; "partial package" has the
  advantage of raising associations with C#'s "partial classes"
  which are essentially the same feature (but on a class level).
- the extension is ".pyp", for "Python Package"

What do you think?

Regards,
Martin

From eric at trueblade.com Sun Jul 10 19:59:03 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 10 Jul 2011 13:59:03 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <4E19E101.4020508@v.loewis.de>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <4E19E101.4020508@v.loewis.de>
Message-ID: <4E19E867.4020703@trueblade.com>

On 07/10/2011 01:27 PM, "Martin v. Löwis" wrote:
> I've been talking to people about how things should be named in PEP 382.
>
> I think "namespace package" is the wrong name for the feature: every
> package is a namespace, as is every class, object, and function. In
> setuptools, there might have been a point in calling it "namespace
> package" to indicate it is a *mere* namespace (i.e. can't contain
> code on its own); this won't be the case for the PEP 382 feature.

I agree "namespace package" is not a great name.  I don't see any
particular problem with changing to a different name, though.

> Likewise, people had objections to the .ns extension:
> - as Phillip points out, people may confuse the file with actually
>   being a namespace
> - the .ns extension does not indicate that it belongs to Python,
>   which apparently is important to people (who otherwise don't
>   know what piece of software is in charge of that file); this
>   is also a flaw in Phillip's proposed '.contrib' file
> - the extension asks to invoke Godwin's law

In my head I was thinking .pyns, but I thought people might pronounce
the "y" as a long "e".  I hadn't thought of Godwin's law with .ns, but I
see your point.

> So here is my proposal:
>
> - the feature defined in PEP 382 is called "partial package",
>   indicating that the entire package may be more than that.
>   "package portion" could work as well, as could "component
>   package" or "package component"; "partial package" has the
>   advantage of raising associations with C#'s "partial classes"
>   which are essentially the same feature (but on a class level).
> - the extension is ".pyp", for "Python Package"
>
> What do you think?

Partial package works for me.  I too like the association with partial
classes.  ".pyp" is okay, although I'd avoid saying it stands for "Python
Package", since the presence of the file is not what makes this code a
package, it makes it a partial package.

Eric.

From barry at python.org Sun Jul 10 22:32:10 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 16:32:10 -0400
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID: <20110710163210.03f62155@resist>

On Jul 09, 2011, at 05:20 PM, P.J. Eby wrote:

>It seems to me that what the actual meaning of a foo.ns file is, "The 'foo'
>portion of this namespace is installed here".  And that thus, foo.portion
>or foo.part or foo.contribution or something like that would be more
>appropriate, given the PEP terminology.

+1 and your rewritten example makes a lot of sense.

I don't particularly like .contrib, and I saw in a followup that someone
proposed .pyp.  I'd be fine with that, but if you want something more
descriptive (i.e. longer), then .portion works for me.

-Barry

From barry at python.org Sun Jul 10 22:34:21 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 16:34:21 -0400
Subject: [Import-SIG] PEP 382: Partial packages (was: Is ".ns" really the
 right extension?)
In-Reply-To: <4E19E101.4020508@v.loewis.de>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <4E19E101.4020508@v.loewis.de>
Message-ID: <20110710163421.3b086452@resist>

On Jul 10, 2011, at 07:27 PM, Martin v. Löwis wrote:

>- the feature defined in PEP 382 is called "partial package",
>  indicating that the entire package may be more than that.
>  "package portion" could work as well, as could "component
>  package" or "package component"; "partial package" has the
>  advantage of raising associations with C#'s "partial classes"
>  which are essentially the same feature (but on a class level).
>- the extension is ".pyp", for "Python Package"

I like "package portions" and .pyp.  "partial packages" would be okay, but
to me it puts the emphasis in the wrong place (i.e. in what's missing
rather than what's present).

-Barry

From barry at python.org Sun Jul 10 22:36:39 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 16:36:39 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <4E19E867.4020703@trueblade.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <4E19E101.4020508@v.loewis.de>
 <4E19E867.4020703@trueblade.com>
Message-ID: <20110710163639.5c3804c5@resist>

On Jul 10, 2011, at 01:59 PM, Eric V. Smith wrote:

>Partial package works for me. I too like the association with partial
>classes. ".pyp" is okay, although I'd avoid saying it stands for "Python
>Package", since the presence of the file is not what makes this code a
>package, it makes it a partial package.

I'd say it stands for "Python portion" which I guess isn't as descriptive
as "Python package portion", but it's close enough.  And at least
according to

http://en.wikipedia.org/wiki/List_of_file_formats_(alphabetical)

.pyp is unused.

-Barry

From barry at python.org Sun Jul 10 23:01:36 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 17:01:36 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110709000146.06C313A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
 <20110708183135.7c9fa5d5@limelight.wooz.org>
 <20110709000146.06C313A404D@sparrow.telecommunity.com>
Message-ID: <20110710170136.62976255@resist>

On Jul 08, 2011, at 08:01 PM, P.J. Eby wrote:

>At 06:31 PM 7/8/2011 -0400, Barry Warsaw wrote:
>>Thanks!  I've been trying to catch up on the mailing list traffic today, and
>>grabbed your prototype code.  I plan on committing it to MvL's pep382 hg
>>branch so we have a place to play with it.
>
>You should probably start from this version instead:
>
>    http://pastebin.com/Wv77WYyb
>
>It's got some work on other things like iter_modules, extend_namespaces, etc.

Are you working from a publicly available repo?  If not, would you like to
be?  It will make collaboration easier, and MvL's hg branch is already
available and I think entirely appropriate for this code, at least until
it's merged back into trunk.

>> >Portion
>> >    A set of files in a single directory (possibly inside a zip file
>> >    or other storage mechanism) that contribute modules or subpackages
>> >    to a namespace package.  The contents of each portion ``sys.path``
>>
>>This one got cut off.
>
>Oops.  A bad edit; ignore that sentence fragment, it was replaced by language
>in the definition that followed it.

Cool.

>>Do you need to explain a little more why __path__ is significant, and why the
>>registration function is required?
>
>Revised paragraph:
>
>====
>In current Python versions, however, a registration function (such as
>``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
>must be explicitly invoked in order to set up the package's
>``__path__``.  (By default, a package's ``__path__`` lists only one
>directory, so to allow imports from more than one directory, the
>``__path__`` must be explicitly extended in code.)
>====

I'd only add something like "Python searches a package's __path__ instead
of sys.path when it's looking for subpackages."  Yes, I know this is
covered in PEP 302, but I think it couldn't hurt a little extra text here.
Your call though.

>> >Vendor packages typically must not provide overlapping files, and an
>> >attempt to install a vendor package that has a file already on disk
>> >will fail or cause unpredictable behavior. As vendors might choose to
>> >package distributions such that they will end up all in a single
>> >directory for the namespace package, all portions would contribute
>> >conflicting ``__init__.py`` files.
>>
>>I might word this a little differently.  Perhaps:
>>
>>Vendor packaging standards require every file on disk to be owned by exactly
>>one vendor package.  But because each portion of a namespace package may be
>>contained in a separate vendor package, multiple vendor packages would have to
>>own the namespace package's __init__.py file.  For example, would the
>>``zope.interface`` vendor package own ``zope/__init__.py`` or would the
>>``zope.component`` vendor package own it?  Different vendors handle this
>>conflict differently, and in fact, different packaging tools from the same
>>vendor can handle this differently, which can cause consistency problems.
>
>I took the original wording as directly as practical from MvL's, but I agree
>yours is clearer.  OTOH, I think the "fail or cause unpredictable behavior" is
>a much stronger motivator than, "it's nonstandard and confusing".  ;-)
>
>Did you have a specific rationale for your choice?  I mean, what did you want
>to gain or avoid by the change?

The confusion I had was on "overlapping files", since that doesn't have a
clear meaning to me.  I'm happy to use the stronger language you prefer;
maybe you can work both texts into something even better!

>The problem with this example is that it gives the impression that .ns files
>are named for packages, instead of being named for distributions.  So, I went
>with a more detailed and explicit example.

The example you posted in the other thread looks great.

>> >This new method is called just before the importer's ``find_module()``
>> >is normally invoked. If the importer determines that `fullname` is
>> >a namespace package portion under its jurisdiction, then the importer
>> >returns an importer-specific path to that namespace portion.
>>
>>Please define exactly what ``fullname`` is.
>
>Ugh.  Do I have to?  ;-)
>
>Will it work if I just change that to "just before the importer's
>``find_module(fullname)`` is normally invoked", so it's more clearly implied?

Sure, that'll work.
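
For readers following along, here is a rough sketch of how a directory-based
importer might answer the ``namespace_subpath()`` query described in the
quoted PEP text.  It is only an illustration under stated assumptions: the
class name ``DirImporter`` is hypothetical, the ``.ns`` marker extension is
the one still being debated in this thread, and returning a path string or
``None`` is an assumed convention rather than the PEP's wording.

```python
import os


class DirImporter:
    """Hypothetical importer responsible for one directory on a search path."""

    MARKER_EXT = ".ns"  # assumed; the extension is still under discussion

    def __init__(self, path):
        self.path = path

    def namespace_subpath(self, fullname):
        """Return the directory holding a portion of `fullname`, or None."""
        # Only the last dotted component names a directory under self.path;
        # parent components were resolved when the parent was imported.
        subdir = os.path.join(self.path, fullname.rpartition(".")[2])
        if not os.path.isdir(subdir):
            return None
        # A portion is a directory containing at least one marker file.
        for name in os.listdir(subdir):
            if name.endswith(self.MARKER_EXT):
                return subdir
        return None
```

An importer for a ``sys.path`` entry would be asked about ``peak``, while the
importer for an entry of ``peak.__path__`` would be asked about ``peak.util``;
in both cases only the final name component is looked up on disk.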
>> >Standard Library Changes/Additions
>> >----------------------------------
>> >
>> >The ``pkgutil`` module should be updated to handle this
>> >specification appropriately, including any necessary changes to
>> >``extend_path()``, ``iter_modules()``, etc.  A new generic API for
>> >calling ``namespace_subpath()`` on importers should be added as well.
>>
>>Is there any reason not to put extend_path() on the road to deprecation?
>
>I don't know.  Is there?  As I said, I considered that an open question.

I think we should.

>> >Specifically the proposed changes and additions are:
>> >
>> >* A new ``namespace_subpath(importer, fullname)`` generic, allowing
>> >  implementations to be registered for existing importers.
>>
>>Is this the registration mechanism?
>
>Registration for what?  I meant that this is analogous to other pkgutil
>generic functions that let you call a PEP 302 extension protocol on an
>importer, whether or not the importer directly implements that protocol.  For
>example, pkgutil.iter_importer_modules() is a generic function that lets you
>ask an importer to iterate over available modules, whether it actually
>implements its own "iter_modules()" method or not.  The
>pkgutil.namespace_subpath() function would do the same for the
>(possibly-absent) namespace_subpath() method on existing importers, and allow
>third parties to register namespace support for custom importers that can't
>be directly modified to support namespace packages.
>
>Any thoughts on how better to word that bit, without necessarily going into
>that much detail?  ;-)

I guess part of the problem is that generics like iter_importer_modules()
aren't actually documented in pkgutil, nor (in Python 2.7) even included in
its __all__.  So you can't just say something like:

* A new ``namespace_subpath(importer, fullname)`` generic, analogous to
  other existing generics in the pkgutil package.

or maybe you can when you file a bug to get the existing ones documented.
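
To make the "generic" idea above a little more concrete, here is a minimal
sketch of a module-level ``namespace_subpath(importer, fullname)`` that falls
back to the importer's own method and also lets third parties register
support for importers they cannot modify.  It uses ``functools.singledispatch``
(available in later Python versions) purely as a stand-in for whatever generic
mechanism pkgutil would actually use; ``LegacyImporter`` and its ``portions``
mapping are hypothetical names invented for the example.

```python
from functools import singledispatch


@singledispatch
def namespace_subpath(importer, fullname):
    """Generic: ask any importer for its portion of `fullname`."""
    # Default behaviour: defer to the importer's own method if it has one,
    # otherwise report that the importer contributes no portion.
    method = getattr(importer, "namespace_subpath", None)
    if method is None:
        return None
    return method(fullname)


class LegacyImporter:
    """Stand-in for a third-party importer that cannot be modified."""

    def __init__(self, portions):
        self.portions = portions  # maps package name -> path string


@namespace_subpath.register(LegacyImporter)
def _legacy_namespace_subpath(importer, fullname):
    # Registered from the outside, without touching LegacyImporter itself.
    return importer.portions.get(fullname)
```

Callers would always go through the generic function, so an importer gains
namespace support either by growing the method itself or by having an
implementation registered on its behalf.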

>We won't be opening the files at all, so the contents will be ignored.

Does your rewrite make that explicit?  I'd like to either have a strong
recommendation for the file being empty, or specify the syntax we'd likely
support in any extension to PEP 382.  In all likelihood that would be to
ignore lines with only whitespace, or that begin with a `#`.

>>I'd be a little more forceful; the PEP should strongly recommend against
>>including namespace package __init__.py files.
>
>As I said, it's controversial.  Some people really want those __init__
>modules, and setuptools sort-of supports them now.  I can make it a bit more
>forceful, though.

I think your rewrite looked good here.

-Barry

From brett at python.org Sun Jul 10 23:04:21 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 10 Jul 2011 14:04:21 -0700
Subject: [Import-SIG] PEP 382: Partial packages (was: Is ".ns" really the
 right extension?)
In-Reply-To: <20110710163421.3b086452@resist>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <4E19E101.4020508@v.loewis.de>
 <20110710163421.3b086452@resist>
Message-ID:

On Sun, Jul 10, 2011 at 13:34, Barry Warsaw wrote:

> On Jul 10, 2011, at 07:27 PM, Martin v. Löwis wrote:
>
> >- the feature defined in PEP 382 is called "partial package",
> >  indicating that the entire package may be more than that.
> >  "package portion" could work as well, as could "component
> >  package" or "package component"; "partial package" has the
> >  advantage of raising associations with C#'s "partial classes"
> >  which are essentially the same feature (but on a class level).
> >- the extension is ".pyp", for "Python Package"
>
> I like "package portions" and .pyp.  "partial packages" would be okay, but
> to me it puts the emphasis in the wrong place (i.e. in what's missing
> rather than what's present).

I agree with Barry on this one.  "Partial package" makes me think that
something is explicitly missing from the package and that it may not work
until all the partial bits of the package are gathered.

From barry at python.org Sun Jul 10 23:05:00 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 17:05:00 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110709185310.280C33A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
 <20110709185310.280C33A404D@sparrow.telecommunity.com>
Message-ID: <20110710170500.35d5b20f@resist>

On Jul 09, 2011, at 02:52 PM, P.J. Eby wrote:

>Yeah...  Honestly the more we talk about all that the more inclined I am to
>say that it's a zero-length file, just to avoid more spec detail.  ;-)
>
>I just don't know if there are any issues with packaging or revision control
>systems for zero length files, not to mention whether there are OSes where
>it's hard to make a zero-length file.

Well, put it in the PEP and we'll find out! :)

-Barry

From eric at trueblade.com Sun Jul 10 23:14:04 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 10 Jul 2011 17:14:04 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110710170500.35d5b20f@resist>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
 <20110709185310.280C33A404D@sparrow.telecommunity.com>
 <20110710170500.35d5b20f@resist>
Message-ID: <4E1A161C.8090001@trueblade.com>

On 7/10/2011 5:05 PM, Barry Warsaw wrote:
> On Jul 09, 2011, at 02:52 PM, P.J. Eby wrote:
>
>> Yeah...  Honestly the more we talk about all that the more inclined I am to
>> say that it's a zero-length file, just to avoid more spec detail.  ;-)
>>
>> I just don't know if there are any issues with packaging or revision control
>> systems for zero length files, not to mention whether there are OSes where
>> it's hard to make a zero-length file.
>
> Well, put it in the PEP and we'll find out! :)

I definitely think it should be a zero length file.  It's the one thing
we can check having already done a stat call, and without opening the
file.

Eric.

From martin at v.loewis.de Sun Jul 10 23:55:22 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 10 Jul 2011 23:55:22 +0200
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110710163210.03f62155@resist>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <20110710163210.03f62155@resist>
Message-ID: <4E1A1FCA.1010906@v.loewis.de>

> I don't particularly like .contrib, and I saw in a followup that someone
> proposed .pyp.  I'd be fine with that, but if you want something more
> descriptive (i.e. longer), then .portion works for me.

I'd rather avoid something longer.  It's a long-time convention at least
on Windows to use no more than three characters for file extensions.

Regards,
Martin

From martin at v.loewis.de Sun Jul 10 23:57:46 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 10 Jul 2011 23:57:46 +0200
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <4E19E867.4020703@trueblade.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <4E19E101.4020508@v.loewis.de>
 <4E19E867.4020703@trueblade.com>
Message-ID: <4E1A205A.3010004@v.loewis.de>

> Partial package works for me. I too like the association with partial
> classes. ".pyp" is okay, although I'd avoid saying it stands for "Python
> Package", since the presence of the file is not what makes this code a
> package, it makes it a partial package.

No - it is actually what makes it a package.  There are two ways to
declare a package: either put an __init__.py into the directory, or
a .pyp file.

Regards,
Martin

From pje at telecommunity.com Mon Jul 11 00:30:27 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sun, 10 Jul 2011 18:30:27 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <4E1A205A.3010004@v.loewis.de>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <4E19E101.4020508@v.loewis.de>
 <4E19E867.4020703@trueblade.com>
 <4E1A205A.3010004@v.loewis.de>
Message-ID: <20110710223044.E77943A4100@sparrow.telecommunity.com>

At 11:57 PM 7/10/2011 +0200, Martin v. Löwis wrote:
> > Partial package works for me. I too like the association with partial
> > classes. ".pyp" is okay, although I'd avoid saying it stands for "Python
> > Package", since the presence of the file is not what makes this code a
> > package, it makes it a partial package.
>
>No - it is actually what makes it a package. There are two ways to
>declare a package: either put an __init__.py into the directory, or
>a .pyp file.

It's too bad that (for backward compatibility reasons) we can't just
use the presence of any importable file to signify this, as is the
norm for Java, Perl, PHP, etc.  (AFAIK, all of them have namespacey
packages by default.)

In any case, I agree with Barry and Brett that "partial packages"
conveys the wrong impression, as it puts emphasis on what is missing
rather than what is there.

I want to suggest alternatives such as "compilation package" or some
such to indicate that the package is a compilation of contributions,
but that sounds like it's going to be compiled to assembly code or
something.  ;-)

Frankly, though, I have no strong motivation to change the name; I'd
honestly rather drop __init__ support as it's technically difficult
and an invitation to problems anyway.  ;-)

I'm okay with some bikeshedding on the file extension, but unless
somebody really comes up with a truly *excellent* replacement for
"namespace package", I don't see much point to changing it.

I will go ahead and throw in a few ideas, none of which I think are
necessarily *excellent*, but which seem like they might work:

  * multipart packages (packages that can be divided into separately
    installed/distributed parts)
  * package families (a group of packages that share a "family name")
  * organization packages (packages whose purpose is to organize other
    packages, and/or indicate organizational authorship)
  * partitioned packages (packages that can be divided into separately
    installed/distributed parts)

Thoughts?

(Oh, btw, I'm a long-time Windows user and I see zero technical or
cultural problems with having a longer-than-three extension.  It's
increasingly common to see apps using them; even Microsoft now has
'.docx' files in Office.  So, for the first and last naming schemes
above I would lean towards ".pypart" as the extension.)

From ncoghlan at gmail.com Mon Jul 11 02:39:04 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Jul 2011 10:39:04 +1000
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <20110710223044.E77943A4100@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
 <4E19E101.4020508@v.loewis.de>
 <4E19E867.4020703@trueblade.com>
 <4E1A205A.3010004@v.loewis.de>
 <20110710223044.E77943A4100@sparrow.telecommunity.com>
Message-ID:

On Mon, Jul 11, 2011 at 8:30 AM, P.J. Eby wrote:
> I'm okay with some bikeshedding on the file extension, but unless somebody
> really comes up with a truly *excellent* replacement for "namespace
> package", I don't see much point to changing it.
> > I will go ahead and throw in a few ideas, none of which I think are > necessarily *excellent*, but which seem like they might work: > > * multipart packages (packages that can be divided into separately > installed/distributed parts) > * package families (a group of packages that share a "family name") > * organization packages (package whose purpose is to organize other > packages, and/or indicate organizational authorship) > * partitioned packages (packages that can be divided into separately > installed/distributed parts) > > Thoughts? FWIW, +1 on "partitioned packages" as the term and either .pyp or .pypart as the extension. Why do I like partitioned packages? 1. It correctly emphasises the real purpose of this kind of package: allowing a single namespace at the Python level to be cleanly split into multiple partitions at the file distribution level. "namespace packages" fails on this count. 2. It makes it clear that any given *piece* of the package can only be correctly provided by one partition, as anything else results in a collision within the package namespace. This is the only suggested term that really conveys this aspect at all. 3. It doesn't have the same connotations of incompleteness that plague "partial packages" 4. It makes it clear that this is still just one package at the Python level, which a term like "package families" would obscure. 5. It is agnostic as to the reasons *why* developers might want to partition the namespace, whereas something like "organization packages" assumes a great deal about how they will be used in practice. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pje at telecommunity.com Mon Jul 11 04:34:45 2011 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 10 Jul 2011 22:34:45 -0400 Subject: [Import-SIG] What if namespace imports weren't special?
Message-ID: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> I think one reason we're having trouble with naming and explaining this whole concept is that, really, the current Python import system is broken, compared to other languages. In at least Perl, PHP, and Java, you don't have to do anything special to merge components in a single namespace from multiple parts of the class/include/autoload path. We are thus having trouble trying to come up with a special name to describe these, when from a more objective perspective, what we are describing are "normal packages", with what Python has now being "restricted to a single directory packages". It's for this reason that all packages being namespaces doesn't bother me for the term. All packages *should* be namespace packages, pretty much. It's the *non* namespaceyness of Python's default packages that's broken, not the term. ;-) If there really was a time machine, I like to think we'd go back and get Python's package import mechanism to just work this way from the outset (i.e. always combining shards across sys.path), and perhaps use the presence of .py[cod]/.so files as an indication of package-ness -- if indeed an indication is needed at all. Actually... here's an interesting idea. Suppose that we define the rules so that any directory containing any file with an importable extension is a namespace package... *but*, if one of those directories contains an __init__ module, that directory will be placed first on the package __path__. See, the reason why dropping the need for __init__ was previously rejected was because it meant you could block the importing of a package later on the path. *But*, if we always put the segment with __init__ first on the __path__, then any such blocking directories would not block the "real" package -- they'd just be accessible for imports. If we did that, then there would be no need for any special flag files, and no need for special terminology. 
The protocol in my draft would remain basically the same, except for moving the __init__ module's subpath to the front of __path__. And instead of globbing for *.pypart or whatever, importers would just check whether there was a directory there at all. The only backward compatibility that this can break is that you can import things you couldn't import before. So, if you had a foo/bar.py, with 'foo' in a sys.path directory, and you also had a 'foo' package, AND you relied on 'import foo.bar' raising an error, then it would no longer do so. But, if you *had* a foo.bar module before, then under this scheme, 'import foo.bar' would still import the exact same file it did before, so nothing changes. In other words, the first subdirectory with an __init__ gets to head up the new package's __path__, but ALL matching subdirectories will make up the tail. The big advantage of this approach is that it doesn't require us to have a special name - it's just, "Enhanced Package Imports" or some such. No special marker files to name, either. Just, "hey, people want to put their package contents in more than one directory, and they don't always need an __init__.py." Thoughts, anyone? From pje at telecommunity.com Mon Jul 11 05:10:09 2011 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 10 Jul 2011 23:10:09 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> Message-ID: <20110711031033.55D1F3A4100@sparrow.telecommunity.com> At 10:34 PM 7/10/2011 -0400, P.J. Eby wrote: >The big advantage of this approach is that it doesn't require us to >have a special name - it's just, "Enhanced Package Imports" or some >such. No special marker files to name, either. Just, "hey, people >want to put their package contents in more than one directory, and >they don't always need an __init__.py." > >Thoughts, anyone? 
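A minimal sketch of the ordering rule described above: the first directory that contains an __init__ heads the package's __path__, and every other matching directory follows in sys.path order. It assumes that a portion is simply any `<entry>/<name>` directory (the "just check whether there was a directory there at all" variant), and `assemble_package_path` is a hypothetical helper name, not something taken from the draft or the standard library.

```python
import os


def assemble_package_path(name, search_path):
    """Return the list of portion directories for package `name`.

    Hypothetical illustration only: the first portion that holds an
    __init__.py goes to the front; all other portions keep their
    search_path order behind it.
    """
    head, tail = [], []
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if not os.path.isdir(candidate):
            continue  # this path entry contributes nothing
        has_init = os.path.isfile(os.path.join(candidate, "__init__.py"))
        if has_init and not head:
            head.append(candidate)  # the "real" package leads __path__
        else:
            tail.append(candidate)  # every other portion follows in order
    return head + tail
```

Called as `assemble_package_path("foo", sys.path)`, this would put the directory holding `foo/__init__.py` first even when an earlier path entry also has a bare `foo/` directory, which is why such a directory would no longer block the real package.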
A quick follow-up; I found a thread where something vaguely similar was discussed before: http://mail.python.org/pipermail/python-dev/2006-April/064400.html Various issues regarding tool support were brought up, mainly that existing tools would not detect such packages as packages, and that doing this at the top level was problematic because of the possibility of blocking a module like 'string' or 'time' or some such. However, as it happens, with a slight adjustment to what I proposed, that latter issue can be addressed... if *any* loadable module anywhere on sys.path (vs. just a directory with an __init__) simply gets all the subpaths appended to its __path__, then having a "time" directory just gets it added to time.__path__ -- and the plain old 'time' module still gets loaded. Tool support isn't actually as much affected by my revised approach either, since if you don't intend a directory to be a package, you're not importing it. If you have a directory and your tool *doesn't* recognize it as a package, well, that's an issue of the tool adding support for namespace packages. Likewise, if you have a module or package that's working today, all that happens is that it grows a __path__ and has sub-imports possible. It does seem that the previous discussion was rather controversial, even though only sub-packages were being discussed. OTOH, the change really *was* a change, and my proposal doesn't change the existing behavior (apart from some occasional __path__ attributes appearing where they didn't before), it only adds to it. From ncoghlan at gmail.com Mon Jul 11 05:16:51 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jul 2011 13:16:51 +1000 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> Message-ID: On Mon, Jul 11, 2011 at 12:34 PM, P.J.
Eby wrote: > The big advantage of this approach is that it doesn't require us to have a > special name - it's just, "Enhanced Package Imports" or some such. No > special marker files to name, either. Just, "hey, people want to put their > package contents in more than one directory, and they don't always need an > __init__.py." It does mean that the pkgutil changes to handle sys.path extensions will need to scan sys.modules looking for packages (i.e. modules with __path__ attributes) rather than the more limited subset that would have been stored in sys.partitioned_packages (although not adding extra global state is actually a win in my book). Removing the need for __init__.py as a package marker would also eliminate quite a lot of newbie confusion when it comes to using packages. However, I think the explicitly partitioned package approach is going to be an easier sell, as it's *obvious* that it won't break existing code. While examples of existing code that will break under a "partitioned by default" model are going to be hypothetical and contrived, they're also pretty easy to come up with. There's also a performance impact on app startup time - currently most package imports stop as soon as they hit a matching directory. Under a "partitioned by default" scheme, all package imports (including things like "logging" and "email" which currently get a hit in the first zip file for the standard library) would have to scan the entirety of sys.path just in case there are additional shards lying around. For large applications, that additional overhead is going to add up. So I don't think implicit partitioning is really going to fly at this point. That said, I wouldn't oppose tweaking the partitioned package design to eventually support dropping the requirement for explicit ".pyp(art)" files (i.e. by always placing the directory with __init__.py at the head of the list). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From waterbug at pangalactic.us Mon Jul 11 05:07:34 2011 From: waterbug at pangalactic.us (Stephen Waterbury) Date: Sun, 10 Jul 2011 23:07:34 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> Message-ID: <4E1A68F6.9070004@pangalactic.us> On 07/10/2011 10:34 PM, P.J. Eby wrote: > I think one reason we're having trouble with naming and explaining this > whole concept is that, really, the current Python import system is > broken, compared to other languages. That seems an important consideration, at least because of the negative perception it presents to programmers coming from other languages ... > ... All packages *should* be namespace packages, pretty > much. It's the *non* namespaceyness of Python's default packages that's > broken, not the term. ;-) Novel to a Python programmer, perhaps even "revolutionary", but logical on the face of it. > Actually... here's an interesting idea. Suppose that we define the rules > so that any directory containing any file with an importable extension > is a namespace package... *but*, if one of those directories contains an > __init__ module, that directory will be placed first on the package > __path__. > > If we did that, then there would be no need for any special flag files ... I like losing the flag files -- nice! > The only backward compatibility that this can break is that you can > import things you couldn't import before. ... ... which seems like no breakage at all, really. > In other words, the first subdirectory with an __init__ gets to head up > the new package's __path__, but ALL matching subdirectories will make up > the tail. > > The big advantage of this approach is that it doesn't require us to have > a special name - it's just, "Enhanced Package Imports" or some such. No > special marker files to name, either. 
Just, "hey, people want to put > their package contents in more than one directory, and they don't always > need an __init__.py." > > Thoughts, anyone? I like it very much: it seems elegant and minimalist. To put my comments in context, I am a non-implementor and non-guru, but also a Python old-timer, who wants to use "partitioned packages" and would like to see this done right. I.e., this is input from the peanut gallery. ;) Cheers, Steve From pje at telecommunity.com Mon Jul 11 05:57:06 2011 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 10 Jul 2011 23:57:06 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> Message-ID: <20110711035731.C012E3A4100@sparrow.telecommunity.com> At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote: >There's also a performance impact on app startup time - currently most >package imports stop as soon as they hit a matching directory. Under a >"partitioned by default" scheme, all package imports (including things >like "logging" and "email" which currently get a hit in the first zip >file for the standard library) would have to scan the entirety of >sys.path just in case there are additional shards lying around. For >large applications, that additional overhead is going to add up. Darn, I missed that. That kills the idea pretty much dead right there, as it means ALL imports are massively slowed down. Crap. >So I don't think implicit partitioning is really going to fly at this >point. That said, I wouldn't oppose tweaking the partitioned package >design to eventually support dropping the requirement for explicit >".pyp(art)" files (i.e. by always placing the directory with >__init__.py at the head of the list). Nah, I don't think there's much point to that. I'm noticing, though, that the more I hear "partitioned package", the less I like it, and the more I wish I hadn't proposed it. ;-) It's fundamentally wrong, because (e.g.) 
peak.util is *not* a single thing that's been partitioned, *even though* it started out that way. It's just a bunch of things with a common namespace, and ISTM the name *really* ought to reflect that. Common namespace packages? Shared namespace packages? Surname packages? ;-) From ncoghlan at gmail.com Mon Jul 11 06:22:54 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jul 2011 14:22:54 +1000 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711035731.C012E3A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> Message-ID: On Mon, Jul 11, 2011 at 1:57 PM, P.J. Eby wrote: > I'm noticing, though, that the more I hear "partitioned package", the less I > like it, and the more I wish I hadn't proposed it. ;-) > > It's fundamentally wrong, because (e.g.) peak.util is *not* a single thing > that's been partitioned, *even though* it started out that way. > > It's just a bunch of things with a common namespace, and ISTM the name > *really* ought to reflect that. > > Common namespace packages? Shared namespace packages? Surname packages? > ;-) As soon as you have a flat namespace, you need to be careful to partition it correctly. We run into that fairly often with namespace collisions at the top level - failures of partitioning because (for example) a user decided to call their "experimenting with sockets" file "socket.py". So the "partitioned package" naming is a developer oriented view pointing out that hey, you're putting these files into a namespace shared with other people so think about the implications that may have for the name you choose (just as you would for a top-level package or module name or for a script symlink that is going to be placed into /usr/bin).
From a *user* point of view, they shouldn't care whether a package is partitioned or not - they'll just be treating it like an ordinary package, since (in theory) you can't tell the difference without poking around inside __path__. There may be a slight implication that all the partitions came from a single source that has been split up, but I don't think the single source implications are strong enough to invalidate the term. Anything that gets put into "peak.util" is going to relate to PEAK in *some* fashion, even if it isn't distributed as part of PEAK itself. Certainly, I haven't seen anything else suggested that comes close to this one for accuracy and mnemonic value. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pje at telecommunity.com Mon Jul 11 06:39:04 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 11 Jul 2011 00:39:04 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711035731.C012E3A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> Message-ID: <20110711043932.22F8B3A4100@sparrow.telecommunity.com> At 11:57 PM 7/10/2011 -0400, P.J. Eby wrote: >At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote: >>There's also a performance impact on app startup time - currently most >>package imports stop as soon as they hit a matching directory. Under a >>"partitioned by default" scheme, all package imports (including things >>like "logging" and "email" which currently get a hit in the first zip >>file for the standard library) would have to scan the entirety of >>sys.path just in case there are additional shards lying around. For >>large applications, that additional overhead is going to add up. > >Darn, I missed that. That kills the idea pretty much dead right >there, as it means ALL imports are massively slowed down. Crap. Hrm. I just realized WHY I missed it.
I was thinking that we'd only do that in the case where you *first* find a namespace. IOW, I was proposing to only change the semantics in the case where a suitable directory is found on sys.path *before* the normal package or module. IOW, the semantics I was thinking of were: * Scan sys.path, keeping track of any subpaths found * If you hit a module with no subpaths found before it, import and finish * Otherwise, if you hit a subpath first, accumulate all subpaths and tack them on a module or package * If the matching module was a package __init__, move its subpath to the beginning of the list But I agree that it's an upward climb to sell this approach. For example, it means that you can have code later on sys.path affect code that's earlier, which seems wrong and a tad unsafe. I wish we had a way to do this that didn't require special files, and still allowed us to have package names be plain directory names, and didn't break distutils installation processes. (Distutils can install submodules without a package __init__ being included, but apart from that it forces installed directory structure to match package name structure.) Okay, I have an idea. Suppose that we reserve a special directory name, like 'py-pkg'. And, if a sys.path directory contains a 'py-pkg' subdirectory, then any directory in that directory (recursively) is a package following __path__-assembly semantics. So, in order to enable new import semantics, you have to install your code to a 'py-pkg' directory under a regular sys.path directory... that's the only catch. *However*, because the distutils actually let you install packages without __init__ modules, you can trick them into installing your otherwise-normal package this way, by the simple expedient of telling the distutils your package name is 'py-pkg.foo' instead of 'foo'. (Note: this is only a hack for 2.x, and setuptools will probably be doing the dirty work of making distutils do this anyway "under the hood".
For 3.x, we can hopefully assume that the 'packaging' folks will enable doing this in a somewhat saner way.) Anyway, revising the ongoing example to add the directory and drop the flag files, we get: ProxyTypes-0.9.tgz: py-pkg/peak/util/proxies.py Importing-1.10.tgz: py-pkg/peak/util/imports.py or (combined): site-packages/ (or wherever) py-pkg/ peak/ util/ imports.py proxies.py zope/ ... This approach solves several problems at once: 1. No flag files 2. Faster imports (stat instead of listdir) 3. Directory clearly identified as containing python packages 4. No need for a special name, these are just regular packages with enhanced import semantics 5. Distutils can still install it Minor downsides: * Flat is better than nested * Existing code has to move to take advantage (unless you're not going to import the code without installing it, in which case you can just tweak your setup.py and not actually move anything) Thoughts? From brett at python.org Mon Jul 11 06:49:17 2011 From: brett at python.org (Brett Cannon) Date: Sun, 10 Jul 2011 21:49:17 -0700 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711043932.22F8B3A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> Message-ID: On Sun, Jul 10, 2011 at 21:39, P.J. Eby wrote: > At 11:57 PM 7/10/2011 -0400, P.J. Eby wrote: > >> At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote: >> >>> There's also a performance impact on app startup time - currently most >>> package imports stop as soon as they hit a matching directory. Under a >>> "partitioned by default" scheme, all package imports (including things >>> like "logging" and "email" which currently get a hit in the first zip >>> file for the standard library) would have to scan the entirety of >>> sys.path just in case there are additional shards lying around. 
For >>> large applications, that additional overhead is going to add up. >>> >> >> Darn, I missed that. That kills the idea pretty much dead right there, as >> it means ALL imports are massively slowed down. Crap. >> > > Hrm. I just realized WHY I missed it. I was thinking that we'd only do > that in the case where you *first* find a namespace. IOW, I was proposing > to only change the semantics in the case where a suitable directory is found > on sys.path *before* the normal package or module. IOW, the semantics I was > thinking of were: > > * Scan sys.path, keeping track of any subpaths found > * If you hit a module with no subpaths found before it, import and finish > * Otherwise, if you hit a subpath first, accumulate all subpaths and tack > them on a module or package > * If the matching module was a package __init__, move its subpath to the > beginning of the list > > But I agree that it's an upward climb to sell this approach. For example, > it means that you can have code later on sys.path affect code that's > earlier, which seems wrong and a tad unsafe. > > I wish we had a way to do this that didn't require special files, and still > allowed us to have package names be plain directory names, and didn't break > distutils installation processes. (Distutils can install submodules without > a package __init__ being included, but apart from that it forces installed > directory structure to match package name structure.) > > Okay, I have an idea. > > Suppose that we reserve a special directory name, like 'py-pkg'. And, if a > sys.path directory contains a 'py-pkg' subdirectory, then any directory in > that directory (recursively) is a package following __path__-assembly > semantics. > > So, in order to enable new import semantics, you have to install your code > to a 'py-pkg' directory under a regular sys.path directory... that's the > only catch.
> > *However*, because the distutils actually let you install packages without > __init__ modules, you can trick them into installing your otherwise-normal > package this way, by the simple expedient of telling the distutils your > package name is 'py-pkg.foo' instead of 'foo'. > > (Note: this is only a hack for 2.x, and setuptools will probably be doing > the dirty work of making distutils do this anyway "under the hood". For > 3.x, we can hopefully assume that the 'packaging' folks will enable doing > this in a somewhat saner way.) > > Anyway, revising the ongoing example to add the directory and drop the flag > files, we get: > > ProxyTypes-0.9.tgz: > py-pkg/peak/util/proxies.py > > Importing-1.10.tgz: > py-pkg/peak/util/imports.py > > or (combined): > > site-packages/ (or wherever) > py-pkg/ > peak/ > util/ > imports.py > proxies.py > zope/ > ... > > This approach solves several problems at once: > > 1. No flag files > 2. Faster imports (stat instead of listdir) > 3. Directory clearly identified as containing python packages > 4. No need for a special name, these are just regular packages with > enhanced import semantics > 5. Distutils can still install it > > Minor downsides: > > * Flat is better than nested > * Existing code has to move to take advantage (unless you're not going to > import the code without installing it, in which case you can just tweak your > setup.py and not actually move anything) > I prefer going with a specifically named file, if for no other reason than that there will be fewer broken tools. By shifting everything into a subdirectory you prevent any pre-existing code that scans sys.path from doing anything. But with the special file approach you don't break those tools in the case where you don't have some package fragment farther down sys.path. Plus you can also use a specially named file instead of allowing for any file name with a specific file ending to achieve the same result (e.g., py.pkg or __init__.part).
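The difference between a wildcard flag file (such as *.pyp or *.pypart) and a single fixed flag file name can be sketched as follows. This is only an illustration: both function names are hypothetical, none of the extensions had been settled in this thread, and `__init__.part` is just one of the example names floated above.

```python
import os

# Candidate extensions mentioned in the thread; neither was agreed on.
FLAG_SUFFIXES = (".pyp", ".pypart")


def has_flag_by_suffix(directory):
    """Wildcard flag file: the directory has to be listed to find one."""
    try:
        names = os.listdir(directory)
    except OSError:
        return False  # missing or unreadable directory: not a portion
    return any(name.endswith(FLAG_SUFFIXES) for name in names)


def has_flag_by_name(directory, flag_name="__init__.part"):
    """Fixed flag file name: one stat-style check is enough."""
    return os.path.isfile(os.path.join(directory, flag_name))
```

The wildcard form needs a directory listing, while the fixed name needs only a single existence check, which is the "stat instead of listdir" cost difference raised in the quoted list.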
From brett at python.org Mon Jul 11 06:51:13 2011 From: brett at python.org (Brett Cannon) Date: Sun, 10 Jul 2011 21:51:13 -0700 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711031033.55D1F3A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711031033.55D1F3A4100@sparrow.telecommunity.com> Message-ID: On Sun, Jul 10, 2011 at 20:10, P.J. Eby wrote: > At 10:34 PM 7/10/2011 -0400, P.J. Eby wrote: > >> The big advantage of this approach is that it doesn't require us to have a >> special name - it's just, "Enhanced Package Imports" or some such. No >> special marker files to name, either. Just, "hey, people want to put their >> package contents in more than one directory, and they don't always need an >> __init__.py." >> >> Thoughts, anyone? >> > > A quick follow-up; I found a thread where something vaguely similar was > discussed before: > > http://mail.python.org/pipermail/python-dev/2006-April/064400.html > > Various issues regarding tool support were brought up, mainly that existing > tools would not detect such packages as packages, and that doing this at the > top level was problematic because of the possibility of blocking a module > like 'string' or 'time' or some such. > > However, as it happens, with a slight adjustment to what I proposed, that > latter issue can be addressed... if *any* loadable module anywhere on > sys.path (vs. just a directory with an __init__) simply gets all the > subpaths appended to its __path__, then having a "time" directory just gets > it added to time.__path__ -- and the plain old 'time' module still gets > loaded. > I didn't read the thread, but I don't get the worry here. A 'time' package already will shadow a 'time' module if it is farther up sys.path, so this proposal in any of its current forms won't change that.
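The existing first-match behaviour referred to here can be checked with a small experiment. This is a sketch for a current Python 3 interpreter, not the 2.x import machinery under discussion, and it uses a made-up name (`shadow_demo`) instead of 'time', because in typical CPython builds 'time' is a built-in module and is resolved before sys.path is searched. The helper name `first_match_wins` is likewise hypothetical.

```python
import importlib
import os
import sys
import tempfile


def first_match_wins(name="shadow_demo"):
    """Import `name` with a package directory placed ahead of a module file.

    Returns the KIND attribute of whatever the import system picked.
    """
    pkg_root = tempfile.mkdtemp()
    mod_root = tempfile.mkdtemp()
    os.mkdir(os.path.join(pkg_root, name))
    with open(os.path.join(pkg_root, name, "__init__.py"), "w") as f:
        f.write("KIND = 'package'\n")
    with open(os.path.join(mod_root, name + ".py"), "w") as f:
        f.write("KIND = 'module'\n")

    saved = list(sys.path)
    sys.path[:0] = [pkg_root, mod_root]  # the package is "farther up"
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name).KIND
    finally:
        sys.path[:] = saved          # restore the original search path
        sys.modules.pop(name, None)  # forget the demo import
```

With the package directory earlier on the path, the call returns `'package'`; the module file in the later entry is never consulted.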
From pje at telecommunity.com Mon Jul 11 07:18:31 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 11 Jul 2011 01:18:31 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> Message-ID: <20110711051855.484273A4100@sparrow.telecommunity.com> At 09:49 PM 7/10/2011 -0700, Brett Cannon wrote: >I prefer going with a specifically named file, if for no other >reason than that there will be fewer broken tools. By shifting everything >into a subdirectory you prevent any pre-existing code that scans >sys.path from doing anything. But with the special file approach you >don't break those tools in the case where you don't have some >package fragment farther down sys.path. I'm not sure I follow you. The approach we're explicitly recommending for new namespaces is to *not* use an __init__, so the tools will still fail unless they're updated. What you're saying is that in some cases, these tools will accidentally *seem* to work under the flag-file proposal, but will only see the contents of the first portion on sys.path. IOW, I don't think that you can claim that tools won't be broken by a flag file approach, or even that they'll really be *less* broken than by a subdirectory approach. (Also, if tools are using pkgutil's module traversal API, they won't have a problem, as it will be updated to match the import semantics -- and this should provide tool authors an incentive to start using that API, if they're not already doing so.) > Plus you can also use a specially named file instead of allowing > for any file name with a specific file ending to achieve the same > result (e.g., py.pkg or __init__.part).
I'm not quite following you here; it sounds like you're talking about a single fixed filename, which won't work for the reasons described in the "rejected alternatives" section at the end of this draft: http://mail.python.org/pipermail/import-sig/2011-July/000213.html That draft proposed "DistributionName.ns" as the flag file naming pattern, and recent discussion has proposed .pypart or .pyp as alternate extensions. The present thread ("what if namespaces aren't special?") is an experiment to see if we could find a way to dispense with flag files altogether, thereby simplifying the terminology and usage, as well as saving us a listdir call or two. From martin at v.loewis.de Mon Jul 11 08:53:21 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 11 Jul 2011 08:53:21 +0200 Subject: [Import-SIG] PEP 382: Partial packages In-Reply-To: <20110710223044.E77943A4100@sparrow.telecommunity.com> References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com> <4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com> <4E1A205A.3010004@v.loewis.de> <20110710223044.E77943A4100@sparrow.telecommunity.com> Message-ID: <4E1A9DE1.1090809@v.loewis.de> >> No - it is actually what makes it a package. There are two ways to >> declare a package: either put an __init__.py into the directory, or >> a .pyp file. > > It's too bad that (for backward compatibility reasons) we can't just use > the presence of any importable file to signify this, as is the norm for > Java, Perl, PHP, etc. I'm not sure I understand: - in Java, a package is not an importable file, but a directory, just as in Python. The major differences are: * empty directories count as packages as well; they just have to be on the CLASSPATH * you can't import packages in Java - you can only import classes - in PHP, namespaces and files are completely unrelated: http://php.net/manual/en/language.namespaces.php The files you want to use are passed to "include". 
include takes file names, not namespace names. Only after including the file, PHP finds out what namespace the stuff is in it imported. - in Perl, the parent package and the subpackages appear unrelated. The parent package is a file "foo.pm"; the subpackages are files in a folder "foo"; in addition, each module needs to declare its package (i.e. "package foo;" or "package foo::bar;"). This automatically makes "composite packages" possible (as the subpackages are just not considered "parts" of the parent package, AFAICT). > (AFAIK, all of them have namespacey packages by default.) Please stop calling this "composite" feature "namespacey". http://en.wikipedia.org/wiki/Namespace "In general, a namespace is a container that provides context for the identifiers (names, or technical terms, or words) it holds, and allows the disambiguation of homonym identifiers residing in different namespaces[1]." *All* Python packages are namespaces. What specific property of the package mechanism do you mean when you say "namespacey"? Regards, Martin From martin at v.loewis.de Mon Jul 11 08:59:07 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 11 Jul 2011 08:59:07 +0200 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> Message-ID: <4E1A9F3B.9090208@v.loewis.de> > In at least Perl, PHP, and Java, you don't have to do anything special > to merge components in a single namespace from multiple parts of the > class/include/autoload path. Not true. In all three languages, you have to declare in the module what package it belongs to. So there is something special to do. > It's for this reason that all packages being namespaces doesn't bother > me for the term. All packages *should* be namespace packages, pretty > much. 
It's the *non* namespaceyness of Python's default packages that's > broken, not the term. ;-) Python packages have been namespaces since day 1 (as are modules). > Actually... here's an interesting idea. Suppose that we define the > rules so that any directory containing any file with an importable > extension is a namespace package... *but*, if one of those directories > contains an __init__ module, that directory will be placed first on the > package __path__. "... is a package" (not: "namespace package") I'd go further: any directory with the package name could constitute a portion of the package. With your approach, you'd need a file with an importable extension in each portion of the "zope" package, right? Regards, Martin From ericsnowcurrently at gmail.com Mon Jul 11 09:04:14 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 11 Jul 2011 01:04:14 -0600 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711051855.484273A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> Message-ID: On Sun, Jul 10, 2011 at 11:18 PM, P.J. Eby wrote: > > The present thread ("what if namespaces aren't special?") is an experiment > to see if we could find a way to dispense with flag files altogether, > thereby simplifying the terminology and usage, as well as saving us a > listdir call or two. > Ultimately there has to be something to indicate it is a package and that it is a partition (or whatever it's called). There would be less surprises if it followed the current pattern of having a file to indicate packageness (currently only __init__.py fills this role). FWIW, I think the solution in the PEP is the clearest approach, if "partitioned by default" is not an option. 
And if that and the other alternate solutions are not feasible, it would be nice to have them added to the "rejected" section because they are still reasonable ideas. Still, it would be nice if we didn't have to add a new packageness indicator. -eric From ericsnowcurrently at gmail.com Mon Jul 11 09:07:49 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 11 Jul 2011 01:07:49 -0600 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> Message-ID: On Sun, Jul 10, 2011 at 8:34 PM, P.J. Eby wrote: > I think one reason we're having trouble with naming and explaining this > whole concept is that, really, the current Python import system is broken, > compared to other languages. > > In at least Perl, PHP, and Java, you don't have to do anything special to > merge components in a single namespace from multiple parts of the > class/include/autoload path. We are thus having trouble trying to come up > with a special name to describe these, when from a more objective > perspective, what we are describing are "normal packages", with what Python > has now being "restricted to a single directory packages". > > It's for this reason that all packages being namespaces doesn't bother me > for the term. All packages *should* be namespace packages, pretty much. > It's the *non* namespaceyness of Python's default packages that's broken, > not the term. ;-) > > If there really was a time machine, I like to think we'd go back and get > Python's package import mechanism to just work this way from the outset > (i.e.
always combining shards across sys.path), and perhaps use the presence > of .py[cod]/.so files as an indication of package-ness -- if indeed an > indication is needed at all. > > Actually... here's an interesting idea. Suppose that we define the rules > so that any directory containing any file with an importable extension is a > namespace package... *but*, if one of those directories contains an > __init__ module, that directory will be placed first on the package > __path__. > > See, the reason why dropping the need for __init__ was previously rejected > was because it meant you could block the importing of a package later on the > path. *But*, if we always put the segment with __init__ first on the > __path__, then any such blocking directories would not block the "real" > package -- they'd just be accessible for imports. > > If we did that, then there would be no need for any special flag files, and > no need for special terminology. Would it be a problem to lose the filename that indicates where the portion/partition came from? -eric > The protocol in my draft would remain > basically the same, except for moving the __init__ module's subpath to the > front of __path__. And instead of globbing for *.pypart or whatever, > importers would just check whether there was a directory there at all. > > The only backward compatibility that this can break is that you can import > things you couldn't import before. So, if you had a foo/bar.py, with 'foo' > in a sys.path directory, and you also had a 'foo' package, AND you relied on > 'import foo.bar' raising an error, then it would no longer do so. But, if > you *had* a foo.bar module before, then under this scheme, 'import foo.bar' > would still import the exact same file it did before, so nothing changes. > > In other words, the first subdirectory with an __init__ gets to head up the > new package's __path__, but ALL matching subdirectories will make up the > tail.
> > The big advantage of this approach is that it doesn't require us to have a > special name - it's just, "Enhanced Package Imports" or some such. No > special marker files to name, either. Just, "hey, people want to put their > package contents in more than one directory, and they don't always need an > __init__.py." > > Thoughts, anyone? From ncoghlan at gmail.com Mon Jul 11 09:32:17 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jul 2011 17:32:17 +1000 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> Message-ID: On Mon, Jul 11, 2011 at 5:04 PM, Eric Snow wrote: > FWIW, I think the solution in the PEP is the clearest approach, if > "partitioned by default" is not an option. And if that and the other > alternate solutions are not feasible, it would be nice to have them > added to the "rejected" section because they are still reasonable > ideas. Still, it would be nice if we didn't have to add a new > packageness indicator. The runtime performance impact kills "partitioned by default" (i.e. no marker files needed to indicate partitioned packages). Java doesn't suffer from it since the cost is incurred at compile time, and I believe there are differences in the way Perl and PHP work that make it less of an issue there as well. PJE's latest PEP update clearly articulates the semantics for a "non-conflicting marker file" approach (modulo a name change to .pyp or .pypart instead of .ns).
Allowing unmarked directories to count as packages has already been rejected in the past due to the problem of hiding package directories later on sys.path. Given the performance penalty that rules out "partitioned by default", this rejection remains in force. One question then is whether, given that a partitioned package has already been identified, should unmarked directories later on sys.path count as part of that package? My answer is no, as this is the only answer that provides consistent behaviour. Otherwise, unmarked directories may or may not be detected as part of the package depending on whether or not a partitioned package directory was found earlier on the path. As far as the specific suggestion of using a "marker directory" instead of marker files goes, I don't really see the benefit (and plenty of downsides). I put it in the same category as using a special extension on the directory name (since that's what it is, only using "/" as the separator instead of ".") and reject it for the same reasons. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Mon Jul 11 16:19:51 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 11 Jul 2011 10:19:51 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711031033.55D1F3A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711031033.55D1F3A4100@sparrow.telecommunity.com> Message-ID: <20110711101951.3a01f769@resist> On Jul 10, 2011, at 11:10 PM, P.J. Eby wrote: >However, as it happens, with a slight adjustment to what I proposed, that >latter issue can be addressed... if *any* loadable module anywhere on >sys.path (vs. just a directory with an __init__) simply gets all the subpaths >appended to its __path__, then having a "time" directory just gets it added >to time.__path__ -- and the plain old __time__ module still gets loaded.
Does that mean I could add subpackage bits to existing modules without their "knowledge"? IOW, could I manage to add a time.foo subpackage that would be importable? I'd find that FAST (fascinating and stomach turning, to revive an old meme :). -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Mon Jul 11 16:23:55 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 11 Jul 2011 10:23:55 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711043932.22F8B3A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> Message-ID: <20110711102355.7cbebacc@resist> On Jul 11, 2011, at 12:39 AM, P.J. Eby wrote: >Suppose that we reserve a special directory name, like 'pypkg'. And, if a >sys.path directory contains a 'py-pkg' subdirectory, then any directory in >that directory (recursively) is a package following __path__-assembly >semantics. I'm not entirely sold on the idea, but I do have some lovely bikeshed paint. *If* this idea were to pan out, I think __package__ would be a good directory name. Okay, it's not as unimportable itself as py-pkg, but it's still special enough to have its semantics controlled by Python. -Barry From barry at python.org Mon Jul 11 16:30:23 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 11 Jul 2011 10:30:23 -0400 Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> Message-ID: <20110711103023.75ebfb75@resist> On Jul 11, 2011, at 05:32 PM, Nick Coghlan wrote: >One question then is whether, given that a partitioned package has >already been identified, should unmarked directories later on sys.path >count as part of that package? My answer is no, as this is the only >answer that provides consistent behaviour. Otherwise, unmarked >directories may or may not be detected as part of the package >depending on whether or not a partitioned package directory was found >earlier on the path. This is my biggest concern. While I think PJE's proposal has some appeal, I'm worried that it will be very difficult to debug when things go wrong. I'm also concerned that introspection may not be possible without "going through Python". By this I mean, on a *nix system it would be very easy to identify all the package portions on a file system with a simple `locate *.pyp`. So I'm still in favor of the marker files approach. -Barry From barry at python.org Mon Jul 11 16:32:13 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 11 Jul 2011 10:32:13 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711035731.C012E3A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> Message-ID: <20110711103213.761540a7@resist> On Jul 10, 2011, at 11:57 PM, P.J. Eby wrote: >I'm noticing, though, that the more I hear "partitioned package", the less I >like it, and the more I wish I hadn't proposed it.
;-) > >It's fundamentally wrong, because (e.g.) peak.util is *not* a single thing >that's been partitioned, *even though* it started out that way. > >It's just a bunch of things with a common namespace, and ISTM the name >*really* ought to reflect that. > >Common namespace packages? Shared namespace packages? Surname packages? ;-) Does 'package portions' not fit the bill? -Barry From barry at python.org Mon Jul 11 16:44:52 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 11 Jul 2011 10:44:52 -0400 Subject: [Import-SIG] PEP 382: Partial packages In-Reply-To: References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com> <4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com> <4E1A205A.3010004@v.loewis.de> <20110710223044.E77943A4100@sparrow.telecommunity.com> Message-ID: <20110711104452.7f296e91@resist> Another thought: what about calling these "fusion packages"? The dictionary definition of "fusion" does seem like a pretty good match for what's going on here. http://en.wiktionary.org/wiki/fusion -Barry From pje at telecommunity.com Mon Jul 11 17:12:50 2011 From: pje at telecommunity.com (P.J.
Eby) Date: Mon, 11 Jul 2011 11:12:50 -0400 Subject: [Import-SIG] PEP 382: Partial packages In-Reply-To: <20110711104452.7f296e91@resist> References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com> <4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com> <4E1A205A.3010004@v.loewis.de> <20110710223044.E77943A4100@sparrow.telecommunity.com> <20110711104452.7f296e91@resist> Message-ID: <20110711151322.122063A414B@sparrow.telecommunity.com> At 10:44 AM 7/11/2011 -0400, Barry Warsaw wrote: >Another thought: what about calling these "fusion packages"? The dictionary >definition of "fusion" does seem like a pretty good match for what's going on >here. > >http://en.wiktionary.org/wiki/fusion Hm. The first definition on that page says, "The merging of similar or different elements into a union"... So how about "union packages"? ;-) >-Barry From barry at python.org Mon Jul 11 17:22:25 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 11 Jul 2011 11:22:25 -0400 Subject: [Import-SIG] PEP 382: Partial packages In-Reply-To: <20110711151322.122063A414B@sparrow.telecommunity.com> References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com> <4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com> <4E1A205A.3010004@v.loewis.de> <20110710223044.E77943A4100@sparrow.telecommunity.com> <20110711104452.7f296e91@resist> <20110711151322.122063A414B@sparrow.telecommunity.com> Message-ID: <20110711112225.0c02daf0@resist> On Jul 11, 2011, at 11:12 AM, P.J. Eby wrote: >At 10:44 AM 7/11/2011 -0400, Barry Warsaw wrote: >>Another thought: what about calling these "fusion packages"? The dictionary >>definition of "fusion" does seem like a pretty good match for what's going on >>here. >> >>http://en.wiktionary.org/wiki/fusion > >Hm.
The first definition on that page says, "The merging of similar or different elements into a union"... > >So how about "union packages"? ;-) I thought about that too, but I liked the sound of "fusion" better. :) -Barry From pje at telecommunity.com Mon Jul 11 17:25:40 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 11 Jul 2011 11:25:40 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> Message-ID: <20110711152605.8A3083A4100@sparrow.telecommunity.com> At 05:32 PM 7/11/2011 +1000, Nick Coghlan wrote: >On Mon, Jul 11, 2011 at 5:04 PM, Eric Snow > wrote: > > FWIW, I think the solution in the PEP is the clearest approach, if > > "partitioned by default" is not an option. And if that and the other > > alternate solutions are not feasible, it would be nice to have them > > added to the "rejected" section because they are still reasonable > > ideas. Still, it would be nice if we didn't have to add a new > > packageness indicator. > >The runtime performance impact kills "partitioned by default" (i.e. no >marker files needed to indicate partitioned packages). Actually, partitioned by default is the *best* performance option we have for implementing this PEP, because it only uses a stat rather than a listdir. Backward compatibility is the thing that kills it. That's why I made the more recent "py-pkg/" proposal -- it has the same degree of backward compatibility as flag files do, but keeps the improved performance of partitioning by default.
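The stat-versus-listdir point can be illustrated with a rough sketch. It is not code from the PEP or from any draft; the helper names `has_marker_file` and `is_portion_dir` are made up here, and `*.pyp` is just one of the marker extensions floated in this thread. The marker-file check has to list the directory to find a flag file whose name it does not know in advance, while the markerless check needs only a single existence test.

```python
import fnmatch
import os
import tempfile


def has_marker_file(directory, pattern="*.pyp"):
    """Marker-file check: the flag file's name is not known in advance,
    so the directory has to be listed and each entry matched."""
    try:
        names = os.listdir(directory)
    except OSError:
        return False
    return any(fnmatch.fnmatch(name, pattern) for name in names)


def is_portion_dir(directory):
    """Markerless check: one stat-style call, no directory listing."""
    return os.path.isdir(directory)


with tempfile.TemporaryDirectory() as root:
    marked = os.path.join(root, "zope")
    os.mkdir(marked)
    # Hypothetical flag file name, following the "DistributionName.ext" idea.
    open(os.path.join(marked, "zope.interface.pyp"), "w").close()

    unmarked = os.path.join(root, "peak")
    os.mkdir(unmarked)

    results = {
        "marked_has_marker": has_marker_file(marked),
        "unmarked_has_marker": has_marker_file(unmarked),
        "unmarked_is_dir": is_portion_dir(unmarked),
    }

print(results)
```

The sketch says nothing about absolute cost; it only shows which filesystem call each approach depends on.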
>One question then is whether, given that a partitioned package has >already been identified, should unmarked directories later on sys.path >count as part of that package? My answer is no, as this is the only >answer that provides consistent behaviour. Otherwise, unmarked >directories may or may not be detected as part of the package >depending on whether or not a partitioned package directory was found >earlier on the path. This is already in the PEP draft I wrote, and it's definitely the correct semantics for marker files approach. The py-pkg approach of course works similarly, since the py-pkg directory is the "marker" in that case. >As far as the specific suggestion of using a "marker directory" >instead of marker files goes, I don't really see the benefit (and >plenty of downsides). I put it in the same category as using a special >extension on the directory name (since that's what it is, only using >"/" as the separator instead of ".") and reject it for the same >reasons. What are the downsides, exactly? Special extensions don't work with the distutils; a prefix does. (I've tested it.) Most tools that look for code can be given a prefix to look for the code, but not an extension. It's *quite* a different proposition than specially-named directories -- especially since only the package root is affected, not every subpackage directory. 
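As a minimal sketch of the "py-pkg/" idea described in this message, the function below assembles a package `__path__` by looking for `<entry>/py-pkg/<package>` under each search-path entry. The function name `assemble_package_path` and the demo directory names are assumptions for illustration only; the thread does not specify an API.

```python
import os
import tempfile


def assemble_package_path(package, search_path, prefix="py-pkg"):
    """Collect every <entry>/<prefix>/<package dirs> that exists.

    The prefix directory itself acts as the marker, so a single
    isdir() check per path entry is enough."""
    parts = package.split(".")
    portions = []
    for entry in search_path:
        candidate = os.path.join(entry, prefix, *parts)
        if os.path.isdir(candidate):
            portions.append(candidate)
    return portions


with tempfile.TemporaryDirectory() as root:
    entries = [os.path.join(root, name) for name in ("site-a", "site-b", "site-c")]
    for entry in entries:
        os.mkdir(entry)
    # Two entries carry a portion of "zope" under the reserved prefix.
    for entry in entries[:2]:
        os.makedirs(os.path.join(entry, "py-pkg", "zope"))
    # The third has a plain zope/ directory outside py-pkg; it is not counted.
    os.mkdir(os.path.join(entries[2], "zope"))

    portions = assemble_package_path("zope", entries)
    relative = [os.path.relpath(p, root) for p in portions]

print(relative)
```

Only the package root sits under the prefix, which matches the remark above that subpackage directories are unaffected.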
From martin at v.loewis.de Tue Jul 12 00:24:35 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 12 Jul 2011 00:24:35 +0200 Subject: [Import-SIG] PEP 382: Partial packages In-Reply-To: <20110711151810.260883A4100@sparrow.telecommunity.com> References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com> <4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com> <4E1A205A.3010004@v.loewis.de> <20110710223044.E77943A4100@sparrow.telecommunity.com> <4E1A9DE1.1090809@v.loewis.de> <20110711151810.260883A4100@sparrow.telecommunity.com> Message-ID: <4E1B7823.9000901@v.loewis.de> On 11.07.2011 17:17, P.J. Eby wrote: > At 08:53 AM 7/11/2011 +0200, Martin v. Löwis wrote: >> - in PHP, namespaces and files are completely unrelated: >> http://php.net/manual/en/language.namespaces.php >> The files you want to use are passed to "include". include takes >> file names, not namespace names. Only after including the file, >> PHP finds out what namespace the stuff is in it imported. > > I mean that in PHP, when you 'include "foo/bar"', the entire include > path is searched for foo/bar. PHP namespaces are a new feature. As you say, namespaces are new. IIUC, before that, there was a single flat namespace, and file names had no relationship to identifiers. So I don't see why the PHP include mechanism is related to "namespace packages" at all. It's more like Python's import before the introduction of packages (but even then, the modules formed namespaces, which they don't in PHP). >> *All* Python packages are namespaces. What specific property of the >> package mechanism do you mean when you say "namespacey"? > > The feature that allows a "package" to be merely an agglomeration of > child elements, rather than an entity in itself. I still think "namespace package" is a misnomer for that. In addition, even a namespace package is "an entity in itself".
"import zope" will give me a proper module object bound to the name zope, with reflection, and all. I can do zope.foo = 1 if I want to. It's *technically* the case that you shouldn't have any code in it, although also technically, it would be possible to put more stuff into __init__.py, as long as you do so for all portions of the package. Regards, Martin From pje at telecommunity.com Tue Jul 12 01:07:09 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 11 Jul 2011 19:07:09 -0400 Subject: [Import-SIG] What if namespace imports weren't special? Message-ID: <20110711230729.82A813A414B@sparrow.telecommunity.com> At 08:59 AM 7/11/2011 +0200, Martin v. Löwis wrote: > > In at least Perl, PHP, and Java, you don't have to do anything special > > to merge components in a single namespace from multiple parts of the > > class/include/autoload path. > >Not true. In all three languages, you have to declare in the module what >package it belongs to. So there is something special to do. But you have to do that for *every* package, so it's not special. (i.e., by special, I meant, "in addition to what you do normally to make a package".) >I'd go further: any directory with the package name could constitute >a portion of the package. With your approach, you'd need a file with >an importable extension in each portion of the "zope" package, right? For that version, yes. For the performance and compatibility reasons discussed elsewhere in this thread, though, that particular variation isn't really workable. From pje at telecommunity.com Tue Jul 12 01:06:58 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 11 Jul 2011 19:06:58 -0400 Subject: [Import-SIG] PEP 382: Partial packages Message-ID: <20110711230728.A21C73A4100@sparrow.telecommunity.com> At 08:53 AM 7/11/2011 +0200, Martin v. Löwis wrote: >- in PHP, namespaces and files are completely unrelated: > http://php.net/manual/en/language.namespaces.php > The files you want to use are passed to "include".
include takes > file names, not namespace names. Only after including the file, > PHP finds out what namespace the stuff is in it imported. I mean that in PHP, when you 'include "foo/bar"', the entire include path is searched for foo/bar. PHP namespaces are a new feature. >*All* Python packages are namespaces. What specific property of the >package mechanism do you mean when you say "namespacey"? The feature that allows a "package" to be merely an agglomeration of child elements, rather than an entity in itself. If you read my draft proposal, it quotes Jim Fulton's original coining of the term "namespace" package, as a contrast to what he called a "module" package. That is, some packages are self-contained entities, and others merely serve as a gathering place (namespace) for distinct entities. This is not a property of packages themselves, but of the user's intention in organizing the package. The other languages I mention all support the "namespace-only" use case better by allowing segments to be merged along their include/import paths. From pje at telecommunity.com Tue Jul 12 01:26:41 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 11 Jul 2011 19:26:41 -0400 Subject: [Import-SIG] PEP 382: Partial packages Message-ID: <20110711232701.C54033A4100@sparrow.telecommunity.com> At 12:24 AM 7/12/2011 +0200, Martin v. Löwis wrote: >So I don't see why the PHP include mechanism is related to "namespace >packages" at all. Because only in Python does a search for "foo/bar" (whatever the separator) *stop* the path search when there is a match for "foo/". In PHP, Perl, and Java, searching continues along the path until the entire target is matched, regardless of whether the name parts are separated by slashes (PHP), dots (Java), or double-colons (Perl). That's why Python's behavior here is arguably a misfeature.
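The difference between the two lookup rules can be modelled in a few lines. This is only a toy model of the behaviour being discussed (the classic `__init__.py`-based package rule of the time), not the real import machinery, and both function names are invented for the illustration.

```python
import os
import tempfile


def find_stop_at_first_package(package, module, search_path):
    """Toy model of the classic Python rule: the first directory that is
    a package (has __init__.py) ends the search for anything inside it."""
    for entry in search_path:
        pkg_dir = os.path.join(entry, package)
        if os.path.isfile(os.path.join(pkg_dir, "__init__.py")):
            candidate = os.path.join(pkg_dir, module + ".py")
            return candidate if os.path.isfile(candidate) else None
    return None


def find_along_whole_path(package, module, search_path):
    """Toy model of the other rule: keep walking the path until the
    whole target package/module is matched."""
    for entry in search_path:
        candidate = os.path.join(entry, package, module + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "first")
    second = os.path.join(root, "second")
    os.makedirs(os.path.join(first, "foo"))
    os.makedirs(os.path.join(second, "foo"))
    # The first entry has the package but not the submodule...
    open(os.path.join(first, "foo", "__init__.py"), "w").close()
    # ...the second entry has the submodule.
    open(os.path.join(second, "foo", "bar.py"), "w").close()

    search_path = [first, second]
    stopped = find_stop_at_first_package("foo", "bar", search_path)
    continued = find_along_whole_path("foo", "bar", search_path)
    found_in_second = continued == os.path.join(second, "foo", "bar.py")

print(stopped, found_in_second)
```

In the model, the first rule gives up once it has matched `foo/` in the first entry, while the second rule reaches `foo/bar.py` in the second entry.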
In these other languages, there is a distinction between an entity named "x" and a *namespace* named "x::" (or "x/" or "x."), in their on-disk representations. For example, in Java, the class org.Foo is distinct from the namespace org.Foo.* in on-disk representation, as you have org/Foo.java (or .class) sitting outside the directory org/Foo/ (where any contents of org.Foo.* would be located. Similarly in Perl, Foo.pm sits outside the Foo/ directory, thereby distinguishing Foo and Foo::. Python, however, in the case where both a Foo module and Foo package exist, places the module *inside* the package. If Python were following the model of these other languages, then instead of using zope/__init__.py, we would place a zope.py in the parent directory, and when importing zope.interface, we would search the entire path for zope/ subdirectories containing an interface.py... but we wouldn't look for interface/ directories until/unless we tried to import zope.interface.foo. From pje at telecommunity.com Tue Jul 12 03:01:52 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 11 Jul 2011 21:01:52 -0400 Subject: [Import-SIG] One last try: "virtual packages" Message-ID: <20110712010218.5E0873A4100@sparrow.telecommunity.com> Ok, so based on the last round of discussions about terminology, and how other languages process their path, I got to doing some thinking, and here is one last try at a high-performing, markerless, ultra-backwards-compatible, approach to this thing. I call it, "virtual packages". The implementation consists of two small *additions* to today's import semantics. These additions don't affect the performance or behavior of "standard" imports (i.e., the ones we have today), but enable certain imports that would currently fail, to succeed instead. Overall, the goal is to make package imports work more like a user coming over from languages like Perl, Java, and PHP would expect with respect to subpath searching, and creation/expansion of packages. 
(For instance, this proposal does away with the need to move 'foo.py' to 'foo/__init__.py' when turning a module into a package.) Anyway, this'll be my last attempt at coming up with a markerless approach, but I do hope you'll take the time to read it carefully, as it has very different semantics and performance impacts from my previous proposals, even though it may sound quite similar on the surface. In particular, this proposal is the ONLY implementation ever proposed for this PEP that has *zero* filesystem-level performance overhead for normal imports. That's right. Zip. Zero. Nada. None. Nil. The *only* cases where this proposal adds additional filesystem access overhead is in cases where, without this proposal, an ImportError would've happened under present-day import semantics. So, read it and weep... or smile, or whatever. ;-) The First Addition - "Virtual" Packages --------------------------------------- The first addition to existing import semantics is that if you try to import a submodule of a module with no __path__, then instead of treating it as a missing module, a __path__ is dynamically constructed, using namespace_subpath() calls on the parent path or sys.path. If the resulting __path__ is empty, it's an import error. Otherwise, the module's __path__ attribute is set, and the import goes ahead as if the module had been a package all along. In other words, every module is a "virtual package". If you treat it as a package, it'll become/act like one. Otherwise, it's still a module. This means that if, say, you have a bunch of directories named 'random' on sys.path (without any __init__ modules in them), importing 'random' still imports the stdlib random.py. However, if you try to import 'random.myspecialrandom', a __path__ will be constructed and used -- and if the submodule exists, it'll be imported. 
(And if you later add a random/myspecialrandom/ directory somewhere on sys.path, you'll be able to import random.myspecialrandom.whatever out of it, by recursive application of this "virtual package" rule.) Notice that this is very different from my previous attempt at a similar scheme. First, it doesn't introduce any performance overhead on 'import random', as the extra lookups aren't done until and unless you try to 'import random.foo'... which existing code of course will not be doing. (Second, but also important, it doesn't distort the __path__ of packages with an __init__ module, because such packages are *not* virtual packages; they retain their present day semantics.) Anyway, with this one addition, imports will now behave in a way that's friendly to users of e.g. Perl and Java, who expect the code for a module 'foo' to lie *outside* the foo/ directory, and for lookups of foo.bar or foo::bar to be searched for in foo/ subdirectories all along the respective equivalents of sys.path. You now can simply ship a single zope.py to create a virtual "zope" package -- a namespace shared by code from multiple distributions. But wait... how does that fix the filename collision problem? Aren't we still going to collide on the zope.py file? Well, that's where the second addition comes in. The Second Addition - "Pure Virtual" Packages --------------------------------------------- The second addition is that, if an import fails to find a module entirely, then once again, a virtual __path__ is assembled using namespace_subpath() calls. If the path is empty, the import fails. But if it's non-empty, an empty module is created and its __path__ is set. Voila... we now have a "pure" virtual package. (i.e. a package with no corresponding "defining" module). So, if you have a bunch of __init__-free zope/ directories on sys.path, you can freely import from them. But what happens if you DO have an __init__ module somewhere? 
Well, because we haven't changed the normal import semantics, the first __init__ module ends up being a defining module, and by default, its __path__ is set in the normal way, just like today. So, it's not a virtual package, it's a standard package. If you must have a defining module, you'll have to move it from zope/__init__.py to zope.py. (Either that, or use some sort of API call to explicitly request a virtual package __path__ to be set up. But the recommended way to do it would be just to move the file up a level.) Impact ------ This proposal doesn't affect performance of imports that don't ever *use* a virtual package __path__, because the setup is delayed until then. It doesn't break installation tools: distutils and setuptools both handle this without blinking. You just list your defining module (if you have one) in 'py_modules', along with any individual submodules, and you list the subpackages in 'packages'. It doesn't break code-finding tools in any way that other implementation proposals don't. (That is, ALL our proposals allow __init__ to go away, so tools are definitely going to require updating; all that differs between proposals is precisely what sort of updating is required.) Really, about the only downside I can see is that this proposal can't be implemented purely via a PEP 302 meta-importer in Python 2.x. The builtin __import__ function bails out when __path__ is missing on a parent, so it would actually require replacing __import__ in order to implement true virtual package support. (For my own personal use case for this PEP in 2.x (i.e., replacing setuptools' current mechanism with a PEP-compliant one), it's not too big a deal, though, because I was still going to need explicit registration in .pth files: no matter the mechanism used, it isn't built into the interpreter, so it still has to be bootstrapped somehow!) Anyway. Thoughts? 
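The two additions can be sketched in Python as follows. This is a hedged illustration rather than the proposal's implementation: `subpath_for` is a hypothetical stand-in for the draft's namespace_subpath() call (whose real signature is not given in this message), and `ensure_virtual_package` and `pure_virtual_package` are names invented here. Nothing is hooked into the actual import system.

```python
import os
import tempfile
import types


def subpath_for(entry, name):
    """Hypothetical stand-in for namespace_subpath(): return entry/name
    if that directory exists, else None."""
    candidate = os.path.join(entry, name)
    return candidate if os.path.isdir(candidate) else None


def virtual_path(name, parent_path):
    """Assemble a __path__ from every matching subdirectory on parent_path."""
    leaf = name.rpartition(".")[2]
    found = (subpath_for(entry, leaf) for entry in parent_path)
    return [path for path in found if path]


def ensure_virtual_package(module, parent_path):
    """First addition: a plain module gains a __path__ on demand,
    i.e. only when a submodule import is attempted."""
    if not hasattr(module, "__path__"):
        path = virtual_path(module.__name__, parent_path)
        if not path:
            raise ImportError("no portions found for %r" % module.__name__)
        module.__path__ = path
    return module


def pure_virtual_package(name, parent_path):
    """Second addition: no defining module exists, so create an empty
    module and give it the assembled __path__."""
    path = virtual_path(name, parent_path)
    if not path:
        raise ImportError("No module named %r" % name)
    module = types.ModuleType(name)
    module.__path__ = path
    return module


with tempfile.TemporaryDirectory() as root:
    entries = [os.path.join(root, name) for name in ("a", "b", "c")]
    for entry in entries:
        os.mkdir(entry)
    os.mkdir(os.path.join(entries[0], "zope"))    # __init__-free portions
    os.mkdir(os.path.join(entries[1], "zope"))
    os.mkdir(os.path.join(entries[2], "random"))  # extra dir next to a module

    zope = pure_virtual_package("zope", entries)
    plain = types.ModuleType("random")  # stands in for an imported random.py
    had_path_before = hasattr(plain, "__path__")
    ensure_virtual_package(plain, entries)
    try:
        pure_virtual_package("missing", entries)
        missing_failed = False
    except ImportError:
        missing_failed = True

    summary = (len(zope.__path__), had_path_before, len(plain.__path__), missing_failed)

print(summary)
```

The path assembly runs only inside the two helper functions, which mirrors the claim above that ordinary imports pay no extra filesystem cost until a virtual `__path__` is actually needed.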
From ericsnowcurrently at gmail.com Tue Jul 12 04:19:46 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 11 Jul 2011 20:19:46 -0600 Subject: [Import-SIG] One last try: "virtual packages" In-Reply-To: <20110712010218.5E0873A4100@sparrow.telecommunity.com> References: <20110712010218.5E0873A4100@sparrow.telecommunity.com> Message-ID: On Mon, Jul 11, 2011 at 7:01 PM, P.J. Eby wrote: > Ok, so based on the last round of discussions about terminology, and how > other languages process their path, I got to doing some thinking, and here > is one last try at a high-performing, markerless, > ultra-backwards-compatible, approach to this thing. I call it, "virtual > packages". > > The implementation consists of two small *additions* to today's import > semantics. These additions don't affect the performance or behavior of > "standard" imports (i.e., the ones we have today), but enable certain > imports that would currently fail, to succeed instead. > > Overall, the goal is to make package imports work more like a user coming > over from languages like Perl, Java, and PHP would expect with respect to > subpath searching, and creation/expansion of packages. (For instance, this > proposal does away with the need to move 'foo.py' to 'foo/__init__.py' when > turning a module into a package.) > > Anyway, this'll be my last attempt at coming up with a markerless approach, > but I do hope you'll take the time to read it carefully, as it has very > different semantics and performance impacts from my previous proposals, even > though it may sound quite similar on the surface. > > In particular, this proposal is the ONLY implementation ever proposed for > this PEP that has *zero* filesystem-level performance overhead for normal > imports. > > That's right. Zip. Zero. Nada. None. Nil.
> > The *only* cases where this proposal adds additional filesystem access > overhead is in cases where, without this proposal, an ImportError would've > happened under present-day import semantics. > > So, read it and weep... or smile, or whatever. ;-) > > > The First Addition - "Virtual" Packages > --------------------------------------- > > The first addition to existing import semantics is that if you try to import > a submodule of a module with no __path__, then instead of treating it as a > missing module, a __path__ is dynamically constructed, using > namespace_subpath() calls on the parent path or sys.path. > > If the resulting __path__ is empty, it's an import error. Otherwise, the > module's __path__ attribute is set, and the import goes ahead as if the > module had been a package all along. > > In other words, every module is a "virtual package". If you treat it as a > package, it'll become/act like one. Otherwise, it's still a module. > > This means that if, say, you have a bunch of directories named 'random' on > sys.path (without any __init__ modules in them), importing 'random' still > imports the stdlib random.py. > > However, if you try to import 'random.myspecialrandom', a __path__ will be > constructed and used -- and if the submodule exists, it'll be imported. > (And if you later add a random/myspecialrandom/ directory somewhere on > sys.path, you'll be able to import random.myspecialrandom.whatever out of > it, by recursive application of this "virtual package" rule.) > > Notice that this is very different from my previous attempt at a similar > scheme. First, it doesn't introduce any performance overhead on 'import > random', as the extra lookups aren't done until and unless you try to > 'import random.foo'... which existing code of course will not be doing.
> > (Second, but also important, it doesn't distort the __path__ of packages > with an __init__ module, because such packages are *not* virtual packages; > they retain their present day semantics.) > > Anyway, with this one addition, imports will now behave in a way that's > friendly to users of e.g. Perl and Java, who expect the code for a module > 'foo' to lie *outside* the foo/ directory, and for lookups of foo.bar or > foo::bar to be searched for in foo/ subdirectories all along the respective > equivalents of sys.path. > > You now can simply ship a single zope.py to create a virtual "zope" package > -- a namespace shared by code from multiple distributions. > > But wait... how does that fix the filename collision problem? Aren't we > still going to collide on the zope.py file? Well, that's where the second > addition comes in. > > > The Second Addition - "Pure Virtual" Packages > --------------------------------------------- > > The second addition is that, if an import fails to find a module entirely, > then once again, a virtual __path__ is assembled using namespace_subpath() > calls. If the path is empty, the import fails. But if it's non-empty, an > empty module is created and its __path__ is set. > > Voila... we now have a "pure" virtual package. (i.e. a package with no > corresponding "defining" module). So, if you have a bunch of __init__-free > zope/ directories on sys.path, you can freely import from them. > > But what happens if you DO have an __init__ module somewhere? Well, because > we haven't changed the normal import semantics, the first __init__ module > ends up being a defining module, and by default, its __path__ is set in the > normal way, just like today. So, it's not a virtual package, it's a > standard package. If you must have a defining module, you'll have to move > it from zope/__init__.py to zope.py. > > (Either that, or use some sort of API call to explicitly request a virtual > package __path__ to be set up.
But the recommended way to do it would be > just to move the file up a level.) > > > Impact > ------ > > This proposal doesn't affect performance of imports that don't ever *use* a > virtual package __path__, because the setup is delayed until then. > > It doesn't break installation tools: distutils and setuptools both handle > this without blinking. You just list your defining module (if you have one) > in 'py_modules', along with any individual submodules, and you list the > subpackages in 'packages'. > > It doesn't break code-finding tools in any way that other implementation > proposals don't. (That is, ALL our proposals allow __init__ to go away, so > tools are definitely going to require updating; all that differs between > proposals is precisely what sort of updating is required.) > > Really, about the only downside I can see is that this proposal can't be > implemented purely via a PEP 302 meta-importer in Python 2.x. The builtin > __import__ function bails out when __path__ is missing on a parent, so it > would actually require replacing __import__ in order to implement true > virtual package support. > I have been considering porting the 3.3 importlib to 2.x, for a variety of reasons. If the implementation for "virtual namespace package portions" is done there then this shouldn't be a big deal. > (For my own personal use case for this PEP in 2.x (i.e., replacing > setuptools' current mechanism with a PEP-compliant one), it's not too big a > deal, though, because I was still going to need explicit registration in > .pth files: no matter the mechanism used, it isn't built into the > interpreter, so it still has to be bootstrapped somehow!) > > Anyway. Thoughts? > Cool idea. So for users the only difference is that suddenly foo.py and a foo directory (without __init__.py) can coexist/cooperate, and __init__.py becomes optional?
-eric From martin at v.loewis.de Tue Jul 12 08:03:59 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 12 Jul 2011 08:03:59 +0200 Subject: [Import-SIG] PEP 382: Partial packages In-Reply-To: <20110711232645.7B60D3A4100@sparrow.telecommunity.com> References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com> <4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com> <4E1A205A.3010004@v.loewis.de> <20110710223044.E77943A4100@sparrow.telecommunity.com> <4E1A9DE1.1090809@v.loewis.de> <20110711151810.260883A4100@sparrow.telecommunity.com> <4E1B7823.9000901@v.loewis.de> <20110711232645.7B60D3A4100@sparrow.telecommunity.com> Message-ID: <4E1BE3CF.1070201@v.loewis.de> > Because only in Python does a search for "foo/bar" (whatever the > separator) *stops* the path search when there is a match for "foo/". In > PHP, Perl, and Java, searching continues along the path until the entire > target is matched, regardless of whether the name parts are separated by > slashes (PHP), dots (Java), or double-colons (Perl). > > That's why Python's behavior here is arguably a misfeature. Hmm. For PHP, I don't think it's better, just different - you can *never* include a directory, so the directory is not a recognized entity in the include mechanism at all. It's the file system that is hierarchical, not the PHP namespace concept (except for new-style namespaces, which we seem to agree are unrelated). > For example, in Java, the class org.Foo is distinct from the namespace > org.Foo.* in on-disk representation, as you have org/Foo.java (or > .class) sitting outside the directory org/Foo/ (where any contents of > org.Foo.* would be located. Not true; see the attached example.
Compiling foo/baz.java gives foo/baz.java:6: cannot find symbol symbol : variable foobar location: class foo.bar System.out.println(foo.bar.foobar.V); It decides that foo.bar is a class (from foo/bar.java), so foo.bar.foobar should be something inside the class (such as a nested class), and the package foo/bar is not considered anymore. If you delete bar.java, and refer to foo.bar1.V instead in baz.java, it compiles. IOW, you can't have a class and a package with the same name in Java. > Similarly in Perl, Foo.pm sits outside the Foo/ directory, thereby > distinguishing Foo and Foo::. I agree that this is better. > If Python were following the model of these other languages, then > instead of using zope/__init__.py, we would place a zope.py in the > parent directory, and when importing zope.interface, we would search the > entire path for zope/ subdirectories containing an interface.py... but > we wouldn't look for interface/ directories until/unless we tried to > import zope.interface.foo. I think it can actually work, and will propose a PEP (wording) in that direction shortly. Regards, Martin -------------- next part -------------- A non-text attachment was scrubbed... Name: foo.tar Type: application/x-tar Size: 10240 bytes Desc: not available URL: From ncoghlan at gmail.com Tue Jul 12 09:53:13 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Jul 2011 17:53:13 +1000 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110711152605.8A3083A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> <20110711152605.8A3083A4100@sparrow.telecommunity.com> Message-ID: On Tue, Jul 12, 2011 at 1:25 AM, P.J. 
Eby wrote: > At 05:32 PM 7/11/2011 +1000, Nick Coghlan wrote: >> >> On Mon, Jul 11, 2011 at 5:04 PM, Eric Snow >> wrote: >> > FWIW, I think the solution in the PEP is the clearest approach, if >> > "partitioned by default" is not an option. And if that and the other >> > alternate solutions are not feasible, it would be nice to have them >> > added to the "rejected" section because they are still reasonable >> > ideas. Still, it would be nice if we didn't have to add a new >> > packageness indicator. >> >> The runtime performance impact kills "partitioned by default" (i.e. no >> marker files needed to indicate partitioned packages). > > Actually, partitioned by default is the *best* performance option we have > for implementing this PEP, because it only uses a stat rather than a > listdir. Backward compatibility is the thing that kills it. By "partitioned by default" I meant the prospect of continuing to search sys.path after finding the email (etc.) directory in the stdlib zipfile. Slowing down everything in order to speed up a new feature isn't a good trade-off. >> As far as the specific suggestion of using a "marker directory" >> instead of marker files goes, I don't really see the benefit (and >> plenty of downsides). I put it in the same category as using a special >> extension on the directory name (since that's what it is, only using >> "/" as the separator instead of ".") and reject it for the same >> reasons. > > What are the downsides, exactly? Special extensions don't work with the > distutils; a prefix does. (I've tested it.) Most tools that look for code > can be given a prefix to look for the code, but not an extension. It's > *quite* a different proposition than specially-named directories -- > especially since only the package root is affected, not every subpackage > directory. From the revised PEP draft [1] re. a directory suffix: """ The downsides, however, are also plentiful.
If a package starts its life as a normal package, it must be renamed when it becomes a namespace, with the implied consequences for revision control tools. Further, there is an immense body of existing code (including the distutils and many other packaging tools) that expect a package directory's name to be the same as the package name. And porting existing Python 2.x namespace packages to Python 3 would require widespread directory renaming as well. In short, this approach would require a vastly larger number of changes to both the standard library and third-party code, for a tiny potential performance improvement and a small increase in clarity. It was therefore rejected on "practicality vs. purity" grounds.""" [1] http://mail.python.org/pipermail/import-sig/2011-July/000213.html There are plenty of practical objections to having to move files around and rename directories in order to turn an ordinary package into a partitioned package. Those objections are just as valid for the subdirectory approach as they are for a directory suffix. Dropping a marker file into the directory is simple by contrast. As someone that uses a dir tree+file list view to manage my file system, I also think the subdirectory approach would be absolutely hideous to navigate and manage. It works for __pycache__ because I don't care what's in those (most of the time) and they don't have any subdirectories. But for the actual package source code? And potentially nested for subpackages? Yuck. Awful UI design. *ding* <--- lightbulb However, the __pycache__ example did just trigger an idea that may give us the best of both worlds. 1. We use a shared marker *directory* called __package__ to indicate partitioned packages. The import system just does a stat check for __init__.py and a __package__ subdir to see if a directory is a Python package directory. 2. All the .pyp files go inside the __package__ subdir rather than being placed directly in the same directory as the package source code. 
No os.listdir() calls, no need to move files around to create a partitioned package, no cluttering of the main package directories with *.pyp files, and distro packaging utilities are quite happy with the idea of multiple packages writing to the same directory. Thoughts? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eric at trueblade.com Tue Jul 12 09:57:59 2011 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 12 Jul 2011 03:57:59 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> <20110711152605.8A3083A4100@sparrow.telecommunity.com> Message-ID: <4E1BFE87.6030400@trueblade.com> On 7/12/2011 3:53 AM, Nick Coghlan wrote: > *ding* <--- lightbulb > > However, the __pycache__ example did just trigger an idea that may > give us the best of both worlds. > > 1. We use a shared marker *directory* called __package__ to indicate > partitioned packages. The import system just does a stat check for > __init__.py and a __package__ subdir to see if a directory is a Python > package directory. > > 2. All the .pyp files go inside the __package__ subdir rather than > being placed directly in the same directory as the package source > code. Why would we need the .pyp files, if we already have the __package__ subdir? Isn't the existence of the subdir enough? The only reason I can think of is for mercurial, which doesn't like empty directories. But then the file could be anything, and python would never look for it. For tools like RPM the files in the subdir would need to be unique per-RPM, but I don't think that's Python's concern. Eric.
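The stat-only check from Nick's item 1 could look roughly like the sketch below. classify_package_dir() is a hypothetical helper, and letting __init__.py win when both markers are present is an assumption; the thread does not settle that case.

```python
import os


def classify_package_dir(dirpath):
    """Classify a directory using stat-style checks only (no os.listdir()).

    Returns "standard" for a directory with __init__.py, "partitioned" for
    one with a __package__ marker subdirectory, or None for neither.
    Assumption: __init__.py takes precedence if both are present.
    """
    if os.path.isfile(os.path.join(dirpath, "__init__.py")):
        return "standard"
    if os.path.isdir(os.path.join(dirpath, "__package__")):
        return "partitioned"
    return None
```

Under this sketch the contents of the __package__ subdirectory are never inspected, which matches the point that the marker directory's mere existence could be enough for the import system.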
From ncoghlan at gmail.com Tue Jul 12 10:07:31 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Jul 2011 18:07:31 +1000 Subject: [Import-SIG] One last try: "virtual packages" In-Reply-To: <20110712010218.5E0873A4100@sparrow.telecommunity.com> References: <20110712010218.5E0873A4100@sparrow.telecommunity.com> Message-ID: On Tue, Jul 12, 2011 at 11:01 AM, P.J. Eby wrote: > Anyway, this'll be my last attempt at coming up with a markerless approach, > but I do hope you'll take the time to read it carefully, as it has very > different semantics and performance impacts from my previous proposals, even > though it may sound quite similar on the surface. My first reaction is "I like it". It's the only one of the proposals put forward that will make "Why aren't my packages working?" questions on Stack Overflow go away. Boilerplate is bad, empty __init__.py files are boilerplate, and this change would let them die off gracefully. __init__.py would essentially become the package equivalent of __slots__ (i.e. declaring that the package was limited to that one directory). My second reaction is a work in progress. Going to need to think about this one for a while :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Jul 12 14:58:08 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Jul 2011 22:58:08 +1000 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <4E1BFE87.6030400@trueblade.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> <20110711152605.8A3083A4100@sparrow.telecommunity.com> <4E1BFE87.6030400@trueblade.com> Message-ID: On Tue, Jul 12, 2011 at 5:57 PM, Eric V. Smith wrote: > Why would we need the .pyp files, if we already have the __package__ > subdir?
Isn't the existence of the subdir enough? > > The only reason I can think of is for mercurial, which doesn't like > empty directories. But then the file could be anything, and python would > never look for it. For tools like RPM the files in the subdir would need > to be unique per-RPM, but I don't think that's Python's concern. For the reasons you say - empty directories aren't handled well by many tools and if the directory is going to have content, then *somebody* has to define the rules for playing well with others, so it may as well be us. However, I wrote this before reading PJE's last piece about virtual packages. If that idea pans out (and I personally haven't spotted any problems with it as yet) then we won't need a marker system at all, so the point will become moot. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Tue Jul 12 17:03:29 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 12 Jul 2011 11:03:29 -0400 Subject: [Import-SIG] One last try: "virtual packages" In-Reply-To: <20110712010218.5E0873A4100@sparrow.telecommunity.com> References: <20110712010218.5E0873A4100@sparrow.telecommunity.com> Message-ID: <20110712110329.337e5d17@resist.wooz.org> It's a very interesting idea that is worth exploring. A few things come to mind: - Under this scheme it's possible for names in a module to "suddenly" appear. E.g. I could install packages that extend existing top level modules like `time` or `string`. This might be a good thing in that it gives 3rd party folks a more natural place to add things, but it could also open up a land-grab type collision if lots of people want to publish their packages as subpackage extensions to existing modules. - It's unfortunate that this will be more difficult to back port to Python 2. - It sounds like it will be more difficult to have a single code base that supports Python 2, Python3 <= 3.2, and Python 3.3.
This is because __init__.py is required in the first two, but does the wrong thing (I think ;) in a post-PEP 382 Python 3.3. Adding a .pyp file that's ignored in anything that doesn't support PEP 382 would make it easier to support multiple Pythons. - This should make vendor packaging tools happy because it does seem to eliminate file collisions (duplicate directories don't matter). Let's see the PEP! -Barry From pje at telecommunity.com Tue Jul 12 17:34:49 2011 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 12 Jul 2011 11:34:49 -0400 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> <20110711152605.8A3083A4100@sparrow.telecommunity.com> <4E1BFE87.6030400@trueblade.com> Message-ID: <20110712153521.5F4293A4100@sparrow.telecommunity.com> At 10:58 PM 7/12/2011 +1000, Nick Coghlan wrote: >For the reasons you say - empty directories aren't handled well by >many tools and if the directory is going to have content, then >*somebody* has to define the rules for playing well with others, so it >may as well be us. > >However, I wrote this before reading PJE's last piece about virtual >packages. If that idea pans out (and I personally haven't spotted any >problems with it as yet) then we won't need a marker system at all, so >the point will become moot. True enough, but for the record, I like the idea. I had previously thought of using a marker directory, but discarded it due to the fact that it seemed to make things more complicated to set up a package.
However, it occurs to me now that packaging tools can take responsibility for adding marker files to the directory, so for the end user, you just 'mkdir -p mypkg/py-pkg' or some such. (I'm not keen on __package__ as the name; I'd rather something non-importable. But that's a bikeshed for another time.) I think one other thing that we can and should do with whatever approach we end up with, is to only require one level of marker. There's virtually no benefit to restricting subpackage partitioning, because a subpackage's __path__ is always a subset of its parent's __path__. So, as soon as you get down to something that only lives in a single directory, it'll be the same as if you'd restricted it. Therefore, any drafts we do from this point forward should only require top-level markers. From pje at telecommunity.com Tue Jul 12 18:02:40 2011 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 12 Jul 2011 12:02:40 -0400 Subject: [Import-SIG] One last try: "virtual packages" In-Reply-To: <20110712110329.337e5d17@resist.wooz.org> References: <20110712010218.5E0873A4100@sparrow.telecommunity.com> <20110712110329.337e5d17@resist.wooz.org> Message-ID: <20110712160304.A84103A4100@sparrow.telecommunity.com> At 11:03 AM 7/12/2011 -0400, Barry Warsaw wrote: >It's a very interesting idea that is worth exploring. A few things come to >mind: > >- Under this scheme it's possible for names in a module to "suddenly" appear. Bear in mind that you still have to actually *import* those names, so it's not like they really "suddenly" appear. And when you do import them, they'll be *modules*, not functions or classes or constants or anything. > E.g. I could install packages that extend existing top level modules like > `time` or `string`. This might be a good thing in that it gives 3rd party > folks a more natural place to add things, but it could also open up a > land-grab type collision if lots of people want to publish their > packages as > subpackage extensions to existing modules. 
True -- an ironic side-effect, given our intent to make it easier to *avoid* such collisions. ;-) However, given that this feature will probably NOT be available on versions <3.3 by default (see discussion below), it probably won't get *too* far out of hand. Also, because you can't add new module *contents*, there's little benefit to doing this anyway. Your users would have to do "from string.foobar import bizbaz" or "import string.foobar as foobar", anyway, so why not just make a "foobar.string" module and call it a day? I also don't think we should really advertise the ability to extend other people's packages, except maybe to say, "don't do it." We could also shut down the capability by requiring virtual packages to be declared in the module, if there is a defining module. That would actually work well with cross-version compatibility (see below) but would add an extra step when turning a module into a package. >- It's unfortunate that this will be more difficult to back port to Python 2. Well, I'm not that bothered by it. Python 2 still has its two existing ways to do this, and it's not *that* terribly hard to make an __import__ wrapper. But there are some things that can be done to make it easier. >- It sounds like it will be more difficult to have a single code base that > supports Python 2, Python3 <= 3.2, and Python 3.3. This is because > __init__.py is required in the first two, but does the wrong thing (I think > ;) in a post-PEP 382 Python 3.3. Adding a .pyp file that's ignored in > anything that doesn't support PEP 382 would make it easier to support > multiple Pythons. There's a straightforward way to solve this. Suppose we have a module called 'pep382', with a function 'make_virtual(packagename)'. In Python 2.x, setuptools will make "distributionname-version-nspkg.pth" files that just say 'import pep382; pep382.make_virtual("toplevelnamespace")', and the same solution would work for Python 3 through 3.2. 
(In the .egg based install case, __init__.py gets used and the older API is called, but in future setuptools that'll be a wrapper over the pep382 module.) For Python 3.3, these APIs don't need to be used, but they'll still work. They just won't be doing anything significant. You can drop use of the APIs as you drop support for older Pythons, and code targeted to 3.3+ can just do whatever. For Python < 3.3, you have to get the pep382 module installed and activated somehow in order to use the feature. However, once you do, you can use "pure virtual" packages without an __import__ hook, because a meta_path importer can catch an otherwise-failed import and set up an empty module with a __path__. IOW, the difficult part of implementing this on 2.x is only the part where you allow transitioning from a 'foo' module to a 'foo' package without changing the module. If you're using namespaces the way people mostly do now on 2.x, it works without an __import__ hook. For this reason, I suggest that the default for the backwards-compatibility module be to only handle pure-virtual and declared-virtual packages, not module-extension virtual packages. That way, the overhead remains low. (Writing __import__ in Python adds overhead to *every* import statement, vs. the relatively small and infrequent overheads added by PEP 302 hooks.) >Let's see the PEP! Martin said something about working one up along similar lines himself; I'm curious to see what his proposal is. From stephen.c.waterbury at nasa.gov Tue Jul 12 17:38:44 2011 From: stephen.c.waterbury at nasa.gov (Stephen Waterbury) Date: Tue, 12 Jul 2011 11:38:44 -0400 Subject: [Import-SIG] One last try: "virtual packages" In-Reply-To: <20110712110329.337e5d17@resist.wooz.org> References: <20110712010218.5E0873A4100@sparrow.telecommunity.com> <20110712110329.337e5d17@resist.wooz.org> Message-ID: <4E1C6A84.6020002@nasa.gov> On 07/12/2011 11:03 AM, Barry Warsaw wrote: > It's a very interesting idea that is worth exploring. 
A few > things come to mind: > > - Under this scheme it's possible for names in a module to > "suddenly" appear. E.g. I could install packages that extend > existing top level modules like `time` or `string`. This > might be a good thing in that it gives 3rd party folks a more > natural place to add things, but it could also open up a > land-grab type collision if lots of people want to publish > their packages as subpackage extensions to existing modules. Names can suddenly appear only if installed and imported, so that doesn't seem too scary to me. As to the land-grab type collision, there are similar dangers today -- we're all consenting adults here ... ;) > - It's unfortunate that this will be more difficult to back > port to Python 2. To me the elegance seems worth the price (assuming no big gotchas that haven't been noticed yet) ... OTOH, I'm not the one doing the back porting ... "for the man who doesn't have to do it, nothing is impossible ..." ;) > - It sounds like it will be more difficult to have a single > code base that supports Python 2, Python3<= 3.2, and Python > 3.3. This is because __init__.py is required in the first two, > but does the wrong thing (I think ;) in a post-PEP 382 Python > 3.3. Adding a .pyp file that's ignored in anything that > doesn't support PEP 382 would make it easier to support > multiple Pythons. That's a consideration, but it seems a fairly simple script could add the __init__.py files to create a version of the codebase for the Python versions that require them. I might be over-simplifying, though. > - This should make vendor packaging tools happy because it does > seem to eliminate file collisions (duplicate directories don't > matter). Right. > Let's see the PEP! The peanut gallery is riveted ... :) Steve From pje at telecommunity.com Tue Jul 12 05:17:15 2011 From: pje at telecommunity.com (P.J. 
Eby) Date: Mon, 11 Jul 2011 23:17:15 -0400 Subject: [Import-SIG] One last try: "virtual packages" In-Reply-To: References: <20110712010218.5E0873A4100@sparrow.telecommunity.com> Message-ID: <20110712174734.CE4493A4116@sparrow.telecommunity.com> At 08:19 PM 7/11/2011 -0600, Eric Snow wrote: >Cool idea. So for users the only difference is that suddenly foo.py >and a foo directory (without __init__.py) can coexist/cooperate, and >__init__.py becomes optional? That, and if *no* directory of the given name has an __init__.py, then the directory is a virtual package that combines portions spread across sys.path. From barry at python.org Tue Jul 12 22:03:58 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 12 Jul 2011 16:03:58 -0400 Subject: [Import-SIG] One last try: "virtual packages" In-Reply-To: <20110712160304.A84103A4100@sparrow.telecommunity.com> References: <20110712010218.5E0873A4100@sparrow.telecommunity.com> <20110712110329.337e5d17@resist.wooz.org> <20110712160304.A84103A4100@sparrow.telecommunity.com> Message-ID: <20110712160358.4f4f5398@resist.wooz.org> On Jul 12, 2011, at 12:02 PM, P.J. Eby wrote: >At 11:03 AM 7/12/2011 -0400, Barry Warsaw wrote: >>It's a very interesting idea that is worth exploring. A few things come to >>mind: >> >>- Under this scheme it's possible for names in a module to "suddenly" appear. > >Bear in mind that you still have to actually *import* those names, so it's >not like they really "suddenly" appear. And when you do import them, they'll >be *modules*, not functions or classes or constants or anything. Yeah, I was just thinking about something dumb like a typo in an import statement, but I think that's nothing realistic to be worried about. >> E.g. I could install packages that extend existing top level modules like >> `time` or `string`. 
This might be a good thing in that it gives 3rd party >> folks a more natural place to add things, but it could also open up a >> land-grab type collision if lots of people want to publish their > packages as >> subpackage extensions to existing modules. > >True -- an ironic side-effect, given our intent to make it easier to *avoid* >such collisions. ;-) However, given that this feature will probably NOT be >available on versions <3.3 by default (see discussion below), it probably >won't get *too* far out of hand. We'll let you eat those words in 15 years when Python 4.7 comes out. :) >Also, because you can't add new module *contents*, there's little benefit to >doing this anyway. Your users would have to do "from string.foobar import >bizbaz" or "import string.foobar as foobar", anyway, so why not just make a >"foobar.string" module and call it a day? > >I also don't think we should really advertise the ability to extend other >people's packages, except maybe to say, "don't do it." Agreed. I did want to bring this up as a side-effect of the feature. >We could also shut down the capability by requiring virtual packages to be >declared in the module, if there is a defining module. That would actually >work well with cross-version compatibility (see below) but would add an extra >step when turning a module into a package. I'd rather go the other way. IOW, leave it open by default but perhaps provide an API that allows a module to declare itself closed to submodules. I don't actually expect that to be used much, so I'm happy to call YAGNI on it. But I don't want to require a defining module for virtual packages, because that makes it less useful for vendor packagers. Generally, I think we'd prefer not to have defining modules, but when we do, we can have the defmod.py owned by exactly one vendor package, and then submodules would add dependencies on that defining module. 
This is actually one way we currently handle colliding __init__.py files, but it kind of sucks because it makes packaging submodules more complicated. >>- It's unfortunate that this will be more difficult to back port to Python 2. > >Well, I'm not that bothered by it. Python 2 still has its two existing ways >to do this, and it's not *that* terribly hard to make an __import__ wrapper. >But there are some things that can be done to make it easier. I'm also not entirely sure I'd want to back port this into our Python 2 versions anyway, at least not without fully understanding the performance and other implications. I'd rather spend the effort to get people switched to Python 3. :) >>- It sounds like it will be more difficult to have a single code base that >> supports Python 2, Python3 <= 3.2, and Python 3.3. This is because >> __init__.py is required in the first two, but does the wrong thing (I >> think ;) in a post-PEP 382 Python 3.3. Adding a .pyp file that's ignored >> in anything that doesn't support PEP 382 would make it easier to support >> multiple Pythons. > >There's a straightforward way to solve this. Suppose we have a module called >'pep382', with a function 'make_virtual(packagename)'. In Python 2.x, >setuptools will make "distributionname-version-nspkg.pth" files that just say >'import pep382; pep382.make_virtual("toplevelnamespace")', and the same >solution would work for Python 3 through 3.2. (In the .egg based install >case, __init__.py gets used and the older API is called, but in future >setuptools that'll be a wrapper over the pep382 module.) > >For Python 3.3, these APIs don't need to be used, but they'll still work. >They just won't be doing anything significant. You can drop use of the APIs >as you drop support for older Pythons, and code targeted to 3.3+ can just do >whatever. > >For Python < 3.3, you have to get the pep382 module installed and activated >somehow in order to use the feature. 
However, once you do, you can use "pure >virtual" packages without an __import__ hook, because a meta_path importer >can catch an otherwise-failed import and set up an empty module with a >__path__. > >IOW, the difficult part of implementing this on 2.x is only the part where >you allow transitioning from a 'foo' module to a 'foo' package without >changing the module. If you're using namespaces the way people mostly do now >on 2.x, it works without an __import__ hook. > >For this reason, I suggest that the default for the backwards-compatibility >module be to only handle pure-virtual and declared-virtual packages, not >module-extension virtual packages. That way, the overhead remains low. >(Writing __import__ in Python adds overhead to *every* import statement, >vs. the relatively small and infrequent overheads added by PEP 302 hooks.) I'm less concerned about the foo-module-to-foo-package case, so I'm okay with that being more difficult in Python < 3.3. >>Let's see the PEP! > >Martin said something about working one up along similar lines himself; I'm >curious to see what his proposal is. +1 -Barry From ncoghlan at gmail.com Wed Jul 13 03:17:12 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 13 Jul 2011 11:17:12 +1000 Subject: [Import-SIG] What if namespace imports weren't special? In-Reply-To: <20110712153521.5F4293A4100@sparrow.telecommunity.com> References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com> <20110711035731.C012E3A4100@sparrow.telecommunity.com> <20110711043932.22F8B3A4100@sparrow.telecommunity.com> <20110711051855.484273A4100@sparrow.telecommunity.com> <20110711152605.8A3083A4100@sparrow.telecommunity.com> <4E1BFE87.6030400@trueblade.com> <20110712153521.5F4293A4100@sparrow.telecommunity.com> Message-ID: On Wed, Jul 13, 2011 at 1:34 AM, P.J.
Eby wrote: > At 10:58 PM 7/12/2011 +1000, Nick Coghlan wrote: >> >> For the reasons you say - empty directories aren't handled well by >> many tools and if the directory is going to have content, then >> *somebody* has to define the rules for playing well with others, so it >> may as well be us. >> >> However, I wrote this before reading PJE's last piece about virtual >> packages. If that idea pans out (and I personally haven't spotted any >> problems with it as yet) then we won't need a marker system at all, so >> the point will become moot. > > True enough, but for the record, I like the idea. I had previously thought > of using a marker directory, but discarded it due to the fact that it seemed > to make things more complicated to set up a package. However, it occurs to > me now that packaging tools can take responsibility for adding marker files > to the directory, so for the end user, you just 'mkdir -p mypkg/py-pkg' or > some such. (I'm not keen on __package__ as the name; I'd rather something > non-importable. But that's a bikeshed for another time.) I think we chose the colour of that particular bikeshed back when __pycache__ was added :) > I think one other thing that we can and should do with whatever approach we > end up with, is to only require one level of marker. There's virtually no > benefit to restricting subpackage partitioning, because a subpackage's > __path__ is always a subset of its parent's __path__. So, as soon as you > get down to something that only lives in a single directory, it'll be the > same as if you'd restricted it. Therefore, any drafts we do from this point > forward should only require top-level markers. +1 on having a multi-path parent imply multi-path support in subpackages.
Given the significant differences between the two approaches, perhaps the marker directory idea should be written up as the "best of breed" version of PEP 382 (probably under the name "partitioned packages"), with a new PEP for the radically different "virtual packages" alternative? I think publishing the two side-by-side will actually help sell the virtual packages idea (Option A: Choose which flavour of boilerplate you want to use to make your packages work; Option B: What boilerplate?). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pje at telecommunity.com Wed Jul 13 19:11:48 2011 From: pje at telecommunity.com (P.J. Eby) Date: Wed, 13 Jul 2011 13:11:48 -0400 Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning" Message-ID: <20110713171345.4E0673A4100@sparrow.telecommunity.com> I'd appreciate any questions, problems, clarifications, concerns, etc. so we can clean this up before we run it past Python-Dev. There are also a couple of "XXX" comments down in the "Implementation Notes" section, with open questions we need to nail down. Mostly, though, this is looking... pretty doable, actually. Thanks! PEP: XXX Title: Simplified Package Layout and Partitioning Version: $Revision$ Last-Modified: $Date$ Author: P.J. Eby Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 12-Jul-2011 Python-Version: 3.3 Post-History: Replaces: 382 Abstract ======== This PEP proposes an enhancement to Python's package importing to: * Surprise users of other languages less, * Make it easier to convert a module into a package, and * Support dividing packages into separately installed components (ala "namespace packages", as described in PEP 382) The proposed enhancements do not change the semantics of any currently-importable directory layouts, but make it possible for packages to use a simplified directory layout (that is not importable currently).
However, the proposed changes do NOT add any performance overhead to the importing of existing modules or packages, and performance for the new directory layout should be about the same as that of previous "namespace package" solutions (such as ``pkgutil.extend_path()``). The Problem =========== .. epigraph:: "Most packages are like modules. Their contents are highly interdependent and can't be pulled apart. [However,] some packages exist to provide a separate namespace. ... It should be possible to distribute sub-packages or submodules of these [namespace packages] independently." -- Jim Fulton, shortly before the release of Python 2.3 [1]_ When new users come to Python from other languages, they are often confused by Python's packaging semantics. At Google, for example, Guido received complaints from "a large crowd with pitchforks" [2]_ that the requirement for packages to contain an ``__init__`` module was a "misfeature", and should be dropped. In addition, users coming from languages like Java or Perl are sometimes confused by a difference in Python's import path searching. In most other languages that have a path mechanism to Python's ``sys.path``, a package is merely a namespace that contains modules or classes, and can thus be spread across multiple directories in the language's path. In Perl, for instance, a ``Foo::Bar`` module will be searched for in ``Foo/`` subdirectories all along the module include path, not just in the first such subdirectory found. Worse, this is not just a problem for new users: it prevents *anyone* from easily splitting a package into separately-installable components. In Perl terms, it would be as if every possible ``Net::`` module on CPAN had to be bundled up and shipped in a single tarball! For that reason, various workarounds for this latter limitation exist, circulated under the term "namespace packages". 
The Python standard library has provided one such workaround since Python 2.3 (via the ``pkgutil.extend_path()`` function), and the "setuptools" package provides another (via ``pkg_resources.declare_namespace()``). The workarounds themselves, however, fall prey to a *third* issue with Python's way of laying out packages in the filesystem. Because a package *must* contain an ``__init__`` module, any attempt to distribute modules for that package must necessarily include that ``__init__`` module, if those modules are to be importable. However, the very fact that each distribution of modules for a package must contain this (duplicated) ``__init__`` module, means that OS vendors who package up these module distributions must somehow handle the conflict caused by several distributions installing that ``__init__`` module to the same location in the filesystem. This led to the proposing of PEP 382 ("Namespace Packages") - a way to signal to Python's import machinery that a directory was importable, using unique filenames per module distribution. However, there was more than one downside to this approach. Performance for all import operations would be affected, and the process of designating a package became even more complex. New terminology had to be invented to explain the solution, and so on. As terminology discussions continued on the Import-SIG, it soon became apparent that the main reason it was so difficult to explain the concepts related to "namespace packages" was because Python's current way of handling packages is somewhat underpowered, when compared to other languages. That is, in other popular languages with package systems, no special term is needed to describe "namespace packages", because *all* packages generally behave in the desired fashion. 
Rather than being an isolated single directory with a special marker module (as in Python), packages in other languages are typically just a *union of appropriately-named directories* across the *entire* import or inclusion path. In Perl, for example, the module ``Foo`` is always found in a ``Foo.pm`` file, and a module ``Foo::Bar`` is always found in a ``Foo/Bar.pm`` file. (In other words, there is One Obvious Way to find the location of a particular module.) This is because Perl considers a module to be *different* from a package: the package is purely a *namespace* in which other modules may reside, and is only *coincidentally* the name of a module as well. In current versions of Python, however, the module and the package are more tightly bound together. ``Foo`` is always a module -- whether it is found in ``Foo.py`` or ``Foo/__init__.py`` -- and it is tightly linked to its submodules (if any), which *must* reside in the exact same directory where the ``__init__.py`` was found. On the positive side, this design choice means that a package is quite self-contained, and can be installed, copied, etc. as a unit just by performing an operation on the package's root directory. On the negative side, however, it is non-intuitive for beginners, and requires a more complex step to turn a module into a package. If ``Foo`` begins its life as ``Foo.py``, then it must be moved and renamed to ``Foo/__init__.py``. Conversely, if you intend to create a ``Foo.Bar`` module from the start, but have no particular module contents to put in ``Foo`` itself, then you have to create an empty and seemingly-irrelevant ``Foo/__init__.py`` file, just so that ``Foo.Bar`` can be imported. (And these issues don't just confuse newcomers to the language, either: they annoy many experienced developers as well.) 
So, after some discussion on the Import-SIG, this PEP was created as an alternative to PEP \382, in an attempt to solve *all* of the above problems, not just the "namespace package" use cases. And, as a delightful side effect, the solution proposed in this PEP does not affect the import performance of ordinary modules or self-contained (i.e. ``__init__``-based) packages. The Solution ============ In the past, various proposals have been made to allow more intuitive approaches to package directory layout. However, most of them failed because of an apparent backward-compatibility problem. That is, if the requirement for an ``__init__`` module were simply dropped, it would open up the possibility for a directory named, say, ``string`` on ``sys.path``, to block importing of the standard library ``string`` module. Paradoxically, however, the failure of this approach does *not* arise from the elimination of the ``__init__`` requirement! Rather, the failure arises because the underlying approach takes for granted that a package is just ONE thing, instead of two. In truth, a package comprises two separate, but related entities: a module (with its own, optional contents), and a *namespace* where *other* modules or packages can be found. In current versions of Python, however, the module part (found in ``__init__``) and the namespace for submodule imports (represented by the ``__path__`` attribute) are both initialized at the same time, when the package is first imported. And, if you assume this is the *only* way to initialize these two things, then there is no way to drop the need for an ``__init__`` module, while still being backwards-compatible with existing directory layouts. After all, as soon as you encounter a directory on ``sys.path`` matching the desired name, that means you've "found" the package, and must stop searching, right? Well, not quite. 
A Thought Experiment -------------------- Let's hop into the time machine for a moment, and pretend we're back in the early 1990s, shortly before Python packages and ``__init__.py`` have been invented. But, imagine that we *are* familiar with Perl-like package imports, and we want to implement a similar system in Python. We'd still have Python's *module* imports to build on, so we could certainly conceive of having ``Foo.py`` as a parent ``Foo`` module for a ``Foo`` package. But how would we implement submodule and subpackage imports? Well, if we didn't have the idea of ``__path__`` attributes yet, we'd probably just search ``sys.path`` looking for ``Foo/Bar.py``. But we'd *only* do it when someone actually tried to *import* ``Foo.Bar``. NOT when they imported ``Foo``. And *that* lets us get rid of the backwards-compatibility problem of dropping the ``__init__`` requirement, back here in 2011. How? Well, when we ``import Foo``, we're not even *looking* for ``Foo/`` directories on ``sys.path``, because we don't *care* yet. The only point at which we care, is the point when somebody tries to actually import a submodule or subpackage of ``Foo``. That means that if ``Foo`` is a standard library module (for example), and I happen to have a ``Foo`` directory on ``sys.path`` (without an ``__init__.py``, of course), then *nothing breaks*. The ``Foo`` module is still just a module, and it's still imported normally. Self-Contained vs. "Virtual" Packages ------------------------------------- Of course, in today's Python, trying to ``import Foo.Bar`` will fail if ``Foo`` is just a ``Foo.py`` module (and thus lacks a ``__path__`` attribute). So, this PEP proposes to *dynamically* create a ``__path__``, in the case where one is missing. That is, if I try to ``import Foo.Bar`` the proposed change to the import machinery will notice that the ``Foo`` module lacks a ``__path__``, and will therefore try to *build* one before proceeding. 
And it will do this by making a list of all the existing ``Foo/`` subdirectories of the directories listed in ``sys.path``. If the list is empty, the import will fail with ``ImportError``, just like today. But if the list is *not* empty, then it is saved in a new ``Foo.__path__`` attribute, making the module a "virtual package". That is, because it now has a valid ``__path__``, we can proceed to import submodules or subpackages in the normal way. Now, notice that this change does not affect "classic", self-contained packages that have an ``__init__`` module in them. Such packages already *have* a ``__path__`` attribute (initialized at import time) so the import machinery won't try to create another one later. This means that (for example) the standard library ``email`` package will not be affected in any way by you having a bunch of unrelated directories named ``email`` on ``sys.path``. But it *does* mean that if you want to turn your ``Foo`` module into a ``Foo`` package, all you have to do is add a ``Foo/`` directory somewhere on ``sys.path``, and start adding modules to it. But what if you only want a "namespace package"? That is, a package that is *only* a namespace for various separately-distributed submodules and subpackages? For example, if you're Zope Corporation, distributing dozens of separate tools like ``zc.buildout``, each in packages under the ``zc`` namespace, you don't want to have to make and include an empty ``zc.py`` in every tool you ship. (And, if you're a Linux or other OS vendor, you don't want to deal with the package conflicts created by trying to install ten copies of ``zc.py`` to the same location!) No problem.
All we have to do is make one more minor tweak to the import process: if the "classic" import process fails to find a self-contained module or package (e.g., if ``import zc`` fails to find a ``zc.py`` or ``zc/__init__.py``), then we once more try to build a ``__path__`` by searching for all the ``zc/`` directories on ``sys.path``, and putting them in a list. If this list is empty, we raise ``ImportError``. But if it's non-empty, we create an empty ``zc`` module, and put the list in ``zc.__path__``. Congratulations: ``zc`` is now a namespace-only, "pure virtual" package! It has no module contents, but you can still import submodules and subpackages from it, regardless of where they're located on ``sys.path``. (By the way, both of these additions to the import protocol (i.e. the dynamically-added ``__path__``, and dynamically-created modules) apply recursively to child packages, using the parent package's ``__path__`` in place of ``sys.path`` as a basis for generating a child ``__path__``. This means that self-contained and virtual packages can contain each other without limitation, with the caveat that if you put a virtual package inside a self-contained one, it's gonna have a really short ``__path__``!) Backwards Compatibility and Performance --------------------------------------- Notice that these two changes *only* affect import operations that today would result in ``ImportError``. As a result, the performance of imports that do not involve virtual packages is unaffected, and potential backward compatibility issues are very restricted. Today, if you try to import submodules or subpackages from a module with no ``__path__``, it's an immediate error. And of course, if you don't have a ``zc.py`` or ``zc/__init__.py`` somewhere on ``sys.path`` today, ``import zc`` would likewise fail. Thus, the only potential backwards-compatibility issues are: 1. 
Tools that expect package directories to have an ``__init__`` module, that expect directories without an ``__init__`` module to be unimportable, or that expect ``__path__`` attributes to be static, will not recognize virtual packages as packages. (In practice, this just means that tools will need updating to support virtual packages, e.g. by using ``pkgutil.walk_packages()`` instead of using hardcoded filesystem searches.) 2. Code that *expects* certain imports to fail may now do something unexpected. This should be fairly rare in practice, as most sane, non-test code does not import things that are expected not to exist! The biggest likely exception to the above would be when a piece of code tries to check whether some package is installed by importing it. If this is done *only* by importing a top-level module (i.e., not checking for a ``__version__`` or some other attribute), *and* there is a directory of the same name as the sought-for package on ``sys.path`` somewhere, *and* the package is not actually installed, then such code could perhaps be fooled into thinking a package is installed that really isn't. However, even in this case, the failure is more likely to be annoying than damaging; in most cases, the code will simply fail a little later on, when it actually tries to DO something with the imported (but empty) module. (And code that checks for a ``__version__`` attribute or the presence of some desired function, class, or module in the package will not see such a false positive result in the first place.) Meanwhile, tools that expect to locate packages and modules by walking a directory tree can be updated to use the existing ``pkgutil.walk_packages()`` API, and tools that need to inspect packages in memory should use the other APIs described in the `Standard Library Changes/Additions`_ section below. Specification ============= Two changes are made to the existing import process.
First, the built-in ``__import__`` function must not raise an ``ImportError`` when importing a submodule of a module with no ``__path__``. Instead, it must attempt to *create* a ``__path__`` attribute for the parent module, as described in `__path__ creation`_ below. Second, if searching ``sys.meta_path`` and ``sys.path`` (or a parent package ``__path__``) fails to find a module, the import process must also attempt to create a ``__path__`` attribute for the non-existent module. If the attempt succeeds, an empty module is created and its ``__path__`` is set. Otherwise, importing fails. In both of the above cases, if a non-empty ``__path__`` is created, the name of the module whose ``__path__`` was created is added to ``sys.virtual_packages`` -- an initially-empty set of package names. Conversely, if an empty ``__path__`` results, an ``ImportError`` is immediately raised, and the module is not created or changed, nor is its name added to ``sys.virtual_packages``. (This way, code that extends ``sys.path`` at runtime can find out what virtual packages are currently imported, and thereby add any new subdirectories to those packages' ``__path__`` attributes. See `Standard Library Changes/Additions`_ below for more details.) ``__path__`` Creation --------------------- A virtual ``__path__`` is created by obtaining a PEP 302 "importer" object for each of the path entries found in ``sys.path`` (for a top-level module) or the parent ``__path__`` (for a submodule). (Note: because ``sys.meta_path`` importers are not associated with ``sys.path`` or ``__path__`` entry strings, such importers do *not* participate in this process.) Each importer is checked for a ``get_subpath()`` method, and if present, the method is called with the full name of the module the ``__path__`` is being constructed for. The return value is either a string representing a package subdirectory, or ``None`` if no such subdirectory exists. 
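To make the ``get_subpath()`` contract described above a little more concrete, here is a minimal sketch of how a directory-based importer might implement it. This is an illustration only, not part of the draft: the class name ``DirImporter`` is hypothetical, and joining just the last dotted component of the module name onto the path entry is an assumption about how a filesystem importer would map names to subdirectories.

```python
import os


class DirImporter:
    """Hypothetical importer for a single filesystem path entry.

    Illustration only: this class is not defined by the draft or by the
    standard library. It shows the shape of the proposed get_subpath()
    method, nothing more.
    """

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def get_subpath(self, fullname):
        # Assumption: only the last dotted component names the
        # subdirectory, because the path entry (sys.path item or parent
        # __path__ item) already accounts for the leading components.
        tail = fullname.rpartition(".")[2]
        subdir = os.path.join(self.path_entry, tail)
        # Return the subdirectory if it exists, otherwise None.
        return subdir if os.path.isdir(subdir) else None
```

Under these assumptions, an importer for a ``sys.path`` entry that contains a ``zc/`` directory would return that directory's path for ``get_subpath("zc")``, and ``None`` for any name with no matching subdirectory.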
The strings returned by each importer are added to the ``__path__`` being built, in the same order as they are found. (``None`` values and missing ``get_subpath()`` methods are simply skipped.) In Python code, the algorithm would look something like this::

    import pkgutil
    import sys

    def get_virtual_path(modulename, parent_path=None):

        if parent_path is None:
            parent_path = sys.path

        path = []

        for entry in parent_path:
            # Obtain a PEP 302 importer object - see pkgutil module
            importer = pkgutil.get_importer(entry)

            if hasattr(importer, 'get_subpath'):
                subpath = importer.get_subpath(modulename)
                if subpath is not None:
                    path.append(subpath)

        return path

And a function like this one should be exposed in the standard library as ``imp.get_virtual_path()``, so that people creating ``__import__`` replacements or ``sys.meta_path`` hooks can reuse it. Standard Library Changes/Additions ---------------------------------- The ``pkgutil`` module should be updated to handle this specification appropriately, including any necessary changes to ``extend_path()``, ``iter_modules()``, etc. A new generic API for calling ``get_subpath()`` on importers should be added as well. Specifically the proposed changes and additions to ``pkgutil`` are: * A new ``get_subpath(importer, fullname)`` generic function, allowing implementations to be registered for existing importers. * A new ``extend_virtual_paths(path_entry)`` function, to extend existing, already-imported virtual packages' ``__path__`` attributes to include any portions found in a new ``sys.path`` entry. This function should be called by applications extending ``sys.path`` at runtime, e.g. when adding a plugin directory or an egg to the path. The implementation of this function does a simple top-down traversal of ``sys.virtual_packages``, and performs any necessary ``get_subpath()`` calls to identify what path entries need to be added to each package's ``__path__``, given that `path_entry` has been added to ``sys.path``.
(Or, in the case of sub-packages, adding a derived subpath entry, based on their parent namespace's ``__path__``.) * A new ``iter_virtual_packages(parent='')`` function to allow top-down traversal of virtual packages in ``sys.virtual_packages``, by yielding the child virtual packages of `parent`. For example, calling ``iter_virtual_packages("zope")`` might yield ``zope.app`` and ``zope.products`` (if they are imported virtual packages listed in ``sys.virtual_packages``), but **not** ``zope.foo.bar``. (This function is needed to implement ``extend_virtual_paths()``, but is also potentially useful for other code that needs to inspect imported virtual packages.) * ``ImpImporter.iter_modules()`` should be changed to also detect and yield the names of modules found in virtual packages. In addition to the above changes, the ``zipimport`` importer should have its ``iter_modules()`` implementation similarly changed. (Note: current versions of Python implement this via a shim in ``pkgutil``, so technically this is also a change to ``pkgutil``.) Last, but not least, the ``imp`` module should expose the algorithm described in the `__path__ creation`_ section above, as a ``get_virtual_path(modulename, parent_path=None)`` function, so that creators of ``__import__`` replacements can use it. Implementation Notes -------------------- For users, developers, and distributors of virtual packages: * ``sys.virtual_packages`` is allowed to contain non-existent or not-yet-imported package names; code that uses its contents should not assume that every name in this set is also present in ``sys.modules`` or that importing the name will necessarily succeed. * If you are changing a currently self-contained package into a virtual one, it's important to note that you can no longer use its ``__file__`` attribute to locate data files stored in a package directory. 
Instead, you must search ``__path__`` or use the ``__file__`` of a submodule adjacent to the desired files, or of a self-contained subpackage that contains the desired files. * XXX what is the __file__ of a "pure virtual" package? ``None``? Some arbitrary string? The path of the first directory with a trailing separator? No matter what we put, *some* code is going to break, but the last choice might allow some code to accidentally work. Is that good or bad? For those implementing PEP \302 importer objects: * Importers that support the ``iter_modules()`` method (used by ``pkgutil`` to locate importable modules and packages) and want to add virtual package support should modify their ``iter_modules()`` method so that it discovers and lists virtual packages as well as standard modules and packages. To do this, the importer should simply list all immediate subdirectory names in its jurisdiction that are valid Python identifiers. XXX This might list a lot of not-really-packages. Should we require importable contents to exist? If so, how deep do we search, and how do we prevent e.g. link loops, or traversing onto different filesystems, etc.? Ick. * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do not need to implement ``get_subpath()``, because the method is only called on importers corresponding to ``sys.path`` entries and ``__path__`` entries. If a meta importer wishes to support virtual packages, it must do so entirely within its own ``find_module()`` implementation. Unfortunately, it is unlikely that any such implementation will be able to merge its package subpaths with those of other meta importers or ``sys.path`` importers, so the meaning of "supporting virtual packages" for a meta importer is currently undefined!
(However, since the intended use case for meta importers is to replace Python's normal import process entirely for some subset of modules, and the number of such importers currently implemented is quite small, this seems unlikely to be a big issue in practice.) References ========== .. [1] "namespace" vs "module" packages (mailing list thread) (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html) .. [2] "Dropping __init__.py requirement for subpackages" (http://mail.python.org/pipermail/python-dev/2006-April/064400.html) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From ericsnowcurrently at gmail.com Thu Jul 14 00:27:01 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 13 Jul 2011 16:27:01 -0600 Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning" In-Reply-To: <20110713171345.4E0673A4100@sparrow.telecommunity.com> References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> Message-ID: On Wed, Jul 13, 2011 at 11:11 AM, P.J. Eby wrote: > I'd appreciate any questions, problems, clarifications, concerns, etc. so we > can clean this up before we run it past Python-Dev. There are also a couple > of "XXX" comments down in the "Implementation Notes" section, with open > questions we need to nail down. Mostly, though, this is looking... pretty > doable, actually. > This is cool stuff. And you have presented it really well. I have some (probably too much) feedback inline. > Thanks! > > > PEP: XXX > Title: Simplified Package Layout and Partitioning > Version: $Revision$ > Last-Modified: $Date$ > Author: P.J.
Eby > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 12-Jul-2011 > Python-Version: 3.3 > Post-History: > Replaces: 382 > > Abstract > ======== > > This PEP proposes an enhancement to Python's package importing > to: > > * Surprise users of other languages less, > * Make it easier to convert a module into a package, and > * Support dividing packages into separately installed components > ?(ala "namespace packages", as described in PEP 382) > > The proposed enhancements do not change the semantics of any > currently-importable directory layouts, but make it possible for > packages to use a simplified directory layout (that is not importable > currently). > > However, the proposed changes do NOT add any performance overhead to > the importing of existing modules or packages, and performance for the > new directory layout should be about the same as that of previous > "namespace package" solutions (such as ``pkgutil.extend_path()``). > > > The Problem > =========== > > .. epigraph:: > > ? ?"Most packages are like modules. ?Their contents are highly > ? ?interdependent and can't be pulled apart. ?[However,] some > ? ?packages exist to provide a separate namespace. ... ?It should > ? ?be possible to distribute sub-packages or submodules of these > ? ?[namespace packages] independently." > > ? ?-- Jim Fulton, shortly before the release of Python 2.3 [1]_ > > > When new users come to Python from other languages, they are often > confused by Python's packaging semantics. ?At Google, for example, > Guido received complaints from "a large crowd with pitchforks" [2]_ > that the requirement for packages to contain an ``__init__`` module > was a "misfeature", and should be dropped. > > In addition, users coming from languages like Java or Perl are > sometimes confused by a difference in Python's import path searching. > > In most other languages that have a path mechanism to Python's ... 
mechanism similar to Python's > ``sys.path``, a package is merely a namespace that contains modules > or classes, and can thus be spread across multiple directories in > the language's path. ?In Perl, for instance, a ``Foo::Bar`` module > will be searched for in ``Foo/`` subdirectories all along the module > include path, not just in the first such subdirectory found. > > Worse, this is not just a problem for new users: it prevents *anyone* > from easily splitting a package into separately-installable > components. ?In Perl terms, it would be as if every possible ``Net::`` > module on CPAN had to be bundled up and shipped in a single tarball! > > For that reason, various workarounds for this latter limitation exist, > circulated under the term "namespace packages". ?The Python standard > library has provided one such workaround since Python 2.3 (via the > ``pkgutil.extend_path()`` function), and the "setuptools" package > provides another (via ``pkg_resources.declare_namespace()``). > > The workarounds themselves, however, fall prey to a *third* issue with > Python's way of laying out packages in the filesystem. > > Because a package *must* contain an ``__init__`` module, any attempt > to distribute modules for that package must necessarily include that > ``__init__`` module, if those modules are to be importable. > > However, the very fact that each distribution of modules for a package > must contain this (duplicated) ``__init__`` module, means that OS > vendors who package up these module distributions must somehow handle > the conflict caused by several distributions installing that > ``__init__`` module to the same location in the filesystem. > > This led to the proposing of PEP 382 ("Namespace Packages") - a way > to signal to Python's import machinery that a directory was > importable, using unique filenames per module distribution. > > However, there was more than one downside to this approach. 
> Performance for all import operations would be affected, and the > process of designating a package became even more complex. ?New > terminology had to be invented to explain the solution, and so on. > > As terminology discussions continued on the Import-SIG, it soon became > apparent that the main reason it was so difficult to explain the > concepts related to "namespace packages" was because Python's > current way of handling packages is somewhat underpowered, when > compared to other languages. > > That is, in other popular languages with package systems, no special > term is needed to describe "namespace packages", because *all* > packages generally behave in the desired fashion. > > Rather than being an isolated single directory with a special marker > module (as in Python), packages in other languages are typically just > a *union of appropriately-named directories* across the *entire* > import or inclusion path. > > In Perl, for example, the module ``Foo`` is always found in a > ``Foo.pm`` file, and a module ``Foo::Bar`` is always found in a > ``Foo/Bar.pm`` file. ?(In other words, there is One Obvious Way to > find the location of a particular module.) > > This is because Perl considers a module to be *different* from a > package: the package is purely a *namespace* in which other modules > may reside, and is only *coincidentally* the name of a module as well. > > In current versions of Python, however, the module and the package are > more tightly bound together. ?``Foo`` is always a module -- whether it > is found in ``Foo.py`` or ``Foo/__init__.py`` -- and it is tightly > linked to its submodules (if any), which *must* reside in the exact > same directory where the ``__init__.py`` was found. > > On the positive side, this design choice means that a package is quite > self-contained, and can be installed, copied, etc. as a unit just by > performing an operation on the package's root directory. 
> > On the negative side, however, it is non-intuitive for beginners, and > requires a more complex step to turn a module into a package. ?If > ``Foo`` begins its life as ``Foo.py``, then it must be moved and > renamed to ``Foo/__init__.py``. > > Conversely, if you intend to create a ``Foo.Bar`` module from the > start, but have no particular module contents to put in ``Foo`` > itself, then you have to create an empty and seemingly-irrelevant > ``Foo/__init__.py`` file, just so that ``Foo.Bar`` can be imported. > > (And these issues don't just confuse newcomers to the language, > either: they annoy many experienced developers as well.) > > So, after some discussion on the Import-SIG, this PEP was created > as an alternative to PEP \382, in an attempt to solve *all* of the > above problems, not just the "namespace package" use cases. > > And, as a delightful side effect, the solution proposed in this PEP > does not affect the import performance of ordinary modules or > self-contained (i.e. ``__init__``-based) packages. > > > The Solution > ============ > > In the past, various proposals have been made to allow more intuitive > approaches to package directory layout. ?However, most of them failed > because of an apparent backward-compatibility problem. > > That is, if the requirement for an ``__init__`` module were simply > dropped, it would open up the possibility for a directory named, say, > ``string`` on ``sys.path``, to block importing of the standard library > ``string`` module. > > Paradoxically, however, the failure of this approach does *not* arise > from the elimination of the ``__init__`` requirement! > > Rather, the failure arises because the underlying approach takes for > granted that a package is just ONE thing, instead of two. > > In truth, a package comprises two separate, but related entities: a > module (with its own, optional contents), and a *namespace* where > *other* modules or packages can be found. 
> > In current versions of Python, however, the module part (found in > ``__init__``) and the namespace for submodule imports (represented > by the ``__path__`` attribute) are both initialized at the same time, > when the package is first imported. > > And, if you assume this is the *only* way to initialize these two > things, then there is no way to drop the need for an ``__init__`` > module, while still being backwards-compatible with existing directory > layouts. > > After all, as soon as you encounter a directory on ``sys.path`` > matching the desired name, that means you've "found" the package, and > must stop searching, right? > > Well, not quite. > > > A Thought Experiment > -------------------- > > Let's hop into the time machine for a moment, and pretend we're back > in the early 1990s, shortly before Python packages and ``__init__.py`` > have been invented. ?But, imagine that we *are* familiar with > Perl-like package imports, and we want to implement a similar system > in Python. > > We'd still have Python's *module* imports to build on, so we could > certainly conceive of having ``Foo.py`` as a parent ``Foo`` module > for a ``Foo`` package. ?But how would we implement submodule and > subpackage imports? > > Well, if we didn't have the idea of ``__path__`` attributes yet, > we'd probably just search ``sys.path`` looking for ``Foo/Bar.py``. > > But we'd *only* do it when someone actually tried to *import* > ``Foo.Bar``. > > NOT when they imported ``Foo``. > > And *that* lets us get rid of the backwards-compatibility problem > of dropping the ``__init__`` requirement, back here in 2011. > > How? > > Well, when we ``import Foo``, we're not even *looking* for ``Foo/`` > directories on ``sys.path``, because we don't *care* yet. ?The only > point at which we care, is the point when somebody tries to actually > import a submodule or subpackage of ``Foo``. 
> > That means that if ``Foo`` is a standard library module (for example), > and I happen to have a ``Foo`` directory on ``sys.path`` (without > an ``__init__.py``, of course), then *nothing breaks*. ?The ``Foo`` > module is still just a module, and it's still imported normally. > > > Self-Contained vs. "Virtual" Packages > ------------------------------------- > > Of course, in today's Python, trying to ``import Foo.Bar`` will > fail if ``Foo`` is just a ``Foo.py`` module (and thus lacks a > ``__path__`` attribute). > > So, this PEP proposes to *dynamically* create a ``__path__``, in the > case where one is missing. > > That is, if I try to ``import Foo.Bar`` the proposed change to the > import machinery will notice that the ``Foo`` module lacks a > ``__path__``, and will therefore try to *build* one before proceeding. > > And it will do this by making a list of all the existing ``Foo/`` > subdirectories of the directories listed in ``sys.path``. > > If the list is empty, the import will fail with ``ImportError``, just > like today. ?But if the list is *not* empty, then it is saved in > a new ``Foo.__path__`` attribute, making the module a "virtual > package". > > That is, because it now has a valid ``__path__``, we can proceed > to import submodules or subpackages in the normal way. > > Now, notice that this change does not affect "classic", self-contained > packages that have an ``__init__`` module in them. ?Such packages > already *have* a ``__path__`` attribute (initialized at import time) > so the import machinery won't try to create another one later. > > This means that (for example) the standard library ``email`` package > will not be affected in any way by you having a bunch of unrelated > directories named ``email`` on ``sys.path``. > > But it *does* mean that if you want to turn your ``Foo`` module into > a ``Foo`` package, all you have to do is add a ``Foo/`` directory > somewhere on ``sys.path``, and start adding modules to it. 
>
> But what if you only want a "namespace package"?  That is, a package
> that is *only* a namespace for various separately-distributed
> submodules and subpackages?
>
> For example, if you're Zope Corporation, distributing dozens of
> separate tools like ``zc.buildout``, each in packages under the ``zc``
> namespace, you don't want to have to make and include an empty
> ``zc.py`` in every tool you ship.  (And, if you're a Linux or other
> OS vendor, you don't want to deal with the package conflicts created
> by trying to install ten copies of ``zc.py`` to the same location!)
>
> No problem.  All we have to do is make one more minor tweak to the
> import process: if the "classic" import process fails to find a
> self-contained module or package (e.g., if ``import zc`` fails to find
> a ``zc.py`` or ``zc/__init__.py``), then we once more try to build a
> ``__path__`` by searching for all the ``zc/`` directories on
> ``sys.path``, and putting them in a list.
>
> If this list is empty, we raise ``ImportError``.  But if it's
> non-empty, we create an empty ``zc`` module, and put the list in
> ``zc.__path__``.  Congratulations: ``zc`` is now a namespace-only,
> "pure virtual" package!  It has no module contents, but you can still
> import submodules and subpackages from it, regardless of where they're
> located on ``sys.path``.
>
> (By the way, both of these additions to the import protocol (i.e. the
> dynamically-added ``__path__``, and dynamically-created modules)
> apply recursively to child packages, using the parent package's
> ``__path__`` in place of ``sys.path`` as a basis for generating a
> child ``__path__``.  This means that self-contained and virtual
> packages can contain each other without limitation, with the caveat
> that if you put a virtual package inside a self-contained one, it's
> gonna have a really short ``__path__``!)

Nice.
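To make the path-building step described above concrete, here is a minimal sketch. It is only an illustration, not code from the PEP: the helper name ``build_virtual_path`` is hypothetical, and it assumes plain filesystem directories and a top-level (undotted) package name, ignoring PEP 302 importers entirely.

```python
import os
import tempfile


def build_virtual_path(name, search_path):
    """Return every existing ``<entry>/<name>`` directory, in search order.

    Hypothetical helper for illustration: filesystem directories only,
    top-level names only.
    """
    found = []
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


# Demo: throwaway directories stand in for sys.path entries.
with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "site-a")
    second = os.path.join(root, "site-b")
    third = os.path.join(root, "site-c")
    os.makedirs(os.path.join(first, "zc"))
    os.makedirs(second)                      # this entry has no zc/ portion
    os.makedirs(os.path.join(third, "zc"))

    demo_path = build_virtual_path("zc", [first, second, third])
    demo_expected = [os.path.join(first, "zc"), os.path.join(third, "zc")]
    print(demo_path == demo_expected)
```

An empty result corresponds to the ``ImportError`` case in the text; a non-empty result is what would be stored as the package's ``__path__``.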
> > > Backwards Compatibility and Performance > --------------------------------------- > > Notice that these two changes *only* affect import operations that > today would result in ``ImportError``. ?As a result, the performance > of imports that do not involve virtual packages is unaffected, and > potential backward compatibility issues are very restricted. > > Today, if you try to import submodules or subpackages from a module > with no ``__path__``, it's an immediate error. ?And of course, if you > don't have a ``zc.py`` or ``zc/__init__.py`` somewhere on ``sys.path`` > today, ``import zc`` would likewise fail. > > Thus, the only potential backwards-compatibility issues are: > > 1. Tools that expect package directories to have an ``__init__`` > ? module, that expect directories without an ``__init__`` module > ? to be unimportable, or that expect ``__path__`` attributes to be > ? static, will not recognize virtual packages as packages. > Should there be a way to indicate that you do not want a directory to be considered for a package (an opt-out)? Currently I can move the __init__.py out of the way and it gets ignored by import. > ? (In practice, this just means that tools will need updating to > ? support virtual packages, e.g. by using ``pkgutil.walk_modules()`` > ? instead of using hardcoded filesystem searches.) > > 2. Code that *expects* certain imports to fail may now do something > ? unexpected. ?This should be fairly rare in practice, as most sane, > ? non-test code does not import things that are expected not to > ? exist! > > The biggest likely exception to the above would be when a piece of > code tries to check whether some package is installed by importing > it. 
?If this is done *only* by importing a top-level module (i.e., not > checking for a ``__version__`` or some other attribute), *and* there > is a directory of the same name as the sought-for package on > ``sys.path`` somewhere, *and* the package is not actually installed, > then such code could perhaps be fooled into thinking a package is > installed that really isn't. > > However, even in this case, the failure is more likely to be annoying > than damaging; in most cases, the code will simply fail a little later > on, when it actually tries to DO something with the imported (but > empty) module. ?(And code that checks for a ``__version__`` attribute > or the presence of some desired function, class, or module > in the package will not see such a false positive result in the > first place.) Good point. > > Meanwhile, tools that expect to locate packages and modules by > walking a directory tree can be updated to use the existing > ``pkgutil.walk_modules()`` API, and tools that need to inspect > packages in memory should use the other APIs described in the > `Standard Library Changes/Additions`_ section below. > > > Specification > ============= > > Two changes are made to the existing import process. > > First, the built-in ``__import__`` function must not raise an > ``ImportError`` when importing a submodule of a module with no > ``__path__``. ?Instead, it must attempt to *create* a ``__path__`` > attribute for the parent module, as described in `__path__ creation`_ > below. > > Second, if searching ``sys.meta_path`` and ``sys.path`` (or a parent > package ``__path__``) fails to find a module, the import process must > also attempt to create a ``__path__`` attribute for the non-existent > module. ?If the attempt succeeds, an empty module is created and its > ``__path__`` is set. ?Otherwise, importing fails. > Nice summary. 
> In both of the above cases, if a non-empty ``__path__`` is created, > the name of the module whose ``__path__`` was created is added to > ``sys.virtual_packages`` -- an initially-empty set of package names. I am looking at this PEP from the perspective that it may be useful, and not terribly difficult, to factor in meta importers. So if that viewpoint is invalid a good chunk of my remaining comments may be irrelevant. Also, I have been knee deep in importlib in the last few weeks, which will be painfully obvious in my feedback. I apologize in advance. Perhaps it should be a mapping from the module name to the meta importer which generated the __path__ entry for the module. If meta importers are factored in, the matching importer would be the one to determine how __path__ should change (like in the situation described for extend_virtual_paths() below). > > Conversely, if an empty ``__path__`` results, an ``ImportError`` > is immediately raised, and the module is not created or changed, nor > is its name added to ``sys.virtual_packages``. > > (This way, code that extends ``sys.path`` at runtime can find out > what virtual packages are currently imported, and thereby add any > new subdirectories to those packages' ``__path__`` attributes. ?See > `Standard Library Changes/Additions`_ below for more details.) Clear and straightforward. > > > ``__path__`` Creation > --------------------- > > A virtual ``__path__`` is created by obtaining a PEP 302 "importer" > object for each of the path entries found in ``sys.path`` (for a > top-level module) or the parent ``__path__`` (for a submodule). > > (Note: because ``sys.meta_path`` importers are not associated with > ``sys.path`` or ``__path__`` entry strings, such importers do *not* > participate in this process.) > Nice. The context for this note here make more sense than in the other versions (of the other PEP). 
Could the importers on sys.meta_path be given the opportunity to take
control of the process, just as they get tried first when "finding"
modules?  Otherwise we'd be missing the means of customizing the
__path__ creation process, if that is important.  I don't think it
would add much complexity to the implementation and would parallel the
"finding" part of the import process.

In importlib, the _DefaultPathFinder class handles the search across
sys.path, corresponding to the default import behavior for files.  It
is implicitly added to the end of sys.meta_path for
importlib.__import__, along with the builtin and frozen importers.

For virtual __path__ creation, it would perform the process described
in this section.  Thus, _DefaultPathFinder would return the list of
__path__ entry strings resulting when no other meta importer matches
the fullname.  However, if another (on sys.meta_path) matched,
wouldn't the __path__ coming from _DefaultPathFinder be potentially
wrong?  If so, it would pay to ask each importer on sys.meta_path for
the virtual __path__ and stop on the first hit.

> Each importer is checked for a ``get_subpath()`` method, and if
> present, the method is called with the full name of the module the
> ``__path__`` is being constructed for.  The return value is either
> a string representing a package subdirectory, or ``None`` if no such
> subdirectory exists.

Should it return a list of strings rather than a single string?  Your
use of "strings" in the next sentence implies that it would.  If
get_path() is called at the meta_path level it would need to return a
list of strings.  I am guessing that importers on sys.path_hooks could
too.

>
> The strings returned by each importer are added to the ``__path__``
> being built, in the same order as they are found.  (``None`` values
> and missing ``get_subpath()`` methods are simply skipped.)
>
> In Python code, the algorithm would look something like this::
>
>     def get_virtual_path(modulename, parent_path=None):
>
>         if parent_path is None:
>             parent_path = sys.path

sys.path is used here instead of as the default arg so that it gets
evaluated each time?

>
>         path = []
>
>         for entry in parent_path:
>             # Obtain a PEP 302 importer object - see pkgutil module
>             importer = pkgutil.get_importer(entry)
>
>             if hasattr(importer, 'get_subpath'):
>                 subpath = importer.get_subpath(modulename)
>                 if subpath is not None:
>                     path.append(subpath)
>
>         return path
>
> And a function like this one should be exposed in the standard
> library as ``imp.get_virtual_path()``, so that people creating

Or in importlib...

> ``__import__`` replacements or ``sys.meta_path`` hooks can reuse it.
>
>
> Standard Library Changes/Additions
> ----------------------------------
>
> The ``pkgutil`` module should be updated to handle this
> specification appropriately, including any necessary changes to
> ``extend_path()``, ``iter_modules()``, etc.  A new generic API for
> calling ``get_subpath()`` on importers should be added as well.
>
> Specifically the proposed changes and additions to ``pkgutil`` are:
>
> * A new ``get_subpath(importer, fullname)`` generic function, allowing
>   implementations to be registered for existing importers.

Not that it necessarily impacts this PEP, but I'm not sure what you
mean by "registered for existing importers".  I am guessing that
pkgutil is used to facilitate behaviors in packaging libraries, like
setuptools, and that this registration is one of those behaviors.
Then again I am a little dense sometimes.

Don't sweat responding with an explanation.  I just wanted to point
out that the context of some of the pkgutil related stuff may not be
obvious, and that the documentation for pkgutil doesn't help a ton to
clarify that context.  This may not matter for the PEP and its
expected audience.
> > * A new ``extend_virtual_paths(path_entry)`` function, to extend > ?existing, already-imported virtual packages' ``__path__`` attributes > ?to include any portions found in a new ``sys.path`` entry. ?This > ?function should be called by applications extending ``sys.path`` > ?at runtime, e.g. when adding a plugin directory or an egg to the > ?path. > > ?The implementation of this function does a simple top-down traversal > ?of ``sys.virtual_packages``, and performs any necessary > ?``get_subpath()`` calls to identify what path entries need to > ?be added to each package's ``__path__``, given that `path_entry` > ?has been added to ``sys.path``. ?(Or, in the case of sub-packages, > ?adding a derived subpath entry, based on their parent namespace's > ?``__path__``.) > As I already noted, this is pretty specific to the default file import mechanism rather than the more general meta import process. Maybe that's all that is needed? My sense of extending virtual paths is pretty fuzzy. > * A new ``iter_virtual_packages(parent='')`` function to allow > ?top-down traversal of virtual packages in ``sys.virtual_packages``, > ?by yielding the child virtual packages of `parent`. ?For example, > ?calling ``iter_virtual_packages("zope")`` might yield ``zope.app`` > ?and ``zope.products`` (if they are imported virtual packages listed > ?in ``sys.virtual_packages``), but **not** ``zope.foo.bar``. > ?(This function is needed to implement ``extend_virtual_paths()``, > ?but is also potentially useful for other code that needs to inspect > ?imported virtual packages.) > > * ``ImpImporter.iter_modules()`` should be changed to also detect and > ?yield the names of modules found in virtual packages. > > In addition to the above changes, the ``zipimport`` importer should > have its ``iter_modules()`` implementation similarly changed. ?(Note: > current versions of Python implement this via a shim in ``pkgutil``, > so technically this is also a change to ``pkgutil``.) 
>
> Last, but not least, the ``imp`` module should expose the algorithm
> described in the `__path__ creation`_ section above, as a
> ``get_virtual_path(modulename, parent_path=None)`` function, so that
> creators of ``__import__`` replacements can use it.

Or this could go in importlib?  I guess it depends on where the
implementation happens.

>
>
> Implementation Notes
> --------------------
>
> For users, developers, and distributors of virtual packages:
>
> * ``sys.virtual_packages`` is allowed to contain non-existent or
>   not-yet-imported package names; code that uses its contents should

If it were a dict, the module name could point to None rather than to
the responsible meta importer.

>   not assume that every name in this set is also present in
>   ``sys.modules`` or that importing the name will necessarily succeed.

Good point.

>
> * If you are changing a currently self-contained package into a
>   virtual one, it's important to note that you can no longer use its
>   ``__file__`` attribute to locate data files stored in a package
>   directory.  Instead, you must search ``__path__`` or use the
>   ``__file__`` of a submodule adjacent to the desired files, or
>   of a self-contained subpackage that contains the desired files.

Nice catch.

The "optional extensions" section of PEP 302 has a bit about a
get_data() method for importers.  Using get_data() instead of __file__
or __path__ seems like a safer operation, much as you recommended
using pkgutil.walk_modules() above.

In the case of importlib (yes, it's on my mind), get_data() is already
implemented for the finders surrounding _DefaultPathFinder.  I am not
familiar with the importers that are currently used on
sys.path_importer_cache, but maybe they provide get_data() too?  (A
cursory look makes me think so.)

>
> * XXX what is the __file__ of a "pure virtual" package?  ``None``?
>   Some arbitrary string?  The path of the first directory with a
>   trailing separator?
?No matter what we put, *some* code is > ?going to break, but the last choice might allow some code to > ?accidentally work. ?Is that good or bad? > > > For those implementing PEP \302 importer objects: > > * Importers that support the ``iter_modules()`` method (used by > ?``pkgutil`` to locate importable modules and pacakges) and want to s/pacakges/packages/ > ?add virtual package support should modify their ``iter_modules()`` > ?method so that it discovers and lists virtual packages as well as > ?standard modules and packages. ?To do this, the importer should > ?simply list all immediate subdirectory names in its jurisdiction > ?that are valid Python identifiers. > > ?XXX This might list a lot of not-really-packages. ?Should we > ?require importable contents to exist? ?If so, how deep do we > ?search, and how do we prevent e.g. link loops, or traversing onto > ?different filesystems, etc.? ?Ick. > > * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do > ?not need to implement ``get_subpath()``, because the method > ?is only called on importers corresponding to ``sys.path`` entries > ?and ``__path__`` entries. ?If a meta importer wishes to support > ?virtual packages, it must do so entirely within its own > ?``find_module()`` implementation. Certainly that is a simpler approach, but it seems like each find_module() implementation would end up doing it pretty much the same way, following the pattern used by the sys.path handler. However, you are probably right that handling just the sys.path stuff is good enough. > > ?Unfortunately, it is unlikely that any such implementation will be > ?able to merge its package subpaths with those of other meta > ?importers or ``sys.path`` importers, so the meaning of "supporting > ?virtual packages" for a meta importer is currently undefined! 
> > ?(However, since the intended use case for meta importers is to > ?replace Python's normal import process entirely for some subset of > ?modules, and the number of such importers currently implemented is > ?quite small, this seems unlikely to be a big issue in practice.) And that is why I wonder if all my blathering is relevant. Still, I'm just not sure that it would be difficult for an implementation of this PEP to handle meta importers intelligently. I would hate to discount them unnecessarily. If I'm just a vocal minority on this point I'll let it go. :) Meta importers could always be addressed in a later addition, if needed. Only a couple of things would impact that later effort: * sys.virtual_packages being a list vs. a dictionary * get_path() returning a string vs. a list And only one thing seems ambiguous when meta importers are left for later. If a module is loaded through a meta importer, which importer handles a get_path() call? When extend_virtual_paths is called, how are meta-imported modules addressed? > > > References > ========== > > .. [1] "namespace" vs "module" packages (mailing list thread) > ? (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html) > > .. [2] "Dropping __init__.py requirement for subpackages" > ? (http://mail.python.org/pipermail/python-dev/2006-April/064400.html) > > > Copyright > ========= > > This document has been placed in the public domain. > > > .. > ? Local Variables: > ? mode: indented-text > ? indent-tabs-mode: nil > ? sentence-end-double-space: t > ? fill-column: 70 > ? coding: utf-8 > ? End: > One last point: This PEP results in two ways to provide a module for a package (.py in addition to /__init__.py). However, you do offer a good distinction; __init__.py is for "self-contained" packages. Is it clear when to use which? Will __init__.py go away after a while? Will we have to start looking in two places for a package's code? Again, this is much clearer to me than the PEP 382 proposals were. 
And your extensive experience with packaging really shows. Sorry if any of my feedback displays my ignorance in that area too painfully. I most wholeheartedly defer to you and the rest on this list regarding most of the stuff I have said. :) Thanks for working on this. -eric p.s. if you hurry maybe you can pick up PEP 402. It's funny how those PEP numbers line up sometimes. > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > From pje at telecommunity.com Thu Jul 14 01:14:18 2011 From: pje at telecommunity.com (P.J. Eby) Date: Wed, 13 Jul 2011 19:14:18 -0400 Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning" In-Reply-To: References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> Message-ID: <20110713231448.1CBB03A4100@sparrow.telecommunity.com> At 04:27 PM 7/13/2011 -0600, Eric Snow wrote: >This is cool stuff. And you have presented it really well. I have >some (probably too much) feedback inline. Not at all too much; I've gone ahead and taken care of the typos you mentioned. Other comments follow: >Should there be a way to indicate that you do not want a directory to >be considered for a package (an opt-out)? Currently I can move the >__init__.py out of the way and it gets ignored by import. Renaming the directory is the quick solution. If you have a tool that's looking for anything that's a package, then it'll need an exclusion option, or you'll have to rename the directory to something the tool will skip. (Ideally, tools should skip directories that aren't valid Python identifiers.) >I am looking at this PEP from the perspective that it may be useful, >and not terribly difficult, to factor in meta importers. So if that >viewpoint is invalid a good chunk of my remaining comments may be >irrelevant. Also, I have been knee deep in importlib in the last few >weeks, which will be painfully obvious in my feedback. 
I apologize in >advance. If you can provide a *use case* for explicitly making meta importers part of the process, then great. However, even if they are, the hooks would probably be in the form of a *different* API for meta importers, that's called with a parent path as well as a module name, that would return a list of strings rather than an individual string. The virtual path creation process would then walk the meta importers first, calling that method, until it got a non-empty list, or until it had to fall back to doing it itself (in the way described by the PEP). In the importlib case, then, you could just implement that method (say, "build_virtual_path()") on the default meta importer. (Which would also implement the virtual package fallback, or leave it to another meta-importer later on the path.) Anyway, that, as far as I can tell, is the only sane way to make meta importers participate in the virtual path building process, and IMO it's an extension that isn't really needed at the moment, and would complicate the specification in the PEP. That being said, if somebody wanted to implement the additional feature in importlib "off the books", it's not going to break anything. ;-) We can always update the PEP afterwards. Seriously, though, I suppose we could add a note saying it could be done, and should be done if anybody has use cases, but we're not spelling it out at the moment. >sys.path is used here instead of as the default arg so that it gets >evaluated each time? Yes. That's normal for ``imp`` APIs. >Or in importlib... Well, I don't really want to tie the PEP to importlib right now, and ``imp`` is the established point for exposing the machinery Python is actually using. But of course, I'm not the one doing the work. ;-) > > * A new ``get_subpath(importer, fullname)`` generic function, allowing > > implementations to be registered for existing importers. 
> >Not that it necessarily impacts this PEP, but I'm not sure what you >mean by "registered for existing importers". I am guessing that >pkgutil is used to facilitate behaviors in packaging libraries, like >setuptools, and that this registration is one of those behaviors. >Then again I am a little dense sometimes . I just killed that entire bullet. The truth is, it really only mattered for 2.x, where it can't really help anyway. So, I've dropped it from the spec. >As I already noted, this is pretty specific to the default file import >mechanism rather than the more general meta import process. Maybe >that's all that is needed? My sense of extending virtual paths is >pretty fuzzy. Meta importers are for implementing alternative import strategies, rather than being one more step along the way in a standard import. You could, for example, implement "pure virtual" lookup as a meta importer that sits *after* the one that does Python's normal sys.path/__path__ searching. (And that might well be the way to do it in importlib.) > > * ``sys.virtual_packages`` is allowed to contain non-existent or > > not-yet-imported package names; code that uses its contents should > >If it where a dict the module name could point to None, rather than to >the responsible meta importer. Let's see if there are any use cases for meta importer participation before we go down that route. Outside of importlib and my sketch of a 2.x implementation for PEP 382, just how many meta importers *exist* in the outside world, after nearly nine years of PEP 302 being in existence? >The "optional extensions" section of PEP 302 has a bit about a >get_data() method for importers. Using get_data() instead of __file__ >or __path__ seems like a safer operation, much as you recommended >using pkgutil.walk_modules() above. > >In the case of importlib (yes, it's on my mind), get_data() is already >implemented for the finders surrounding _DefaultPathFinder. 
I am not >familiar with the importers that are currently used on >sys.path_importer_cache, but maybe they provide get_data() too? (a >cursory look makes me think so) I didn't bother with explaining this much because the ``pkg_resources`` module provided by setuptools takes care of interfacing with these things to give you a friendly API for retrieving strings, streams, or filenames for module-adjacent data files. >Certainly that is a simpler approach, but it seems like each >find_module() implementation would end up doing it pretty much the >same way, following the pattern used by the sys.path handler. >However, you are probably right that handling just the sys.path stuff >is good enough. Again, if somebody can point to a meta importer that's *not* part of importlib, we can take a look at that. ;-) >* sys.virtual_packages being a list vs. a dictionary Er, it's a set, not a list. I'll change the bit that says that to highlight ``set()`` as a built-in type, vs. just the word "set". >And only one thing seems ambiguous when meta importers are left for >later. If a module is loaded through a meta importer, which importer >handles a get_path() call? When extend_virtual_paths is called, how >are meta-imported modules addressed? That's really up to the meta-importer. You're really not supposed to use meta-importers to represent import *locations*; they're for extending or replacing import *policies*. If you need locations, you make up a string to represent the location and put it in sys.path, after adding a path hook that recognizes the corresponding string. That's why the whole idea of treating a meta importer as if it were a regular path entry importer is bogus: if you wanted to just implement another search location, you should just use a path entry importer; you don't need a meta-importer at all. 
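The path-hook approach described above (make up a location string, put it on sys.path, and register a hook that recognizes it) might look roughly like the sketch below. It assumes a current Python 3 with the ``find_spec`` protocol, which postdates this thread; ``MEMORY_LOCATION``, ``MODULE_SOURCES``, ``MemoryFinder``, ``MemoryLoader`` and ``memory_hook`` are hypothetical names invented for illustration, not part of any proposal discussed here.

```python
import importlib.util
import sys

# Hypothetical location string; any string that a hook recognizes will do.
MEMORY_LOCATION = "memory://demo"

# Hypothetical in-memory storage that the location string stands for.
MODULE_SOURCES = {"hello_virtual": "GREETING = 'hi from a path hook'\n"}


class MemoryLoader:
    """Loader that executes source text held in MODULE_SOURCES."""

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        exec(MODULE_SOURCES[module.__name__], module.__dict__)


class MemoryFinder:
    """Path entry finder bound to the made-up location string."""

    def find_spec(self, fullname, target=None):
        if fullname not in MODULE_SOURCES:
            return None  # let the remaining sys.path entries be tried
        return importlib.util.spec_from_loader(fullname, MemoryLoader())


def memory_hook(path_entry):
    """Path hook: claim only the location string we made up."""
    if path_entry != MEMORY_LOCATION:
        raise ImportError("not a memory location")
    return MemoryFinder()


# Register the hook, then add the location to sys.path like any directory.
sys.path_hooks.append(memory_hook)
sys.path.append(MEMORY_LOCATION)
sys.path_importer_cache.pop(MEMORY_LOCATION, None)

import hello_virtual

print(hello_virtual.GREETING)
```

Nothing here touches ``sys.meta_path``: the import policy stays the default one, and only a new kind of location is added, which is the distinction being drawn between path entry importers and meta importers.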
To put it another way, if you write a meta-importer, then you really do
need to consider what way you'll do ``__path__`` building, and part of
the point of doing so in a meta-importer would be so that you could
*change* the way it was done. So why would you want to be called as
part of a protocol that you're probably going to replace, anyway?

>One last point: This PEP results in two ways to provide a module for
>a package (.py in addition to /__init__.py). However, you
>do offer a good distinction; __init__.py is for "self-contained"
>packages. Is it clear when to use which? Will __init__.py go away
>after a while? Will we have to start looking in two places for a
>package's code?

I'll add something on that to the notes section:

* While virtual packages are easy to set up and use, there is still
  a time and place for using self-contained packages. While it's not
  strictly necessary, adding an ``__init__`` module to your
  self-contained packages lets users of the package (and Python
  itself) know that *all* of the package's code will be found in
  that single subdirectory. In addition, it lets you define
  ``__all__``, expose a public API, provide a package-level docstring,
  and do other things that make more sense for a self-contained
  project than for a mere "namespace" package.

From ericsnowcurrently at gmail.com Thu Jul 14 01:52:47 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 13 Jul 2011 17:52:47 -0600
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To: <20110713231448.1CBB03A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> <20110713231448.1CBB03A4100@sparrow.telecommunity.com>
Message-ID:

On Wed, Jul 13, 2011 at 5:14 PM, P.J.
Eby wrote: > At 04:27 PM 7/13/2011 -0600, Eric Snow wrote: > Outside of importlib and my sketch of a 2.x > implementation for PEP 382, just how many meta importers *exist* in the > outside world, after nearly nine years of PEP 302 being in existence? So true. I'm fine with taking the approach of "handling sys.path importers is good enough". Perhaps one reason I have been pressing this is because of a project I am working on that makes extensive use of meta importers. And I expect that everyone will be using it heavily within a few months of its completion . >> * sys.virtual_packages being a list vs. a dictionary > > Er, it's a set, not a list. ?I'll change the bit that says that to highlight > ``set()`` as a built-in type, vs. just the word "set". Yeah, should have been set vs. dictionary. But in the reality of how meta importers factor in here, a dictionary it need not be. >> And only one thing seems ambiguous when meta importers are left for >> later. ?If a module is loaded through a meta importer, which importer >> handles a get_path() call? ?When extend_virtual_paths is called, how >> are meta-imported modules addressed? > > That's really up to the meta-importer. ?You're really not supposed to use > meta-importers to represent import *locations*; they're for extending or > replacing import *policies*. ?If you need locations, you make up a string to > represent the location and put it in sys.path, after adding a path hook that > recognizes the corresponding string. That is a great explanation. I guess that just makes me wonder what part of the import process meta importers should respect. Is it anything goes? The onus seems to be on the meta importer to make its new import behavior as unsurprising as possible. Regardless, this doesn't have much bearing on this PEP past what you have already addressed. :) >> One last point: ?This PEP results in two ways to provide a module for >> a package (.py in addition to /__init__.py). 
?However, you >> do offer a good distinction; __init__.py is for "self-contained" >> packages. ?Is it clear when to use which? ?Will __init__.py go away >> after a while? ?Will we have to start looking in two places for a >> package's code? > > I'll add something on that to the notes section: > > * While virtual packages are easy to set up and use, there is still > ?a time and place for using self-contained packages. ?While it's not > ?strictly necessary, adding an ``__init__`` module to your > ?self-contained packages lets users of the package (and Python > ?itself) know that *all* of the package's code will be found in > ?that single subdirectory. ?In addition, it lets you define > ?``__all__``, expose a public API, provide a package-level docstring, > ?and do other things that make more sense for a self-contained > ?project than for a mere "namespace" package. Sounds good. Thanks for taking the time to clarify. -eric From ncoghlan at gmail.com Thu Jul 14 05:16:50 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Jul 2011 13:16:50 +1000 Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning" In-Reply-To: <20110713171345.4E0673A4100@sparrow.telecommunity.com> References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> Message-ID: Excellent write-up! On Thu, Jul 14, 2011 at 3:11 AM, P.J. Eby wrote: > Thus, the only potential backwards-compatibility issues are: > > 1. Tools that expect package directories to have an ``__init__`` > ? module, that expect directories without an ``__init__`` module > ? to be unimportable, or that expect ``__path__`` attributes to be > ? static, will not recognize virtual packages as packages. > > ? (In practice, this just means that tools will need updating to > ? support virtual packages, e.g. by using ``pkgutil.walk_modules()`` > ? instead of using hardcoded filesystem searches.) 
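As a rough illustration of the advice quoted above (ask pkgutil instead of walking the filesystem by hand), here is a minimal sketch. It uses ``pkgutil.iter_modules()``, which exists in the standard library; the stdlib's recursive variant is spelled ``pkgutil.walk_packages()``. The helper name ``list_submodules`` is hypothetical, and the stdlib ``json`` package is used only because it is always available.

```python
import json
import pkgutil


def list_submodules(package):
    """Return the dotted names of modules found directly on package.__path__.

    iter_modules() asks the importer responsible for each path entry, so
    entries served by zipimport or another path hook are covered too, and
    none of the listed modules is imported.
    """
    prefix = package.__name__ + "."
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__, prefix))


names = list_submodules(json)
print(names)
```

Because the search is driven by ``__path__`` rather than by the location of an ``__init__.py`` file, the same helper would keep working for a package whose ``__path__`` has more than one entry.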
It's probably worth noting here that tools that do manual filesystem
searches often already break when confronted with PEP 302 importers
(including zipimport), so this would just be more incentive for them
to do the right thing.

We may also want to provide (probably in importlib) a way to walk the
*potentially* importable modules on a path entry without actually
importing them.

While I understand the desire to focus on an import.c/pkgutil.py based
implementation at this point, it's highly likely that builtin
__import__ will be importlib based for 3.3. I'd be a lot happier if we
stopped double-keying work and just wrote the importlib versions
rather than messing with the soon-to-die C code any further.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pje at telecommunity.com Thu Jul 14 20:13:52 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 14 Jul 2011 14:13:52 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To:
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
Message-ID: <20110714181426.C4F323A4100@sparrow.telecommunity.com>

At 01:16 PM 7/14/2011 +1000, Nick Coghlan wrote:
>We may also want to provide (probably in importlib) a way to walk the
>*potentially* importable modules on a path entry without actually
>importing them.

No problem. Let me just set the time machine for 2006 and add it to
pkgutil instead, so it'll be in Python 2.5+. How does the name
'iter_modules()' sound? ;-)

>While I understand the desire to focus on an import.c/pkgutil.py based
>implementation at this point, it's highly likely that builtin
>__import__ will be importlib based for 3.3. I'd be a lot happier if we
>stopped double-keying work and just wrote the importlib versions
>rather than messing with the soon-to-die C code any further.

Since I'm not doing the actual work for 3.3, I don't really care how it gets done.
I just don't want to make the *specification* depend on that, which is
why I'm saying "imp" for the API rather than importlib. When importlib
goes in, after all, imp will be importing lots of other things from it
anyway. ;-)

That all being said, if somebody Pronounces that importlib is the
right place to expose it, that's fine too. (Presumably pkgutil will
need some refactoring as well, since it currently simulates some
things that're probably also implemented in importlib.)

From ncoghlan at gmail.com Fri Jul 15 06:23:34 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 15 Jul 2011 14:23:34 +1000
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To: <20110714181426.C4F323A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> <20110714181426.C4F323A4100@sparrow.telecommunity.com>
Message-ID:

On Fri, Jul 15, 2011 at 4:13 AM, P.J. Eby wrote:
> At 01:16 PM 7/14/2011 +1000, Nick Coghlan wrote:
>>
>> We may also want to provide (probably in importlib) a way to walk the
>> *potentially* importable modules on a path entry without actually
>> importing them.
>
> No problem. Let me just set the time machine for 2006 and add it to pkgutil
> instead, so it'll be in Python 2.5+. How does the name 'iter_modules()'
> sound? ;-)

For some reason I was thinking that only iterated over already loaded
modules. No, I don't know why I thought that, given that sys.modules
already covers that use case :P

Fair enough on deferring the decision on how the importlib transition
affects the public API until after it actually happens.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia From ericsnowcurrently at gmail.com Sat Jul 16 09:42:43 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 16 Jul 2011 01:42:43 -0600 Subject: [Import-SIG] backport of importlib Message-ID: So I've gone ahead and written a (naive and probably incomplete) script that backports importlib to 2.x[1]. There were a few syntax differences, a couple of modules were new or had new functions, and I had to reintroduce the old-style relative imports. I'm hoping this will allow PEP 382 and the import engine to both be backported simply by running this script on the implementation out of 3.3. This was definitely a good exercise in getting familiar with the importlib implementation. -eric [1] http://pypi.python.org/pypi?:action=display&name=backport_importlib From ncoghlan at gmail.com Sat Jul 16 10:22:39 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 16 Jul 2011 18:22:39 +1000 Subject: [Import-SIG] backport of importlib In-Reply-To: References: Message-ID: On Sat, Jul 16, 2011 at 5:42 PM, Eric Snow wrote: > So I've gone ahead and written a (naive and probably incomplete) > script that backports importlib to 2.x[1]. ?There were a few syntax > differences, a couple of modules were new or had new functions, and I > had to reintroduce the old-style relative imports. I suspect several of the transforms you're applying would be handled natively by 3to2 - have you looked into using that at all? > I'm hoping this will allow PEP 382 and the import engine to both be > backported simply by running this script on the implementation out of > 3.3. ?This was definitely a good exercise in getting familiar with the > importlib implementation. Did I ever tell you about the (deliberately undocumented) standard import emulation in pkgutil? That's what runpy and a couple of other pieces of the 2.x stdlib use to get around the fact that importlib didn't exist until recently (and still doesn't exist in its full form in 2.x). 
(Although I guess relying on that would make it harder to use importlib itself when forward porting to 3.x) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ericsnowcurrently at gmail.com Sun Jul 17 00:24:43 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 16 Jul 2011 16:24:43 -0600 Subject: [Import-SIG] backport of importlib In-Reply-To: References: Message-ID: On Sat, Jul 16, 2011 at 2:22 AM, Nick Coghlan wrote: > I suspect several of the transforms you're applying would be handled > natively by 3to2 - have you looked into using that at all? Yeah, I remembered it once I was mostly already done (it didn't take a long time). If the backport script has many omissions I may revisit it with 3to2. > Did I ever tell you about the (deliberately undocumented) standard > import emulation in pkgutil? That's what runpy and a couple of other > pieces of the 2.x stdlib use to get around the fact that importlib > didn't exist until recently (and still doesn't exist in its full form > in 2.x). (Although I guess relying on that would make it harder to use > importlib itself when forward porting to 3.x) That and I figure it will be easier to take advantage of things like the import engine and PEP 382 if it is a scripted backport of importlib. If I remember right from pycon, the packaging folks were looking at a similar strategy. -eric > > Cheers, > Nick. > > -- > Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia > From pje at telecommunity.com Mon Jul 18 16:50:28 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 18 Jul 2011 10:50:28 -0400 Subject: [Import-SIG] So... should we do this thing? Message-ID: <20110718145111.AC1583A40AA@sparrow.telecommunity.com> What do y'all think? Should we submit the PEP, and run it by Python-Dev? Anybody have any changes, questions, etc.? Perhaps most important: are there any people willing and able to do the implementation for Python 3? 
;-)

From barry at python.org Mon Jul 18 18:17:26 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:17:26 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
Message-ID: <20110718121726.123e5b44@resist.wooz.org>

I finally had a chance to read this. TL;DR: +1.

I have a few quibbles about typos and grammar, but let's ignore that
for now. I have two questions of substance at this point.

1. Sometimes, packages can have non-importable data directories,
   e.g. foo/test/data. Where foo.test would be an importable subpackage,
   foo.test.data should not be. Today we can just omit the __init__.py from
   foo/test/data. Under the proposed regime there would, IIUC, be no way to
   prevent foo.test.data from being a subpackage. It's entirely possible that
   foo/test/data would have .py files in it which would themselves be
   importable. Is this a bad thing? If so, do we need some mechanism to
   prevent recursion into some subdirectories?

2. The __file__ issue. My gut tells me that pure virtual modules should have
   None as their __file__. It seems wrong to use anything else, and your
   "accidentally work" observation is not calming. ;)

   The inability to use __file__ to find data files is somewhat troubling
   though. Let's say we want to find the foo/test/data subdir above, and
   `foo` is pure-virtual, while `test` is an __init__.py-less package.

   I'm fine not being able to use foo.__file__, but I will probably want to
   use `os.path.join(foo.test.__file__, 'data')`. Will that work? What would
   foo.test's __file__ be? The `foo/test` directory perhaps? Of course there
   could be multiple `foo/test` directories, so this is probably why you're
   suggesting to search foo.test.__path__ instead.

   I'd actually be okay with that, *if* pkg_resources will be updated to
   handle this case.
   In general, we've been recommending people use
   pkg_resources anyway (wasn't there a push to move part of this package into
   the stdlib?).

I'll read up on the rest of the thread now, but I think the PEP holds up well
and makes a convincing argument. I think it's certainly worthy of posting to
python-dev to see if anybody else can shoot holes in it, or come up with
useful solutions to open questions. I'll be very interested to see Guido's
reaction to it. :)

Thanks for taking this on PJE.

-Barry

From barry at python.org Mon Jul 18 18:18:24 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:18:24 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To:
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
Message-ID: <20110718121824.28db7f1e@resist.wooz.org>

On Jul 14, 2011, at 01:16 PM, Nick Coghlan wrote:

>While I understand the desire to focus on an import.c/pkgutil.py based
>implementation at this point, it's highly likely that builtin
>__import__ will be importlib based for 3.3. I'd be a lot happier if we
>stopped double-keying work and just wrote the importlib versions
>rather than messing with the soon-to-die C code any further.

Is that really true? I keep hearing conflicting estimates about that.

-Barry

From barry at python.org Mon Jul 18 18:29:31 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:29:31 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To:
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
Message-ID: <20110718122931.6bb07aab@resist.wooz.org>

One other quick thought about __file__. A common use case for it is for
debugging purposes. E.g. a user may say "I'm getting a different foo package
than I expected" and that's causing problems with their application.
Commonly, we'll say to run this:

$ python -c "import foo; print foo.__file__"

to prove where they got it from. While I still think this makes sense to
print None for pure-virtuals, I might still want to know something about where
on the file system these things live.

I suppose that if `foo` were a pure-virtual, then this would be better
diagnostics:

$ python -c "import foo; print foo.__path__"

since it would tell us what file system paths contributed to the creation of
`foo` as a pure virtual package.

Cheers,
-Barry

From barry at python.org Mon Jul 18 18:32:29 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:32:29 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To: <20110713231448.1CBB03A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> <20110713231448.1CBB03A4100@sparrow.telecommunity.com>
Message-ID: <20110718123229.32add477@resist.wooz.org>

On Jul 13, 2011, at 07:14 PM, P.J.
Eby wrote:

>At 04:27 PM 7/13/2011 -0600, Eric Snow wrote:
>>Should there be a way to indicate that you do not want a directory to
>>be considered for a package (an opt-out)? Currently I can move the
>>__init__.py out of the way and it gets ignored by import.
>
>Renaming the directory is the quick solution. If you have a tool that's
>looking for anything that's a package, then it'll need an exclusion option,
>or you'll have to rename the directory to something the tool will skip.
>(Ideally, tools should skip directories that aren't valid Python
>identifiers.)

I agree that tools should skip directories that aren't valid identifiers.
Maybe that's good enough, but I half suspect that the opt-out requirement will
come up often in future discussions.

-Barry

From barry at python.org Mon Jul 18 18:44:44 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:44:44 -0400
Subject: [Import-SIG] So... should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID: <20110718124444.0ca8b47b@resist.wooz.org>

On Jul 18, 2011, at 10:50 AM, P.J. Eby wrote:

>What do y'all think? Should we submit the PEP, and run it by Python-Dev?
>Anybody have any changes, questions, etc.?

Yes, to all questions! (See my other follow up).

>Perhaps most important: are there any people willing and able to do the
>implementation for Python 3? ;-)

Possibly so; I might even get some Official Work Time for it.

-Barry

From ericsnowcurrently at gmail.com Mon Jul 18 18:55:57 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 18 Jul 2011 10:55:57 -0600
Subject: [Import-SIG] So... should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID:

On Mon, Jul 18, 2011 at 8:50 AM, P.J. Eby wrote:
> What do y'all think? Should we submit the PEP, and run it by Python-Dev?
> Anybody have any changes, questions, etc.?
>
> Perhaps most important: are there any people willing and able to do the
> implementation for Python 3? ;-)

I could take a stab at an importlib version.

-eric

From brett at python.org Mon Jul 18 19:01:36 2011
From: brett at python.org (Brett Cannon)
Date: Mon, 18 Jul 2011 10:01:36 -0700
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To: <20110718121824.28db7f1e@resist.wooz.org>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> <20110718121824.28db7f1e@resist.wooz.org>
Message-ID:

On Mon, Jul 18, 2011 at 09:18, Barry Warsaw wrote:

> On Jul 14, 2011, at 01:16 PM, Nick Coghlan wrote:
>
> >While I understand the desire to focus on an import.c/pkgutil.py based
> >implementation at this point, it's highly likely that builtin
> >__import__ will be importlib based for 3.3. I'd be a lot happier if we
> >stopped double-keying work and just wrote the importlib versions
> >rather than messing with the soon-to-die C code any further.
>
> Is that really true? I keep hearing conflicting estimates about that.
>

It's as true as I make it. =) And it's my #1 Python 3.3 project so I am
going to do my damnedest to make it happen.

From ncoghlan at gmail.com Tue Jul 19 00:07:13 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 19 Jul 2011 08:07:13 +1000
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"
In-Reply-To: <20110718121726.123e5b44@resist.wooz.org>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> <20110718121726.123e5b44@resist.wooz.org>
Message-ID:

On Tue, Jul 19, 2011 at 2:17 AM, Barry Warsaw wrote:
> 2. The __file__ issue. My gut tells me that pure virtual modules should have
>    None as their __file__. It seems wrong to use anything else, and your
>    "accidentally work" observation is not calming. ;)
>
>    The inability to use __file__ to find data files is somewhat troubling
>    though. Let's say we want to find the foo/test/data subdir above, and
>    `foo` is pure-virtual, while `test` is an __init__.py-less package.
>
>    I'm fine not being able to use foo.__file__, but I will probably want to
>    use `os.path.join(foo.test.__file__, 'data')`. Will that work? What would
>    foo.test's __file__ be? The `foo/test` directory perhaps? Of course there
>    could be multiple `foo/test` directories, so this is probably why you're
>    suggesting to search foo.test.__path__ instead.
>
>    I'd actually be okay with that, *if* pkg_resources will be updated to
>    handle this case. In general, we've been recommending people use
>    pkg_resources anyway (wasn't there a push to move part of this package into
>    the stdlib?).
pkgutil.get_data() needs to be updated to handle this case, so retrieving the contents of a specific file in the directory above could be written as either of the following: pkgutil.get_data(foo, 'test/data/file.dat') pkgutil.get_data(foo.test, 'data/file.dat') The question of PEP 302 and listing *available* data files (and other directory-style or lazy data access I/O operations) remains open (independent of the changes in this PEP). Note that os.path.join based approaches already break as soon as you put the package and data files in a zipfile. In reality, I believe people should be using the appropriate packaging APIs so that source files and data files may be deployed to distinct locations. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From pje at telecommunity.com Tue Jul 19 00:49:23 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 18 Jul 2011 18:49:23 -0400 Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning" In-Reply-To: <20110718121726.123e5b44@resist.wooz.org> References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> <20110718121726.123e5b44@resist.wooz.org> Message-ID: <20110718225006.5A3DE3A40AA@sparrow.telecommunity.com> At 12:17 PM 7/18/2011 -0400, Barry Warsaw wrote: >1. Sometimes, packages can have non-importable data directories, > e.g. foo/test/data. Where foo.test would be an importable subpackage, > foo.test.data should not be. Today we can just omit the __init__.py from > foo/test/data. Under the proposed regime there would IIUC, be no way to > prevent foo.test.data from being a subpackage. It's entirely > possible that > foo/test/data would have .py files in it which would themselves be > importable. Is this a bad thing? Why would it be? >If so, do we need some mechanism to > prevent recursion into some subdirectories? You could rename the subdirectory, I suppose. >2. The __file__ issue. 
My gut tells me that pure virtual modules should have > None as their __file__. It seems wrong to use anything else, and your > "accidentally work" observation is not calming. ;) Heh. ;-) > The inability to use __file__ to find data files is somewhat troubling > though. Let's say we want to find the foo/test/data subdir above, and > `foo` is pure-virtual, while `test` is an __init__.py-less package. > > I'm fine not being able to use foo.__file__, but I will probably want to > use `os.path.join(foo.test.__file__, 'data')`. Currently, you'd actually join to the dirname() of the __file__, not the plain file. Thus, putting a directory name with a trailing '/' in __file__ would then make the current incantation work for that case, as long as you were fine with looking in the *first* directory where the file was. However, I'm not as keen on that as a general solution, simply because if you add a 'foo/test.py', then the __file__ will change such that a different incantation is required to find the directory. > Will that work? What would > foo.test's __file__ be? The `foo/test` directory perhaps? Of > course there > could be multiple `foo/test` directories, so this is probably why your > suggesting to search foo.test.__path__ instead. > > I'd actually be okay with that, *if* pkg_resources will be updated to > handle this case. In general, we've been recommending people use > pkg_resources anyway (wasn't there a push to move part of this > package into > the stdlib?). pkg_resources says not to use a namespace package as your target for a lookup, but instead to always use a self-contained package or a module that's adjacent to what you're looking for, for this very reason. There's really no change here. >I'll read up on the rest of the thread now, but I think the PEP holds up well >and makes a convincing argument. I think it's certainly worthy of posting to >python-dev to see if anybody else can shoot holes in it, or come up with >useful solutions to open questions. 
I'll be very interested to see Guido's >reaction to it. :) Me too. ;-) From pje at telecommunity.com Tue Jul 19 00:52:08 2011 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 18 Jul 2011 18:52:08 -0400 Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning" In-Reply-To: References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> <20110718121726.123e5b44@resist.wooz.org> Message-ID: <20110718225244.551E93A40AA@sparrow.telecommunity.com> At 08:07 AM 7/19/2011 +1000, Nick Coghlan wrote: >On Tue, Jul 19, 2011 at 2:17 AM, Barry Warsaw wrote: > > 2. The __file__ issue. My gut tells me that pure virtual modules > should have > > None as their __file__. It seems wrong to use anything else, and your > > "accidentally work" observation is not calming. ;) > > > > The inability to use __file__ to find data files is somewhat troubling > > though. Let's say we want to find the foo/test/data subdir above, and > > `foo` is pure-virtual, while `test` is an __init__.py-less package. > > > > I'm fine not being able to use foo.__file__, but I will probably want to > > use `os.path.join(foo.test.__file__, 'data')`. Will that > work? What would > > foo.test's __file__ be? The `foo/test` directory perhaps? Of > course there > > could be multiple `foo/test` directories, so this is probably why your > > suggesting to search foo.test.__path__ instead. > > > > I'd actually be okay with that, *if* pkg_resources will be updated to > > handle this case. In general, we've been recommending people use > > pkg_resources anyway (wasn't there a push to move part of this > package into > > the stdlib?). 
> >pkgutil.get_data() needs to be updated to handle this case, so >retrieving the contents of a specific file in the directory above >could be written as either of the following: > >pkgutil.get_data(foo, 'test/data/file.dat') >pkgutil.get_data(foo.test, 'data/file.dat') Really, these should be done relative to either a module or a self-contained package, unless we want to modify these things to search the __path__ -- and I'm not entirely sure that we do. >The question of PEP 302 and listing *available* data files (and other >directory-style or lazy data access I/O operations) remains open >(independent of the changes in this PEP). Note that os.path.join based >approaches already break as soon as you put the package and data files >in a zipfile. > >In reality, I believe people should be using the appropriate packaging >APIs so that source files and data files may be deployed to distinct >locations. Indeed. From eric at trueblade.com Wed Jul 20 02:46:26 2011 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 19 Jul 2011 20:46:26 -0400 Subject: [Import-SIG] So... should we do this thing? In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com> References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com> Message-ID: <4E262562.5070606@trueblade.com> On 7/18/2011 10:50 AM, P.J. Eby wrote: > What do y'all think? Should we submit the PEP, and run it by > Python-Dev? Anybody have any changes, questions, etc.? I think you should submit the PEP and run it by python-dev. I'm curious to hear what Martin and others think. I like the idea of doing something more radical that not only allows for "namespace packages" or whatever term we settle on, but simplifies how we explain packages. I think this proposal does that, at least for people new to Python. For oldsters like me, it will take some time to wrap my head around it. > Perhaps most important: are there any people willing and able to do the > implementation for Python 3? ;-) I'm willing. Eric. 
From ncoghlan at gmail.com Wed Jul 20 04:03:48 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 20 Jul 2011 12:03:48 +1000
Subject: [Import-SIG] So... should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID:

On Tue, Jul 19, 2011 at 12:50 AM, P.J. Eby wrote:
> What do y'all think? Should we submit the PEP, and run it by Python-Dev?
> Anybody have any changes, questions, etc.?

I think it's ready for wider distribution. I want to see how many brains we can melt as people come to grips with the long term implications :) Some day virtual packages may even become the norm, with self-contained package directories being an app startup time optimisation.

> Perhaps most important: are there any people willing and able to do the
> implementation for Python 3? ;-)

I have plenty on my plate for 3.3 already, but I'll definitely help out with reviewing submitted patches.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ericsnowcurrently at gmail.com Wed Jul 20 22:02:47 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 Jul 2011 14:02:47 -0600
Subject: [Import-SIG] PEP 402 implementation
Message-ID:

Last night I had a chance to get started on an implementation for the PEP. I'm taking the importlib route. Before I do much more I wanted to check on a couple of things with this group.

First of all, I don't want to go to much effort here if others are already focused on the implementation, particularly since I'm sure all of you would do a better job than I would. I already feel like I have butt in on the work Barry, Eric, and crew were getting started. If someone is already going to take care of the implementation please let me know.
In case you haven't noticed, Python is my first foray into an open-source project, and I've only been involved since the pycon sprints (been using Python exclusively for 5 years though). So, I am still feeling out the mechanics of how people cooperate on this sort of stuff. Secondly, regardless of importlib or import.c or whatever, the sys module will need to have "virtual_packages" added right? I stuck that code in import.c next to where sys.meta_path and others get initialized [1]. Is that the right place to do it? Should it go in sysmodule.c instead? Thanks, -eric [1] http://hg.python.org/cpython/file/default/Python/import.c#l204 From brett at python.org Wed Jul 20 22:08:42 2011 From: brett at python.org (Brett Cannon) Date: Wed, 20 Jul 2011 13:08:42 -0700 Subject: [Import-SIG] PEP 402 implementation In-Reply-To: References: Message-ID: On Wed, Jul 20, 2011 at 13:02, Eric Snow wrote: > Last night I had a chance to get started on an implementation for the > PEP. I'm taking the importlib route. Before I do much more I wanted > to check on a couple of things with this group. > Obviously feel free to ask me questions (publicly or privately) if anything in the importlib code is an issue for you (I know its structure for bootstrapping reasons is a bit odd). > > First of all, I don't want to go to much effort here if others are > already focused on the implementation, particularly since I'm sure all > of you would do a better job than I would. I already feel like I have > butt in on the work Barry, Eric, and crew were getting started. > If someone is already going to take care of the implementation please > let me know. > I really doubt anyone has jumped into this as much as you have, Eric. =) You can also always do it on bitbucket or somewhere so that others can collaborate. I believe there is even a cpython mirror there so that should make it easy to fork and pull in updates. 
> > In case you haven't noticed, Python is my first foray into an > open-source project, and I've only been involved since the pycon > sprints (been using Python exclusively for 5 years though). So, I am > still feeling out the mechanics of how people cooperate on this sort > of stuff. > > Secondly, regardless of importlib or import.c or whatever, the sys > module will need to have "virtual_packages" added right? I stuck that > code in import.c next to where sys.meta_path and others get > initialized [1]. Is that the right place to do it? Should it go in > sysmodule.c instead? > I can understand populating those properties in import.c, but it is probably better to initialize the empty data structures in sysmodule.c so that the code to get the module in a basic state is centralized. but if sys.meta_path and friends are elsewhere then you can start there and have a separate patch (file a bug now, though) to possibly relocate the code later. -Brett > > Thanks, > > -eric > > [1] http://hg.python.org/cpython/file/default/Python/import.c#l204 > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Wed Jul 20 23:15:55 2011 From: pje at telecommunity.com (P.J. Eby) Date: Wed, 20 Jul 2011 17:15:55 -0400 Subject: [Import-SIG] PEP 402 implementation In-Reply-To: References: Message-ID: <20110720211636.8BC553A409B@sparrow.telecommunity.com> At 01:08 PM 7/20/2011 -0700, Brett Cannon wrote: >Obviously feel free to ask me questions (publicly or privately) if >anything in the importlib code is an issue for you (I know its >structure for bootstrapping reasons is a bit odd). 
While we're on the topic, I was just browsing through importlib (while doing my sketch on how to support the "no pure virtual imports" change to PEP 402; see http://mail.python.org/pipermail/python-dev/2011-July/112385.html ) and I noticed that there are a few places in the implementation where it makes assumptions about objects' boolean values. For example, PathFinder's find_module treats an empty path the same as sys.path, and will also fail if for some reason the bool() of a PEP 302 finder or loader object is False. Also, module_for_loader() will create a new module object, if you have a False module subclass in sys.modules. Is there any particular reason for these digressions from strict PEP 302? I can understand, say, Jython and IronPython not wanting to generate object id's, but I was under the impression that those languages can do identity checks (especially against None) without running into the general problem of generating object IDs in the presence of garbage collection. These distinctions could be more problematic than they appear, as it's possible to inadvertently make your loader or your module subclass capable of being False (for example, if you subclassed a sequence type or implemented a __len__), and this could lead to some very subtle bugs, albeit very rare ones as well. ;-) From brett at python.org Wed Jul 20 23:55:38 2011 From: brett at python.org (Brett Cannon) Date: Wed, 20 Jul 2011 14:55:38 -0700 Subject: [Import-SIG] PEP 402 implementation In-Reply-To: <20110720211636.8BC553A409B@sparrow.telecommunity.com> References: <20110720211636.8BC553A409B@sparrow.telecommunity.com> Message-ID: No specific reason. Feel free to file a bug and assign it to me. On Wed, Jul 20, 2011 at 14:15, P.J. Eby wrote: > At 01:08 PM 7/20/2011 -0700, Brett Cannon wrote: > >> Obviously feel free to ask me questions (publicly or privately) if >> anything in the importlib code is an issue for you (I know its structure for >> bootstrapping reasons is a bit odd). 
>>
>
> While we're on the topic, I was just browsing through importlib (while
> doing my sketch on how to support the "no pure virtual imports" change to
> PEP 402; see
> http://mail.python.org/pipermail/python-dev/2011-July/112385.html)
> and I noticed that there are a few places in the implementation where it
> makes assumptions about objects' boolean values.
>
> For example, PathFinder's find_module treats an empty path the same as
> sys.path, and will also fail if for some reason the bool() of a PEP 302
> finder or loader object is False. Also, module_for_loader() will create a
> new module object, if you have a False module subclass in sys.modules.
>
> Is there any particular reason for these digressions from strict PEP 302?
> I can understand, say, Jython and IronPython not wanting to generate object
> id's, but I was under the impression that those languages can do identity
> checks (especially against None) without running into the general problem of
> generating object IDs in the presence of garbage collection.
>
> These distinctions could be more problematic than they appear, as it's
> possible to inadvertently make your loader or your module subclass capable
> of being False (for example, if you subclassed a sequence type or
> implemented a __len__), and this could lead to some very subtle bugs, albeit
> very rare ones as well. ;-)
>

From ncoghlan at gmail.com Thu Jul 21 01:18:06 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 Jul 2011 09:18:06 +1000
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To:
References: <20110720211636.8BC553A409B@sparrow.telecommunity.com>
Message-ID:

On Thu, Jul 21, 2011 at 7:55 AM, Brett Cannon wrote:
> No specific reason. Feel free to file a bug and assign it to me.

Yeah, it sounds like a few "is not None" snippets need to be sprinkled around and some pathological cases added to the import test suite.

Cheers,
Nick.
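
A minimal sketch of the pitfall discussed above, not code taken from importlib. `ListBackedLoader`, `pick_by_truthiness` and `pick_by_identity` are hypothetical names invented for this illustration. The point is only that an object which subclasses a sequence type (or defines `__len__`) can be falsy while still being a perfectly valid loader, so a truthiness test treats it as missing where an identity check against None does not.

```python
class ListBackedLoader(list):
    """Hypothetical loader that happens to subclass a sequence type."""

    def load_module(self, fullname):
        # Body is irrelevant here; only the object's truth value matters.
        raise ImportError(fullname)


def pick_by_truthiness(loader):
    # Fragile: an empty sequence subclass is falsy, so a real loader
    # is reported as missing.
    return "found" if loader else "missing"


def pick_by_identity(loader):
    # Robust: only None means "no loader was returned".
    return "found" if loader is not None else "missing"


loader = ListBackedLoader()
print(pick_by_truthiness(loader))
print(pick_by_identity(loader))
print(pick_by_identity(None))
```

The same reasoning would apply to a module subclass stored in sys.modules: checking `is not None` keeps a falsy-but-present object from being silently replaced.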
-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ericsnowcurrently at gmail.com Thu Jul 21 01:26:18 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 Jul 2011 17:26:18 -0600
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To:
References:
Message-ID:

On Wed, Jul 20, 2011 at 2:08 PM, Brett Cannon wrote:
> Obviously feel free to ask me questions (publicly or privately) if anything
> in the importlib code is an issue for you (I know its structure for
> bootstrapping reasons is a bit odd).

Thanks. To be honest, with the time I have spent in importlib in the last couple months I realize how much work you put into it, so thanks. It makes it really easy to hack the import mechanism.

> I really doubt anyone has jumped into this as much as you have, Eric. =) You
> can also always do it on bitbucket or somewhere so that others can
> collaborate. I believe there is even a cpython mirror there so that should
> make it easy to fork and pull in updates.

Yep, already have my bitbucket clone (haven't pushed commits to it yet though).

>> Secondly, regardless of importlib or import.c or whatever, the sys
>> module will need to have "virtual_packages" added right? I stuck that
>> code in import.c next to where sys.meta_path and others get
>> initialized [1]. Is that the right place to do it? Should it go in
>> sysmodule.c instead?
>
> I can understand populating those properties in import.c, but it is probably
> better to initialize the empty data structures in sysmodule.c so that the
> code to get the module in a basic state is centralized. but if sys.meta_path
> and friends are elsewhere then you can start there and have a separate patch
> (file a bug now, though) to possibly relocate the code later.

Good idea. I submitted issue 12598 along with a patch.
-eric

From ncoghlan at gmail.com Thu Jul 21 01:27:13 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 Jul 2011 09:27:13 +1000
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To:
References:
Message-ID:

On Thu, Jul 21, 2011 at 6:08 AM, Brett Cannon wrote:
> I really doubt anyone has jumped into this as much as you have, Eric. =) You
> can also always do it on bitbucket or somewhere so that others can
> collaborate. I believe there is even a cpython mirror there so that should
> make it easy to fork and pull in updates.

+1 for publishing on bitbucket. I recently moved my own sandbox from python.org to bitbucket in order to make collaboration easier and I know Eric already has an account there (cf. the importlib 2.x backport scripts).

The cpython mirror is at: https://bitbucket.org/mirror/cpython/overview

Also, take note of the refinement PJE described on python-dev: sys.virtual_packages will be a dict mapping to __path__ contents and directly importing pure virtual packages will only be permitted if a child package has already been successfully imported.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Thu Jul 21 01:35:18 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 Jul 2011 09:35:18 +1000
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To:
References: <20110720211636.8BC553A409B@sparrow.telecommunity.com>
Message-ID:

On Thu, Jul 21, 2011 at 9:18 AM, Nick Coghlan wrote:
> On Thu, Jul 21, 2011 at 7:55 AM, Brett Cannon wrote:
>> No specific reason. Feel free to file a bug and assign it to me.
>
> Yeah, it sounds like a few "is not None" snippets need to be sprinkled
> around and some pathological cases added to the import test suite.

Created as http://bugs.python.org/issue12599

-- 
Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia

From martin at v.loewis.de Thu Jul 21 22:42:37 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 21 Jul 2011 22:42:37 +0200
Subject: [Import-SIG] So... should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID: <4E288F3D.4060602@v.loewis.de>

Am 18.07.2011 16:50, schrieb P.J. Eby:
> What do y'all think? Should we submit the PEP, and run it by
> Python-Dev? Anybody have any changes, questions, etc.?

I still plan to write my own version of it, so that would make it three PEPs.

Regards,
Martin

From brett at python.org Thu Jul 21 22:59:41 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 21 Jul 2011 13:59:41 -0700
Subject: [Import-SIG] So... should we do this thing?
In-Reply-To: <4E288F3D.4060602@v.loewis.de>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com> <4E288F3D.4060602@v.loewis.de>
Message-ID:

On Thu, Jul 21, 2011 at 13:42, "Martin v. Löwis" wrote:
> Am 18.07.2011 16:50, schrieb P.J. Eby:
> > What do y'all think? Should we submit the PEP, and run it by
> > Python-Dev? Anybody have any changes, questions, etc.?
>
> I still plan to write my own version of it, so that would make it
> three PEPs.
>

A trifecta! At least we have options to choose from. It's a tricky enough topic to get right that I'm not surprised at the possibility of three PEPs on it.

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From barry at python.org Thu Jul 21 22:37:00 2011 From: barry at python.org (Barry Warsaw) Date: Thu, 21 Jul 2011 16:37:00 -0400 Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning" In-Reply-To: References: <20110713171345.4E0673A4100@sparrow.telecommunity.com> <20110718121726.123e5b44@resist.wooz.org> Message-ID: <20110721163700.3daff988@resist.wooz.org> On Jul 19, 2011, at 08:07 AM, Nick Coghlan wrote: >pkgutil.get_data() needs to be updated to handle this case, so >retrieving the contents of a specific file in the directory above >could be written as either of the following: > >pkgutil.get_data(foo, 'test/data/file.dat') >pkgutil.get_data(foo.test, 'data/file.dat') The latter looks fine to me. >The question of PEP 302 and listing *available* data files (and other >directory-style or lazy data access I/O operations) remains open >(independent of the changes in this PEP). Note that os.path.join based >approaches already break as soon as you put the package and data files >in a zipfile. Yep. >In reality, I believe people should be using the appropriate packaging >APIs so that source files and data files may be deployed to distinct >locations. Completely agree. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From jergosh at gmail.com Sun Jul 31 14:05:15 2011 From: jergosh at gmail.com (Greg Slodkowicz) Date: Sun, 31 Jul 2011 14:05:15 +0200 Subject: [Import-SIG] New PEP Draft: Import Engine Message-ID: Dear all, The following is a result of a GSoC project I've been working on with Nick and Brett. I wrote up a description of the proposed changes as a short PEP draft. I'd appreciate any suggestions or criticism. 
A slightly more readable version is also available at http://wiki.python.org/moin/SummerOfCode/PythonImportEnginePlanning?action=edit&editor=text

PEP: XXX
Title: Python Import Engine
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan, Greg Slodkowicz
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-Jul-2011
Post-History: XXX

Abstract
========

This PEP proposes incorporating an 'import engine' class which would encapsulate all state related to importing modules into a single object and provide an alternative to the built-in implementation of the import statement, which is syntactic sugar for the ``__import__()`` function. Currently the bulk of importing work is done by means of module finders and loaders, and their interfaces would require a simple change in order to work with both the builtin import functionality and importing via import engine objects. In that sense, this PEP constitutes a revision of finder and loader interfaces described in PEP 302 [1]_.

Rationale
=========

Historically, any modification to the import functionality required re-implementing ``__import__()`` entirely. PEP 302 provides a major improvement by introducing separation between imports of different types of modules. As a result, additional process-global state is stored in the sys module. This, along with earlier import-related global state, comprises:

* sys.modules
* sys.path
* sys.path_hooks
* sys.meta_path
* sys.path_importer_cache
* the import lock (imp.lock_held()/acquire_lock()/release_lock())

Isolating this state would allow multiple import states to be conveniently stored within a process. Placing the import functionality in a self-contained object would allow subclassing to add additional features (e.g. module import notifications or fine-grained control over which modules can be imported). The engine would also be subclassed to make it possible to use the import engine API to interact with the existing process-global state.
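
To make the list of process-global state above concrete, here is a minimal sketch that copies the relevant ``sys`` attributes into one dictionary. ``snapshot_import_state`` is a hypothetical helper written for this illustration and is not part of the proposal; the import lock is left out because it is not a plain attribute of ``sys``. It shows roughly what an engine object would have to hold in order to be independent of the global state.

```python
import sys


def snapshot_import_state():
    """Return shallow copies of the import-related state kept in sys.

    Hypothetical helper for illustration only; an actual engine would
    also need its own equivalent of the import lock.
    """
    return {
        "modules": dict(sys.modules),
        "path": list(sys.path),
        "path_hooks": list(sys.path_hooks),
        "meta_path": list(sys.meta_path),
        "path_importer_cache": dict(sys.path_importer_cache),
    }


state = snapshot_import_state()
# The copies are independent containers: changing them leaves sys untouched.
state["path"].append("/hypothetical/extra/dir")
print("/hypothetical/extra/dir" in sys.path)
```

Because the containers are copies, mutating them does not affect the running interpreter, which is the isolation property the Rationale is after.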
Proposal
========

We propose introducing an ImportEngine class to encapsulate import functionality. This includes the ``__import__()`` function which can be used as an alternative to the built-in ``__import__()`` when desired and also ``import_module()``, equivalent to ``importlib.import_module()`` [3]_.

Since the new style finders and loaders should also have the option to modify the global import state, we introduce a ``GlobalImportEngine`` class with an interface identical to ``ImportEngine`` but taking advantage of the global state. This can be easily implemented using class properties.

Design and Implementation
=========================

API
~~~~

The proposed extension would consist of the following objects:

``engine.ImportEngine``

    ``__import__(self, name, globals={}, locals={}, fromlist=[], level=0)``

        Reimplementation of the builtin ``__import__()`` function. The import of a module will proceed using the state stored in the ImportEngine instance rather than the global import state. For full documentation of ``__import__`` functionality, see [2]_. ``__import__()`` from ``ImportEngine`` and its subclasses can be used to customise the behaviour of the ``import`` statement by replacing ``__builtin__.__import__`` with ``ImportEngine.__import__``.

    ``import_module(name, package=None)``

        A reimplementation of ``importlib.import_module()`` which uses the import state stored in the ImportEngine instance. See [3]_ for a full reference.

    ``from_engine(self, other)``

        Create a new import object from another ImportEngine instance. The new object is initialised with a copy of the state in ``other``. When called on ``engine.sysengine`` as ``other``, ``from_engine()`` can be used to create an ImportEngine object with a **copy** of the global import state.

``GlobalImportEngine(ImportEngine)``

    Convenience class to provide engine-like access to the global state.
    Provides ``__import__()``, ``import_module()`` and ``from_engine()`` methods like ``ImportEngine`` but writes through to the global state in ``sys``.

Global variables
~~~~~~~~~~~~~~~~

``engine.sysengine``

    Instance of GlobalImportEngine provided for convenience (e.g. for use by module finders and loaders).

Necessary changes to finder/loader interfaces:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``find_module`` (cls, fullname, path=None, **engine=None**)

``load_module`` (cls, fullname, path=None, **engine=None**)

The only difference between 'new style' and PEP 302 compatible finders/loaders is the presence of an additional ``engine`` parameter. This is intended to specify an ImportEngine instance or subclass thereof. This parameter is optional so that the 'new style' finders and loaders can be made backwards compatible by falling back on engine.sysengine with the following simple pattern:

::

    def find_module(cls, fullname, path=None, engine=None):
        if engine is None:
            # The parameter shadows the ``engine`` module, so the
            # global instance is imported by name.
            from engine import sysengine
            engine = sysengine
        ...

An implementation based on Brett Cannon's importlib has been developed by Greg Slodkowicz as part of the 2011 Google Summer of Code. The code repository is located at https://bitbucket.org/jergosh/gsoc_import_engine/.

Open Issues
~~~~~~~~~~~

The existing importlib implementation depends on several functions from ``imp``, Python's builtin implementation of ``__import__`` located in *Python/import.c*. These functions are unaware of ImportEngine and place the newly imported module in ``sys.modules``. Naturally, this is a problem from the ImportEngine point of view. The offending methods are:

* imp.init_builtin()
* imp.load_dynamic()

However, since there can be only a single instance of each builtin/dynamic module per process, they are essentially process-global regardless of the way they are imported.
Currently, the simplest solution for supporting them in ImportEngine seems to be to have new style loaders call the existing imp methods and then copy appropriate references from ``sys.modules`` into the state inside the import engine.

Similarly, ``imp.NullImporter`` implements a ``load_module`` method which is incompatible with 'new style' loaders. Since the ``NullImporter`` class does next to nothing (i.e. always returns None), it has been reimplemented in Python. The only way this could cause problems would be explicitly checking if a module's importer is an imp.NullImporter (which occurs only in some unittests).

References
==========

.. [1] PEP 302, New Import Hooks, J van Rossum, Moore
   (http://www.python.org/dev/peps/pep-0302)

.. [2] __import__() builtin function, The Python Standard Library documentation
   (http://docs.python.org/library/functions.html#__import__)

.. [3] Importlib documentation, Cannon
   (http://docs.python.org/dev/library/importlib)

Copyright
=========

This document has been placed in the public domain.

Best regards,
Greg

From pje at telecommunity.com Sun Jul 31 16:51:30 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sun, 31 Jul 2011 10:51:30 -0400
Subject: [Import-SIG] New PEP Draft: Import Engine
In-Reply-To:
References:
Message-ID: <20110731145241.AEC433A409B@sparrow.telecommunity.com>

At 02:05 PM 7/31/2011 +0200, Greg Slodkowicz wrote:
>Necessary changes to finder/loader interfaces:
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>``find_module`` (cls, fullname, path=None, **engine=None**)
>
>``load module`` (cls, fullname, path=None, **engine=None**)
>
>The only difference between 'new style' and PEP 302 compatible
>finders/loaders is the presence of an additional ``engine`` parameter.
>This is intended to specify an ImportEngine instance or subclass there
>of.
This parameter is optional so that the 'new style' finders and >loaders can be made backwards compatible by falling back on >engine.sysengine with the following simple pattern: I see how you can make new style loaders callable from the old system, but how do you make *old* loaders usable from the *new* system? That is, I don't see how this proposal is backwards compatible with PEP 302. For that, I think you'd have to define new, optional method names for the methods that accepted an engine parameter, with the engine falling back to calling the PEP 302 names if the new ones weren't available. >The existing importlib implementation depends on several functions >from ``imp``, Python's builtin implementation of ``__import__`` >located in *Python/import.c*. These functions are unaware of >ImportEngine and place the newly imported module in ``sys.modules``. >Naturally, this is a problem from the ImportEngine point of view. It's a general backwards compatibility problem, since importers in general are able to assume (and often do) that the loaded modules will be placed in sys.modules. >Similarly, ``imp.NullImporter`` implements a ``load_module`` method >which is incompatible with 'new style' loaders. Again, if you use PEP 302 methods only as compatibility fallbacks, this won't be an issue. The biggest problem I see with this as a PEP is that there isn't any discussion of backwards compatibility, in the sense that the PEP is all about how things *aren't* going to be backwards compatible, and the Rationale doesn't present any specific use cases that would justify the created incompatibilities. It would be much better if you can reframe your proposal in terms of *additions* to the PEP 302 protocol, rather than *changes*.
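
A minimal sketch of the fallback arrangement suggested above, under stated assumptions: the method name ``find_module_with_engine`` and the helper ``call_finder`` are hypothetical and appear in neither PEP 302 nor the draft, and the finders return plain tuples rather than loader objects so the dispatch is easy to see. The idea is that an engine prefers a new, optional, engine-aware method and falls back to the unchanged PEP 302 ``find_module`` when a finder does not define it.

```python
class OldStyleFinder:
    """Plain PEP 302 finder: knows nothing about engines."""

    def find_module(self, fullname, path=None):
        return ("pep302", fullname)


class NewStyleFinder(OldStyleFinder):
    """Adds an optional engine-aware method (hypothetical name)."""

    def find_module_with_engine(self, fullname, path=None, engine=None):
        return ("engine-aware", fullname, engine)


def call_finder(finder, fullname, path=None, engine=None):
    # Prefer the new optional method; otherwise use the PEP 302 name,
    # whose signature is left untouched.
    new_method = getattr(finder, "find_module_with_engine", None)
    if new_method is not None:
        return new_method(fullname, path, engine)
    return finder.find_module(fullname, path)


print(call_finder(OldStyleFinder(), "spam", engine="some-engine"))
print(call_finder(NewStyleFinder(), "spam", engine="some-engine"))
```

With this shape, existing PEP 302 finders keep working unmodified under an engine, which is the "additions rather than changes" framing requested in the message.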