[Python-ideas] My objections to implicit package directories

Eric Snow ericsnowcurrently at gmail.com
Tue Mar 13 03:43:26 CET 2012


On Mon, Mar 12, 2012 at 5:03 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> It seems the consensus at the PyCon US sprints is that implicit
> package directories are a wonderful idea and we should have more of
> those. I still disagree (emphatically), but am prepared to go along
> with it so long as my documented objections are clearly and explicitly
> addressed in the new combined PEP, and the benefits ascribed to
> implicit package directories in the new PEP are more compelling than
> "other languages do it that way, so we should too".
>
> To save people having to trawl around various mailing list threads and
> reading through PEP 395, I'm providing those objections in a
> consolidated form here. If reading these objections in one place
> causes people to have second thoughts about the wisdom of implicit
> package directories, even better.
>
> 1. Implicit package directories go against the Zen of Python
>
> Getting this one out of the way first. As I see it, implicit package
> directories violate at least 4 of the design principles in the Zen:
> - Explicit is better than implicit (my calling them implicit package
> directories is a deliberate rhetorical ploy to harp on this point,
> although it's also an accurate name)
> - If the implementation is hard to explain, it's a bad idea (see the
> section about backwards compatibility challenges)
> - Readability counts (see the section about introducing ambiguity into
> filesystem layouts)
> - Errors should never pass silently (see the section about implicit
> relative imports from main)
>
> 2. Implicit package directories pose awkward backwards compatibility challenges
>
> It concerns me gravely that the consensus proposal MvL posted is
> *backwards incompatible with Python 3.2*, as it deliberately omits one
> of the PEP 402 features that provided that backwards compatibility.
> Specifically, under the consensus, a subdirectory "foo" of a directory
> on sys.path will shadow a "foo.py" or "foo/__init__.py" that appears
> later on sys.path. As Python 3.2 would have found that latter
> module/package correctly, this is an unacceptable breach of the
> backwards compatibility requirements. PEP 402 at least got this right
> by always executing the first "foo.py" or "foo/__init__.py" it found,
> even if another "foo" directory was found earlier in sys.path.
>
> We can't just wave that additional complexity away if an implicit
> package directory proposal is going to remain backwards compatible
> with current layouts (e.g. if an application's starting directory
> included a "json" subfolder containing json files rather than Python
> code, the consensus approach as posted by MvL would render the
> standard library's json module inaccessible)
>
> 3. Implicit package directories introduce ambiguity into filesystem layouts
>
> With the current Python package design, there is a clear 1:1 mapping
> between the filesystem layout and the module hierarchy. For example:
>
>    parent/  # This directory goes on sys.path
>        project/  # The "project" package
>            __init__.py  # Explicit package marker
>            code.py  # The "project.code" module
>            tests/  # The "project.tests" package
>                __init__.py  # Explicit package marker
>                test_code.py  # The "project.tests.test_code" module
>
> Any explicit package directory approach will preserve this 1:1
> mapping. For example, under PEP 382:
>
>    parent/  # This directory goes on sys.path
>        project.pyp/  # The "project" package
>            code.py  # The "project.code" module
>            tests.pyp/  # The "project.tests" package
>                test_code.py  # The "project.tests.test_code" module
>
> With implicit package directories, you can no longer tell purely from
> the code structure which directory is meant to be added to sys.path,
> as there are at least two valid mappings to the Python module
> hierarchy:
>
>    parent/  # This directory goes on sys.path
>        project/  # The "project" package
>            code.py  # The "project.code" module
>            tests/  # The "project.tests" package
>                test_code.py  # The "project.tests.test_code" module
>
>    parent/
>        project/  # This directory goes on sys.path
>            code.py  # The "code" module
>            tests/  # The "tests" package
>                test_code.py  # The "tests.test_code" module
>
> What are implicit package directories buying us in exchange for this
> inevitable ambiguity? What can we do with them that can't be done with
> explicit package directories? And no, "Java does it that way" is not a
> valid argument.
>
> 4. Implicit package directories will permanently entrench current
> newbie-hostile behaviour in __main__
>
> It's a fact of life that Python beginners learn that they can do a
> quick sanity check on modules they're writing by including an "if
> __name__ == '__main__':" section at the end and doing one of 4 things:
> - run "python mymodule.py"
> - hit F5 (or the relevant hot key) in their IDE
> - double click the module in their filesystem browser
> - start the Python REPL and do "import mymodule"
>
> However, there are some serious caveats to that as soon as you move
> the module inside a package:
> - if you use explicit relative imports, you can import it, but not run
> it directly using any of the above methods
> - if you rely on implicit relative imports, the above direct execution
> methods should work most of the time, but you won't be able to import
> it
> - if you use absolute imports for your own package, nothing will work
> (unless the parent directory for your package is already on sys.path)
> - if you only use absolute imports for *other* packages, everything
> should be fine
>
> The errors you get in these cases are *horrible*. The interpreter
> doesn't really know what is going on, so it gives the user bad error
> messages.
>
> In large part, the "Why are my imports broken?" section in PEP 395
> exists because I sat down to try to document what does and doesn't
> work when you attempt to directly execute a module from inside a
> package directory. In building the list of what would work properly
> ("python -m" from the parent directory of the package) and what would
> sometimes break (everything else), I realised that instead of
> documenting the entire hairy mess, the 1:1 mapping from the filesystem
> layout to the Python module hierarchy meant we could *just fix it* to
> not do the wrong thing by default. If implicit package directories are
> blessed for inclusion in Python 3.3, that opportunity is lost forever
> - with the loss of the unambiguous 1:1 mapping from the filesystem
> layout to the module hierarchy, it's no longer possible for the
> interpreter to figure out the right thing to do without guessing.
>
> PJE proposed that newbies be instructed to add the following
> boilerplate to their modules if they want to use "if __name__ ==
> '__main__':" for sanity checking:
>
>    import pkgutil
>    pkgutil.script_module(__name__, 'project.code.test_code')
>
> This completely defeats the purpose of having explicit relative
> imports in the language, as it embeds the absolute name of the module
> inside the module itself. If a package subtree is ever moved or
> renamed, you will have to manually fix every script_module()
> invocation in that subtree. Double-keying data like this is just plain
> bad design. The package structure should be recorded explicitly in
> exactly one place: the filesystem.
>
> PJE has other objections to the PEP 395 proposal, specifically
> relating to its behaviour on package layouts where the directories
> added to sys.path contain __init__.py files, such that the developer's
> intent is not accurately reflected in their filesystem layout. Such
> layouts are *broken*, and the misbehaviour under PEP 395 won't be any
> worse than the misbehaviour with the status quo (sys.path[0] is set
> incorrectly in either case, it will just be fixable under PEP 395 by
> removing the extraneous __init__.py files). A similar argument applies
> to cases where a parent package __init__ plays games with sys.path
> (although the PEP 395 algorithm could likely be refined to better
> handle that situation). Regardless, if implicit package directories
> are accepted into Python 3.3 in any form, I *will* be immediately
> marking PEP 395 as Rejected due to incompatibility with an accepted
> PEP. I'll then (eventually, once I'm less annoyed about the need to do
> so) write a new PEP to address a subset of the issues previously
> covered by PEP 395 that omits any proposals that rely on explicit
> package directories.
>
> Also, I consider it a requirement that any implicit packages PEP
> include an update to the tutorial to explain to beginners what will
> and won't work when they attempt to directly execute a module from
> inside a Python package. After all, such a PEP is closing off any
> possibility of ever fixing the problem: it should have to deal with
> the consequences.

Hi Nick,

The write-up was a little unclear on a main point and I think that's
contributed to some confusion here.  The path search will continue to
work in exactly the same way as it does now, with one difference.
Instead of the current ImportError when nothing matches, the mechanism
for namespace packages would be used.

The mechanism would create a namespace package with a __path__
matching the paths corresponding to all namespace package "portions".
The likely implementation will simply track the namespace package
__path__ during the initial (normal) path search and use it only when
there are no matching modules or regular packages.
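
As a rough illustration of that search order, here is a minimal sketch.
It is not the real import machinery and not a proposed API: the function
name resolve() and the tuples it returns are made up for this example,
and it only looks at plain "name.py" files and directories.

```python
import os


def resolve(name, search_path):
    """Hypothetical sketch of the lookup described above.

    Returns ("package", dir), ("module", file) or ("namespace", [dirs]).
    Raises ImportError when nothing on search_path matches.
    """
    portions = []  # bare "name" directories seen during the normal search
    for entry in search_path:
        candidate = os.path.join(entry, name)
        # Regular package: a directory with an __init__.py marker wins at once.
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return ("package", candidate)
        # Plain module: name.py directly inside this path entry.
        if os.path.isfile(candidate + ".py"):
            return ("module", candidate + ".py")
        # Bare directory: remember it as a possible portion, keep searching.
        if os.path.isdir(candidate):
            portions.append(candidate)
    # Only when the normal search found nothing are the portions used,
    # in place of what would otherwise be an ImportError.
    if portions:
        return ("namespace", portions)
    raise ImportError("No module named " + repr(name))
```

Under this sketch, a bare "json" directory in an early path entry does
not hide a "json.py" in a later one: resolve("json", [first, second])
would return the module from the later entry, and the directory would
only matter if no module or regular package named "json" existed at all.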

Packages without __init__.py would only be allowed for namespace
packages.  So effectively namespace packages would be problematic for
PEP 395, but not normal packages.

Ultimately this is a form of PEP 402 without so much complexity.  The
trade-off is it requires a new kind of package.  As far as I
understand them, most of your concerns are based on the idea that
namespace packages would be included in the initial traversal of
sys.path, which is not the case.  It sounds like there are a couple of
points you made that may still need attention, but hopefully this at
least helps clarify what we talked about.

-eric


