[Import-SIG] My objections to implicit package directories

Guido van Rossum guido at python.org
Tue Mar 13 04:49:12 CET 2012


On Mon, Mar 12, 2012 at 5:21 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> It seems the consensus at the PyCon US sprints is that implicit
> package directories are a wonderful idea and we should have more of
> those. I still disagree (emphatically), but am prepared to go along
> with it so long as my documented objections are clearly and explicitly
> addressed in the new combined PEP, and the benefits ascribed to
> implicit package directories in the new PEP are more compelling than
> "other languages do it that way, so we should too".
>
> To save people having to trawl around various mailing list threads and
> read through PEP 395, I'm providing those objections in a
> consolidated form here.

(Thanks for that.)

> If reading these objections in one place
> causes people to have second thoughts about the wisdom of implicit
> package directories, even better.
>
> 1. Implicit package directories go against the Zen of Python
>
> Getting this one out of the way first. As I see it, implicit package
> directories violate at least 4 of the design principles in the Zen:
> - Explicit is better than implicit (my calling them implicit package
> directories is a deliberate rhetorical ploy to harp on this point,
> although it's also an accurate name)
> - If the implementation is hard to explain, it's a bad idea (see the
> section about backwards compatibility challenges)
> - Readability counts (see the section about introducing ambiguity into
> filesystem layouts)
> - Errors should never pass silently (see the section about implicit
> relative imports from main)

Whatever. There's "practicality beats purity" though, and unmarked
directories are quite intuitive and logical to newcomers. In fact, in
the original package implementation (ni.py, checked in originally with
rev 2887:ec0b42889243), __init__.py was optional. At the time we hadn't
thought of the use case of "namespace packages" like zope.interfaces
and zope.components, where there are multiple distributable "bundles"
that install different *portions* of the package. During the meeting
it also came up that there are two styles in use for this purpose:
multiple distro bundles that install into the *same* directory, or
multiple distro bundles installing into different directories (whose
parents are all added to sys.path separately). We also came up with
the encodings package as a potential namespace package, although it
currently doesn't have an empty __init__.py.

But hold that thought, there's more that I'll address later.

> 2. Implicit package directories pose awkward backwards compatibility challenges
>
> It concerns me gravely that the consensus proposal MvL posted is
> *backwards incompatible with Python 3.2*, as it deliberately omits one
> of the PEP 402 features that provided that backwards compatibility.
> Specifically, under the consensus, a subdirectory "foo" of a directory
> on sys.path will shadow a "foo.py" or "foo/__init__.py" that appears
> later on sys.path. As Python 3.2 would have found that latter
> module/package correctly, this is an unacceptable breach of the
> backwards compatibility requirements. PEP 402 at least got this right
> by always executing the first "foo.py" or "foo/__init__.py" it found,
> even if another "foo" directory was found earlier in sys.path.
>
> We can't just wave that additional complexity away if an implicit
> package directory proposal is going to remain backwards compatible
> with current layouts (e.g. if an application's starting directory
> included a "json" subfolder containing json files rather than Python
> code, the consensus approach as posted by MvL would render the
> standard library's json module inaccessible)

We must have explained this badly, because (just like PEP 402, AFAIK)
this is *not* how it works. It works as follows:

*If* there is a foo.py or a foo/__init__.py anywhere along sys.path,
the *current* rules apply. That is, if one of these occurs on an
earlier sys.path entry, it wins; if both of these occur together on
the same sys.path entry, foo/__init__.py wins. (We discovered that the
latter disambiguation must prefer the directory, not just for
backwards compatibility, but also to make relative imports in
subpackages work right. This is probably the biggest deviation from
PEP 402.) And in this case all those foo/ directories *without* a
__init__.py in them are completely ignored, even if they come before
either foo.py or foo/__init__.py on sys.path. (If __init__.py wants to
manipulate its own __path__, that's fine.)

*Only* if *neither* foo.py *nor* foo/__init__.py is found *anywhere*
along sys.path do we take all directories foo/ along sys.path together
and combine them into a namespace package. If there are no foo/
directories at all, the import fails. If there is exactly one foo/, it
acts like a classic package with an empty __init__.py. We avoid having
to do two scans of sys.path by collecting info about __init__.py-less
foo/ directories during the same scan where we look for foo.py and
foo/__init__.py; but we collect it in a separate variable. (It occurs
to me that this may not be trivial when PEP-302-style finders are
involved. That's a detail that will have to be figured out later.)
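
A rough sketch of that single scan, for illustration only. The function
name find_package and its (kind, locations) return value are made up
here; only the lookup order comes from the description above. It looks
at plain .py files and directories only, and ignores compiled files,
extension modules and PEP-302-style finders.

```python
import os
import tempfile


def find_package(name, path_entries):
    """Scan path_entries once and classify what "import name" would find."""
    bare_dirs = []  # name/ directories without __init__.py, kept separately
    for entry in path_entries:
        pkg_dir = os.path.join(entry, name)
        if os.path.isfile(os.path.join(pkg_dir, "__init__.py")):
            # Checked first, so on the same entry the directory beats name.py.
            return ("package", [pkg_dir])
        module_py = os.path.join(entry, name + ".py")
        if os.path.isfile(module_py):
            return ("module", [module_py])
        if os.path.isdir(pkg_dir):
            bare_dirs.append(pkg_dir)
    if bare_dirs:
        # No name.py and no name/__init__.py anywhere: combine the portions.
        return ("namespace", bare_dirs)
    raise ImportError("No module named " + repr(name))


# Small demo layout: two bare foo/ directories, then a foo.py added later.
root = tempfile.mkdtemp()
a, b, c = (os.path.join(root, d) for d in "abc")
os.makedirs(os.path.join(a, "foo"))
os.makedirs(os.path.join(b, "foo"))
os.makedirs(c)
before = find_package("foo", [a, b, c])  # both bare directories combined
open(os.path.join(c, "foo.py"), "w").close()
after = find_package("foo", [a, b, c])   # foo.py wins, bare dirs ignored
```

The bare directories are only consulted after the loop finishes, which
is what keeps a stray foo/ early on the path from shadowing a real
foo.py or foo/__init__.py later on.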

So the only backwards incompatibility is that "import foo" may succeed
where it previously failed if there is a directory foo/ somewhere on
sys.path but no foo.py and no foo/__init__.py anywhere. I don't think
this is a big deal.
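
Both effects can be checked on an interpreter that implements this
scheme. The sketch below assumes Python 3.3 or later, where namespace
packages shipped as PEP 420. The names demo_shadowed and demo_bare are
hypothetical and exist only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
first = os.path.join(root, "first")
second = os.path.join(root, "second")

# first/ holds only bare directories; second/ holds a real module.
os.makedirs(os.path.join(first, "demo_shadowed"))  # no __init__.py
os.makedirs(os.path.join(first, "demo_bare"))      # no __init__.py
os.makedirs(second)
with open(os.path.join(second, "demo_shadowed.py"), "w") as f:
    f.write("KIND = 'module'\n")

sys.path[:0] = [first, second]
importlib.invalidate_caches()

import demo_shadowed  # the bare directory in first/ does not shadow the module
import demo_bare      # succeeds although there is no __init__.py anywhere

shadowed_kind = demo_shadowed.KIND
bare_locations = list(demo_bare.__path__)
```

Under the old rules the second import would have raised ImportError;
the first import behaves the same either way.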

(Note: where I write foo.py, I should really write
foo.py/foo.pyc/foo.pyo/foo.so/foo.pyd. But that's such a mouthful...)

> 3. Implicit package directories introduce ambiguity into filesystem layouts
>
> With the current Python package design, there is a clear 1:1 mapping
> between the filesystem layout and the module hierarchy. For example:
>
>   parent/  # This directory goes on sys.path
>       project/  # The "project" package
>           __init__.py  # Explicit package marker
>           code.py  # The "project.code" module
>           tests/  # The "project.tests" package
>               __init__.py  # Explicit package marker
>               test_code.py  # The "project.tests.test_code" module
>
> Any explicit package directory approach will preserve this 1:1
> mapping. For example, under PEP 382:
>
>   parent/  # This directory goes on sys.path
>       project.pyp/  # The "project" package
>           code.py  # The "project.code" module
>           tests.pyp/  # The "project.tests" package
>               test_code.py  # The "project.tests.test_code" module
>
> With implicit package directories, you can no longer tell purely from
> the code structure which directory is meant to be added to sys.path,
> as there are at least two valid mappings to the Python module
> hierarchy:
>
>   parent/  # This directory goes on sys.path
>       project/  # The "project" package
>           code.py  # The "project.code" module
>           tests/  # The "project.tests" package
>               test_code.py  # The "project.tests.test_code" module
>
>   parent/
>       project/  # This directory goes on sys.path
>           code.py  # The "code" module
>           tests/  # The "tests" package
>               test_code.py  # The "tests.test_code" module

I know this bothers you greatly, because you wrote at great length
about it in PEP 395. But personally I think that being able to guess
the highest package directory given the name of a .py file nested deep
inside it is a pretty esoteric use case and I can live with this
continuing to be broken (since it is already broken) for the sake of a
simpler package structure (no __init__.py files!).
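
For reference, the guess in question can be sketched as walking up from
the file for as long as each directory carries an __init__.py. This is
an illustration of the idea only, not the PEP 395 algorithm itself, and
the function name guess_sys_path_entry is invented for the example.

```python
import os
import tempfile


def guess_sys_path_entry(py_file):
    """Return (directory to put on sys.path, dotted module name)."""
    py_file = os.path.abspath(py_file)
    directory = os.path.dirname(py_file)
    parts = [os.path.splitext(os.path.basename(py_file))[0]]
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        parts.append(os.path.basename(directory))
        directory = parent
    return directory, ".".join(reversed(parts))


# Build parent/project/tests/test_code.py with explicit package markers.
parent = os.path.abspath(tempfile.mkdtemp())
tests_dir = os.path.join(parent, "project", "tests")
os.makedirs(tests_dir)
markers = [os.path.join(parent, "project", "__init__.py"),
           os.path.join(tests_dir, "__init__.py")]
target = os.path.join(tests_dir, "test_code.py")
for path in markers + [target]:
    open(path, "w").close()

with_markers = guess_sys_path_entry(target)

# Without the markers the walk stops immediately and the guess degrades.
for path in markers:
    os.remove(path)
without_markers = guess_sys_path_entry(target)
```

With the markers in place the guess is parent/ and
project.tests.test_code; once they are removed it falls back to the
file's own directory and the bare name test_code.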

> What are implicit package directories buying us in exchange for this
> inevitable ambiguity? What can we do with them that can't be done with
> explicit package directories? And no, "Java does it that way" is not a
> valid argument.

Apart from the pitchfork incident referenced in PEP 402, I have had
many other complaints about the ubiquitous empty __init__.py files.
They may be empty, but they sure take up space in e.g. directory
listings or zipfiles. For example, there are 409 empty __init__.py
files in the Django 1.4c1 distro, plus 25 more that contain either an
empty comment or a blank line.

I've also seen __init__.py files with a single rude comment in them,
and in my G+ stream I've seen comments on random Python topics making
a snide reference to empty __init__.py files. (There are also coding
guidelines in some places that prohibit having real code in
__init__.py files.)

Quite separately, it also gives us an easy way to have namespace
packages spread across multiple directories. This is clearly a popular
feature, given that there are at least *two* different convenience
APIs to make this easy (one in pkgutil.py, another in setuptools). I
did a quick search for "import pkgutil" on koders.com and the first 25
hits (of 792) are all declaring namespace packages, many with an
awkward try/except idiom that imports either pkg_resources or
pkgutil. This awkwardness really bugs me and being able to eventually
drop it is a big draw for me.
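
To make the boilerplate concrete, here is a small self-contained
sketch of the pkgutil flavour. The package name demo_ns and the two
bundle directories are hypothetical. The pkg_resources branch of the
try/except is left out so that the example depends only on the
standard library.

```python
import importlib
import os
import sys
import tempfile

# The two lines every portion has to repeat in its __init__.py today.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(base, submodule):
    """Create base/demo_ns/ holding the boilerplate plus one submodule."""
    pkg = os.path.join(base, "demo_ns")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(pkg, submodule + ".py"), "w") as f:
        f.write("NAME = %r\n" % submodule)


# Two separately installed bundles, each with its own sys.path entry.
root = tempfile.mkdtemp()
bundles = [os.path.join(root, d) for d in ("bundle_one", "bundle_two")]
make_portion(bundles[0], "interfaces")
make_portion(bundles[1], "components")
sys.path[:0] = bundles
importlib.invalidate_caches()

import demo_ns.interfaces
import demo_ns.components

portions = list(demo_ns.__path__)
```

Both submodules import even though they live in different directories,
but only because each portion ships the same non-empty __init__.py.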

> 4. Implicit package directories will permanently entrench current
> newbie-hostile behaviour in __main__
>
> It's a fact of life that Python beginners learn that they can do a
> quick sanity check on modules they're writing by including an "if
> __name__ == '__main__':" section at the end and doing one of 3 things:
> - run "python mymodule.py"
> - hit F5 (or the relevant hot key) in their IDE
> - double click the module in their filesystem browser
> - start the Python REPL and do "import mymodule"

[...] Our *four*...no... *Amongst* our weapons.... Amongst our
weaponry...are such elements as fear, surprise.... I'll come in again.

> However, there are some serious caveats to that as soon as you move
> the module inside a package:
> - if you use explicit relative imports, you can import it, but not run
> it directly using any of the above methods
> - if you rely on implicit relative imports, the above direct execution
> methods should work most of the time, but you won't be able to import
> it
> - if you use absolute imports for your own package, nothing will work
> (unless the parent directory for your package is already on sys.path)
> - if you only use absolute imports for *other* packages, everything
> should be fine
>
> The errors you get in these cases are *horrible*. The interpreter
> doesn't really know what is going on, so it gives the user bad error
> messages.
>
> In large part, the "Why are my imports broken?" section in PEP 395
> exists because I sat down to try to document what does and doesn't
> work when you attempt to directly execute a module from inside a
> package directory. In building the list of what would work properly
> ("python -m" from the parent directory of the package) and what would
> sometimes break (everything else), I realised that instead of
> documenting the entire hairy mess, the 1:1 mapping from the filesystem
> layout to the Python module hierarchy meant we could *just fix it* to
> not do the wrong thing by default. If implicit package directories are
> blessed for inclusion in Python 3.3, that opportunity is lost forever
> - with the loss of the unambiguous 1:1 mapping from the filesystem
> layout to the module hierarchy, it's no longer possible for the
> interpreter to figure out the right thing to do without guessing.

I understand your frustration at just having analyzed this mess and
come up with a solution, only to see it permanently sabotaged before
you could even implement it. But it's an existing mess, and if I
really have to choose between solving this mess or solving the
empty-init mess, I vote for solving the latter.

But I would hope that the most common cases are still that the package
in fact already exists on sys.path, possibly because it is rooted in
the current directory, or because the package has been properly
installed. In this case you should have no problem computing the
toplevel package implied.

The other common case is where the current directory is *inside* the
package. I agree this is a bad mess. But does this happen with a
typical IDE? It seems more common when using the shell. Anyway, maybe
we just have to document more aggressively that this is a bad idea and
explain to people how to avoid it. (One of the ways to avoid it would
be "add an empty __init__.py to your package directories", since that
will in fact still avoid it.)
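
A minimal reproduction of the mess and of the way around it, as a
sketch. The package name demo_pkg and its files are hypothetical; the
only assumption about the environment is that sys.executable can be
launched as a subprocess.

```python
import os
import subprocess
import sys
import tempfile

# Build parent/demo_pkg/ with a module that uses an explicit relative import.
parent = tempfile.mkdtemp()
pkg = os.path.join(parent, "demo_pkg")
os.makedirs(pkg)
sources = {
    "__init__.py": "",
    "helper.py": "VALUE = 42\n",
    "mod.py": ("from .helper import VALUE\n"
               "if __name__ == '__main__':\n"
               "    print(VALUE)\n"),
}
for filename, source in sources.items():
    with open(os.path.join(pkg, filename), "w") as f:
        f.write(source)


def run(args, cwd):
    """Run the current interpreter with args from the directory cwd."""
    return subprocess.run([sys.executable] + args, cwd=cwd,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)


inside = run(["mod.py"], cwd=pkg)                  # cd'd into the package
outside = run(["-m", "demo_pkg.mod"], cwd=parent)  # from the parent directory
```

Run from inside the package, the script exits with an error about the
relative import; run with -m from the parent directory, the same module
executes normally.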

There's also a nasty habit that Django has around packages and parent
directories. The Django developers announced at PyCon that they're
breaking this habit in Django 1.4. (And they also announced that
Django 1.5 will be compatible with Python 3.3!)

> PJE proposed that newbies be instructed to add the following
> boilerplate to their modules if they want to use "if __name__ ==
> '__main__':" for sanity checking:
>
>   import pkgutil
>   pkgutil.script_module(__name__, 'project.code.test_code')
>
> This completely defeats the purpose of having explicit relative
> imports in the language, as it embeds the absolute name of the module
> inside the module itself. If a package subtree is ever moved or
> renamed, you will have to manually fix every script_module()
> invocation in that subtree. Double-keying data like this is just plain
> bad design. The package structure should be recorded explicitly in
> exactly one place: the filesystem.

I agree that telling newbies to do *anything* with pkgutil is backwards.

> PJE has other objections to the PEP 395 proposal, specifically
> relating to its behaviour on package layouts where the directories
> added to sys.path contain __init__.py files, such that the developer's
> intent is not accurately reflected in their filesystem layout. Such
> layouts are *broken*, and the misbehaviour under PEP 395 won't be any
> worse than the misbehaviour with the status quo (sys.path[0] is set
> incorrectly in either case, it will just be fixable under PEP 395 by
> removing the extraneous __init__.py files). A similar argument applies
> to cases where a parent package __init__ plays games with sys.path
> (although the PEP 395 algorithm could likely be refined to better
> handle that situation). Regardless, if implicit package directories
> are accepted into Python 3.3 in any form, I *will* be immediately
> marking PEP 395 as Rejected due to incompatibility with an accepted
> PEP. I'll then (eventually, once I'm less annoyed about the need to do
> so) write a new PEP to address a subset of the issues previously
> covered by PEP 395 that omits any proposals that rely on explicit
> package directories.

Please reconsider -- there was at least one important detail in the
proposal that you misunderstood.

> Also, I consider it a requirement that any implicit packages PEP
> include an update to the tutorial to explain to beginners what will
> and won't work when they attempt to directly execute a module from
> inside a Python package.

That's fine.

> After all, such a PEP is closing off any
> possibility of ever fixing the problem: it should have to deal with
> the consequences.

Not so gloomy, Nick! There are still quite a few cases that can be
detected properly. I think the rule "don't cd into a package" covers
most cases.

-- 
--Guido van Rossum (python.org/~guido)